Today I learned: command > script

Had to compare two files at work today. Actually, I had to compare one file against a series of files to see which data exists in both. Technically this is an INNER JOIN (a set intersection) rather than a LEFT JOIN: we only want a line when it exists in both files.

Written as a PHP script, it comes down to:

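<?php
// Print every non-empty line of file2 that also appears in file1.
ini_set('memory_limit', '256M'); // ini directive names are lowercase
if ($argc < 3) { die('usage: php ' . $argv[0] . " file1 file2\n"); }
if (!file_exists($argv[1])) { die('file ' . $argv[1] . " not found\n"); }
if (!file_exists($argv[2])) { die('file ' . $argv[2] . " not found\n"); }

// Load the first file into memory, skipping blank lines.
$fp = fopen($argv[1], 'r');
$lines = [];
while (($line = fgets($fp)) !== false) {
  $line = trim($line);
  if (strlen($line) > 0) {
    $lines[] = $line;
  }
}
fclose($fp);

// Stream the second file and echo each line that exists in the first.
// Strict mode (true) stops numeric strings like "1e3" matching "1000".
$fp = fopen($argv[2], 'r');
while (($line = fgets($fp)) !== false) {
  $line = trim($line);
  if (strlen($line) > 0 && in_array($line, $lines, true)) {
    echo "$line\n";
  }
}
fclose($fp);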

The script works like a charm, but it takes a while with large numbers of records: in_array() scans the whole first array for every line of the second file.

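After some googling, it turns out this script isn’t really necessary if you use grep correctly, and you gain the speed of a compiled executable in one fell swoop. The -F flag treats each pattern as a fixed string, -x requires the whole line to match, and -f reads the patterns from file1.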

$ grep -Fxf [file1] [file2]

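The output is exactly the same, at least for files without blank lines or leading/trailing whitespace (which the PHP version trims away).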
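If you’d rather stay in PHP, the usual cure for the slowdown is to swap the linear in_array() scan for a hash lookup. This is a swapped-in technique, not what the script above does, and the function name intersect_lines() is hypothetical. It assumes both files fit in memory as arrays of lines; array_flip() turns the first file’s lines into array keys so each isset() check is a constant-time lookup on average.

```php
<?php
// Hypothetical helper (not the original script): return the non-empty
// lines of $right that also appear in $left, in $right's order.
function intersect_lines(array $left, array $right): array
{
    // Build a hash set: array_flip() turns each line of $left into a key.
    $set = array_flip(array_map('trim', $left));
    $out = [];
    foreach ($right as $line) {
        $line = trim($line);
        // isset() is an average O(1) hash lookup, unlike in_array()'s scan.
        if ($line !== '' && isset($set[$line])) {
            $out[] = $line;
        }
    }
    return $out;
}

// CLI usage, taking the same two file arguments as the script above.
if (PHP_SAPI === 'cli' && isset($argv[2])) {
    $left = file($argv[1], FILE_IGNORE_NEW_LINES);
    $right = file($argv[2], FILE_IGNORE_NEW_LINES);
    if ($left === false || $right === false) {
        fwrite(STDERR, "could not read input files\n");
        exit(1);
    }
    foreach (intersect_lines($left, $right) as $line) {
        echo "$line\n";
    }
}
```

Key lookups compare exactly, so "1e3" does not match "1000" here either. grep is still the shorter answer, but this keeps the work inside a PHP pipeline if you need it there.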

Today I learned: regex > loop

While working on “quad-quad” (a set of four speakable 4-letter words that serve as a user-friendly “bookmark” for finding a record), I wrote a “quick” program to extract the contents of wikidatawiki-20220820-pages-articles-multistream.xml (a Wikidata dump) and ran into a large delay in the following loop:

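$alphas = 'qwertyuiopasdfghjklzxcvbnm ';
$newline = '';
// Walk the line one byte at a time.
for ($x = 0; $x < strlen($line); $x++) {
    $c = substr($line, $x, 1);
    if (strpos($alphas, $c) !== false) {
        $newline = $newline . $c;   // lowercase letter or space: keep it
    } else {
        $newline = $newline . ' ';  // anything else becomes a space
    }
}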

The loop’s main purpose is to sanitize non-letter data by replacing any unknown character with a space for later processing. The end result is a set of words that I can filter down to 4-letter words and tally up.
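The filter-and-tally step might look something like this. It’s a sketch under assumptions: the function name tally_quad_words() is hypothetical, and it assumes the line has already been sanitized so only lowercase letters and spaces remain.

```php
<?php
// Hypothetical helper: count the 4-letter words in an already-sanitized
// line (only lowercase letters and spaces remain). Pass the running
// $counts back in to accumulate across lines.
function tally_quad_words(string $sanitized, array $counts = []): array
{
    // Split on runs of spaces; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/ +/', $sanitized, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (strlen($word) === 4) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    return $counts;
}

$counts = tally_quad_words('this line has some words that tally this way');
arsort($counts); // most frequent words first
print_r($counts);
```

Keeping the counts in an associative array means each increment is a hash update, which matters when tallying a dump this size.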

When the program read a line around 1 MB long, it would “hang” for a while as it chewed through the data. In total, 25,100,655 bytes of data took 24m36s. It was time to optimize.

Replacing the loop with the following regex improved performance immensely:

$newline = preg_replace('/[^a-z]/', ' ', $line);

The same amount of data took 1.892s, roughly 780 times faster.
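To check that the regex really is a drop-in replacement, the sketch below runs both versions on the same input and compares the results. The function names and the generated test string are hypothetical, and timings vary by machine, so no numbers are claimed here. Both versions treat uppercase letters as non-letters (the “H” and “W” below become spaces), so lowercase the line first if capitals should survive.

```php
<?php
// Hypothetical wrapper around the original byte-by-byte loop.
function sanitize_loop(string $line): string
{
    $alphas = 'qwertyuiopasdfghjklzxcvbnm ';
    $newline = '';
    for ($x = 0; $x < strlen($line); $x++) {
        $c = substr($line, $x, 1);
        // Keep lowercase letters and spaces; anything else becomes a space.
        $newline = $newline . ((strpos($alphas, $c) !== false) ? $c : ' ');
    }
    return $newline;
}

// Hypothetical wrapper around the regex version from the post.
function sanitize_regex(string $line): string
{
    return preg_replace('/[^a-z]/', ' ', $line);
}

// About 58 KB of mixed test input; raise the repeat count to see the gap grow.
$line = str_repeat('Hello, World! 1234 abcd-efgh ', 2000);

$t0 = microtime(true);
$a = sanitize_loop($line);
$t1 = microtime(true);
$b = sanitize_regex($line);
$t2 = microtime(true);

printf("identical: %s\n", $a === $b ? 'yes' : 'no');
printf("loop: %.4fs, regex: %.4fs\n", $t1 - $t0, $t2 - $t1);
```

The regex does the whole line in a single call into PHP’s PCRE engine instead of one interpreted loop iteration per byte.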

Lesson: If you don’t know regexes, learn regexes.

Jack and Coke? How about John and CUDA (w/ Rocky 8.7 Live)

In previous write-ups, such as “xmrig with cuda for Rocky Linux 8.5” and “nVidia CUDA with the wrong video card”, I’ve navigated Rocky Linux and CUDA. Now it’s time to see if we can get John the Ripper’s CUDA components running on a Rocky 8.7 Live Workstation USB install.

Personally, I love projects like this. I started on a single 8GB USB stick and quickly realized that not only was the space insufficient, but I’d also need more sticks to do what I needed. I ended up buying three SanDisk 32GB Ultra USB 3.0 Flash Drives from Amazon for $16.96.

The biggest help with the Live USB install was using balenaEtcher to write the 2.1GB ISO to a 32GB USB stick. Once that’s done, we can boot directly into the Live OS and start our installs.

I did have some derps with balenaEtcher failing to burn the ISO because diskpart did not report success for its clean operation. To resolve this, I had to use PowerISO to clean the USB volume before Windows would properly complete its own clean operation. A minor note on PowerISO: its installer bundles bloatware, and one wrong click can give you headaches.

Live Stuff

  • Booted the Rocky 8.7 Workstation Live image from one USB stick to install Rocky 8.7 Workstation onto a separate USB stick.
  • Set a root password and created a user with a password.
  • Rebooted into the newly installed USB stick.
  • #win

Now, on to the necessities for reaching our final goal.
