Today I learned: regex > loop

While writing “quad-quad”, a set of four speakable 4-letter words that can serve as a user-friendly “bookmark” for easily finding a record, I put together a “quick” program to extract the contents of wikidatawiki-20220820-pages-articles-multistream.xml (a Wikidata dump) and ran into a large delay in the following loop:

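// Characters to keep: lowercase letters and the space.
$alphas = 'qwertyuiopasdfghjklzxcvbnm ';
$newline = '';
for ($x = 0; $x < strlen($line); $x++) {
    $c = substr($line, $x, 1);
    if (strpos($alphas, $c) !== false) {
        // Known character: keep it as-is.
        $newline = $newline . $c;
    } else {
        // Anything else becomes a space.
        $newline = $newline . ' ';
    }
}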

The loop’s main purpose is to sanitize any non-letter data by replacing unknown characters with a space for later processing. The end result would be words that I could filter down to 4-character words and tally up.
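The later filter-and-tally step could look roughly like the sketch below. It assumes the line has already been sanitized to lowercase letters and spaces; the function name `tallyFourLetterWords` and the sample sentence are made up for illustration and are not taken from the actual program.

```php
<?php
// Hypothetical helper: tally the 4-letter words in an already-sanitized line.
function tallyFourLetterWords(string $sanitized): array
{
    // Splitting on single spaces leaves empty strings wherever spaces repeat;
    // the length filter below discards them along with every other non-4-letter word.
    $words = explode(' ', $sanitized);
    $fourLetter = array_filter($words, fn($w) => strlen($w) === 4);

    // Count how often each remaining word occurs, most frequent first.
    $counts = array_count_values($fourLetter);
    arsort($counts);
    return $counts;
}

// Made-up sample input with repeated spaces, as the sanitizer would produce.
$counts = tallyFourLetterWords('the fish swam past more fish   in  the  lake');
print_r($counts);
```

Here “fish” is counted twice, and “the” and “in” are dropped because they are not four letters long.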

When the program read a line around 1 MB in length, it would “hang” for a bit as it chewed through the data. In a nutshell, 25,100,655 bytes of data took 24m36s. It was time to optimize.

Replacing the loop with the following regex improved performance immensely.

$newline = preg_replace('/[^a-z]/', ' ', $line);

The same amount of data took 1.892s.
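To try the comparison on your own machine, a small timing harness along these lines might help. It is only a sketch: the function names and the repeated test string are invented here, the input is far smaller than the real dump, and the timings will vary by PHP version and hardware.

```php
<?php
// The character-by-character loop from above, wrapped in a function.
function sanitizeWithLoop(string $line): string
{
    $alphas = 'qwertyuiopasdfghjklzxcvbnm ';
    $newline = '';
    for ($x = 0; $x < strlen($line); $x++) {
        $c = substr($line, $x, 1);
        if (strpos($alphas, $c) !== false) {
            $newline = $newline . $c;
        } else {
            $newline = $newline . ' ';
        }
    }
    return $newline;
}

// The regex replacement from above.
function sanitizeWithRegex(string $line): string
{
    return preg_replace('/[^a-z]/', ' ', $line);
}

// Hypothetical test input: a short string repeated many times, not the real dump.
$line = str_repeat('Hello, World! 123 quad-quad ', 20000);

// hrtime(true) returns a monotonic timestamp in nanoseconds (PHP 7.3+).
$start = hrtime(true);
$fromLoop = sanitizeWithLoop($line);
$loopMs = (hrtime(true) - $start) / 1e6;

$start = hrtime(true);
$fromRegex = sanitizeWithRegex($line);
$regexMs = (hrtime(true) - $start) / 1e6;

printf("loop:  %.2f ms\n", $loopMs);
printf("regex: %.2f ms\n", $regexMs);
printf("same output: %s\n", $fromLoop === $fromRegex ? 'yes' : 'no');
```

The final line checks that both approaches produce identical output, which matters more than the exact numbers: a faster sanitizer is only useful if it sanitizes the same way.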

Lesson: If you don’t know regexes, learn regexes.

Liar’s Dice

Thanks to Pirates of the Caribbean, I’ve been introduced to this game, which combines dice with a wager. In the movie the wager was “years of service”, as it was the only currency the pirates aboard the Flying Dutchman had. As a young 20-something I used to play 7’s and 11’s, where if you rolled a 7, an 11, or a pair you’d get to tell someone to drink. That is a simple game I don’t want to pass on to my youth just yet, and I feel a simple gambling game is better to pass along.

Liar’s Dice is easy:


Flyleaf – Fully Alive / Down the Rabbit Hole

Oddly, my daily list of recommended YouTube videos happened to include an old song called “Fully Alive” by a band called Flyleaf. This was one of many songs from my youth that I had forgotten about, and I was eager to listen to it the second I saw it.

After enjoying the 2.5 minutes of high-pitched vocals and hard rock I had a brief epiphany: “Fly + Leaf = Flyleaf”. How difficult would it be to attach an insect and a plant together and create a new band name? Whipping out the programmings, I found 3 solid references, let the computers do their workings, posted the results up on GitHub, and further hosted them here for maximum clickability.
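The core of the idea can be sketched in a few lines. This is not the program posted on GitHub, just a guess at its simplest form: the two word lists are tiny made-up samples and `bandNames` is a hypothetical function name.

```php
<?php
// Hypothetical sample lists; the real word sources are not reproduced here.
$insects = ['fly', 'moth', 'wasp'];
$plants = ['leaf', 'fern', 'moss'];

// Pair every insect with every plant and capitalize the combined word.
function bandNames(array $insects, array $plants): array
{
    $names = [];
    foreach ($insects as $insect) {
        foreach ($plants as $plant) {
            $names[] = ucfirst($insect . $plant);
        }
    }
    return $names;
}

$names = bandNames($insects, $plants);
echo implode("\n", $names), "\n";
```

With three insects and three plants this prints nine names, starting with “Flyleaf”.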

Alas, my curiosity about this song and the immediate conception of the “band name” program did not stop there. The lyrics needed a bit of attention too, as they seemed unusually specific:
