Ryan Finnie

The Repository Run-Parts CI Directory (RRPCID) specification

Years ago, I wrote dsari (“Do Something And Record It”), a lightweight CI system. This was prompted by administering Jenkins installations for multiple development groups at the time, each environment having increasingly specialized (and often incompatible) plugins layered onto the core functionality.

This led me to take the opposite approach. I made a CI system based on one executable (usually a script) per job, and the assumption that you, the CI job developer, know exactly what functionality you want. Want custom notifications? Write it into the script. Sub-job triggers based on the result of the run? You can totally do that. Remote agents? Bah, just tell the script to ssh to a remote system based on the concurrency group the run is currently in. dsari’s acronym was a light-hearted take on this simplicity.

Fast forward to now, and GitHub’s CI has quickly become ubiquitous. But before that, Travis essentially pushed the idea of in-repository CI definitions, as opposed to a CI job being built around the repository as in Jenkins. As an example, finnix-live-build has a GitHub workflow which makes a test build, but I also have multiple dsari instances at home, for different architectures, doing the same thing on schedule.

However, the dsari job script merely replicates the build process of the GitHub workflow. If I add new functionality to the GitHub workflow, I need to also update build scripts on 5 different machines. This would be great to move in-repository, but I quickly found there is no established general-purpose in-repository CI layout.

So let’s make one!

If the closest simplification to the Jenkins CI model is cron, the closest simplification to the GitHub CI model is run-parts. However, since run-parts has different functionality on different systems (with Debian’s implementation currently being the most versatile), the “Run-Parts” part of the RRPCID acronym is in spirit only (though you could use Debian’s run-parts --exit-on-error for the job processing part of the RRPCID logic).

Here’s the specification I came up with:

  • A workflow is a collection of jobs, and is a readable directory under .rrpcid/workflows/.
  • A job is a collection of actions, and is a readable directory under ${workflow_dir}/jobs/.
  • An action is an executable file under ${job_dir}/. In theory this can be anything, but is likely to be a shell script.
  • Actions are executed with the repository as the current working directory.
  • Actions are executed with the environment variable CI=true. Other environment variables may be passed in from the underlying CI manager.
  • Actions are executed in lexical sort order within the job directory.
  • Workflow, job and action names must only contain letters a through z and A through Z, numbers 0 through 9, and characters “-” (dash) and “_” (underscore). Note specifically the lack of “.” (period).
  • A repository may have multiple workflows, a workflow may have multiple jobs, and a job may have multiple actions.
  • If an action exits with a status other than 0, further actions in a job are skipped.
  • All jobs in a workflow are run, regardless of whether other jobs’ actions have failed.
  • Actions within a job must not assume any other job has previously run.
  • An implicit, unnamed workflow lives directly in .rrpcid/.
  • Whether all, some or none of the repository’s workflows are run is up to the CI manager; that logic is outside the scope of this specification.
  • All other files and directories are ignored. For example, a directory named .rrpcid/testdata/ is outside the scope of this specification, and would not be handled.
  • A recommended directory for generated artifacts is artifacts/ within the workflow directory, but a CI manager is not required to do anything with this.

A script layout utilizing multiple workflows (including the implicit unnamed workflow) and multiple jobs might look like this:

.rrpcid/jobs/ci/action_1
.rrpcid/jobs/ci/action_2
.rrpcid/jobs/lint/run-lint
.rrpcid/workflows/deploy/jobs/env1/deploy
.rrpcid/workflows/deploy/jobs/env2/deploy
.rrpcid/workflows/deploy/jobs/archive/01tar
.rrpcid/workflows/deploy/jobs/archive/02upload

The following shell code satisfies the above requirements, assuming it’s being run from dash or bash (both sort “*” glob matches, which is needed for the lexical action ordering within a job directory; other Bourne shells may not). It is by no means the only way to implement an RRPCID processor.

export CI=true
run_workflow() {
    # ${1} is a workflow directory; its jobs live in ${1}/jobs/.
    for job_dir in "${1}/jobs"/*; do
        [ -d "${job_dir}" ] || continue
        [ -x "${job_dir}" ] || continue
        # Skip names containing anything outside a-z, A-Z, 0-9, "-" and "_".
        [ -z "$(basename "${job_dir}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
        for action in "${job_dir}"/*; do
            # Actions must be executable regular files with valid names.
            [ -f "${action}" ] || continue
            [ -x "${action}" ] || continue
            [ -z "$(basename "${action}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
            # A failed action skips the rest of this job's actions.
            "${action}" || break
        done
    done
}

# The implicit unnamed workflow first, then each named workflow.
run_workflow .rrpcid
for workflow_dir in .rrpcid/workflows/*; do
    [ -d "${workflow_dir}" ] || continue
    [ -x "${workflow_dir}" ] || continue
    [ -z "$(basename "${workflow_dir}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
    run_workflow "${workflow_dir}"
done
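
For comparison, here’s roughly the same logic as a Python sketch. Again, this is not the only way to do it; it assumes it’s run from the repository root, and names like NAME_RE are mine for illustration, not part of the specification.

#!/usr/bin/env python3
# Rough RRPCID processor sketch, mirroring the shell version above.

import os
import re
import subprocess

NAME_RE = re.compile(r"^[a-zA-Z0-9_-]+$")


def run_workflow(workflow_dir):
    jobs_dir = os.path.join(workflow_dir, "jobs")
    if not os.path.isdir(jobs_dir):
        return
    for job in sorted(os.listdir(jobs_dir)):
        job_dir = os.path.join(jobs_dir, job)
        if not (NAME_RE.match(job) and os.path.isdir(job_dir)):
            continue
        for action in sorted(os.listdir(job_dir)):
            action_path = os.path.join(job_dir, action)
            if not (
                NAME_RE.match(action)
                and os.path.isfile(action_path)
                and os.access(action_path, os.X_OK)
            ):
                continue
            # Run with the repository root as CWD and CI=true in the
            # environment; a failed action skips the rest of this job.
            if subprocess.call([action_path], env={**os.environ, "CI": "true"}) != 0:
                break


run_workflow(".rrpcid")
workflows_root = os.path.join(".rrpcid", "workflows")
if os.path.isdir(workflows_root):
    for workflow in sorted(os.listdir(workflows_root)):
        workflow_dir = os.path.join(workflows_root, workflow)
        if NAME_RE.match(workflow) and os.path.isdir(workflow_dir):
            run_workflow(workflow_dir)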

finnix-live-build now has a simple RRPCID job, though as of this writing I have not yet switched over the home dsari jobs to utilize it.


Side note / rant: I went back and forth on whether to allow “.” as part of the names, specifically the action script names. Ignoring “.” has traditionally been the behavior of run-parts, cron.d, and the like, because it excludes automatically-created files such as foo.bak, foo.swp, etc. However, I acknowledge using extensions for executable scripts within a project (.sh, .py, etc.) is currently popular.

My answer to this? Stop Doing That. I can’t tell you how many times I’ve seen someone (including me) put do_cool_thing.sh into a repository, and over time it gets expanded to the point it’s too complicated to be effectively managed as a shell script, and is rewritten in, say, Python. The problem is references to do_cool_thing.sh are now too entrenched, and you now have do_cool_thing.sh which is actually a Python script (!!!), or do_cool_thing.sh which is just a wrapper call to do_cool_thing (if I learned my lesson) or do_cool_thing.py (if I didn’t).

Just drop the extension when creating a script. Shebangs exist for a reason. 😉

By the way, if you do find yourself in this migration situation, here’s a general-purpose redirect script for the old location:

#!/bin/sh

# Compatibility shim for the old *.sh path: re-executes the extensionless
# script of the same name in the same directory, passing arguments through.
exec "$(dirname "$0")/$(basename "$0" .sh)" "$@"

Want to hire me? Let's talk careers.

Summary

Hello! I am a highly experienced Linux systems engineer, looking to work with the right team. If you are in need of a senior SRE with a focus on operational development, or a developer with a focus on design for infrastructure, here’s my resume; I’d love to talk with you!

Background

I left my previous employer last year, having planned to take a several month sabbatical. In a stroke of… interesting timing, my last day was the first week of March 2020. With COVID and lockdowns and the world in turmoil, I decided to extend my sabbatical and work on a bunch of personal projects.

A year has passed and I’m ready to join the career world again.

About me

My resume contains the essential details, but the nice thing about a blog post is it allows me to be more fluid. So let’s be fluid!

My immediate previous work experience was 8 years as a Site Reliability Engineer at Canonical, the company behind Ubuntu. Canonical was an 80% remote work company (and has since become 100% remote due to COVID), and I worked within a group of about 20 SREs, supporting the company’s operations, as well as interacting directly with the open source community. While I have experience with many Linux distributions, suffice it to say, I know Ubuntu inside and out.

I’ve been doing Linux systems engineering for over 20 years, and have worked with many people across the open source world. My claim to fame is Finnix, a bootable utility Linux distribution (LiveCD), geared toward system administration, rescue and recovery, etc. I am a Debian maintainer, an Ubuntu technical member, and additionally have packaging experience with Fedora and Homebrew.

While I can pick up nearly any programming language, I describe myself as a prolific Python programmer. For a portfolio example of my current Python ability, see rf-pymods, a collection of standalone helper modules. Docstrings on each function, 100% code coverage on unit tests, tox, GitHub workflows. For a more realistic example, see 2ping, a network investigation utility which was developed in 2010 and has been updated and maintained since. Tox, CI, test framework (but not (yet) 100% coverage), reasonable code and functional documentation.

Public cloud (I wrote the caching proxy software running the per-region Ubuntu mirrors on AWS, Azure and GCE). Private cloud (nearly a decade of OpenStack experience). Containers. Continuous integration (I even have my own lightweight CI system called dsari). The list goes on. And yes, I know Git.

About you

The top consideration I have with a potential employer is a healthy remote work lifestyle. I’ve been working from home for 10 years now, and recognize the strengths (and weaknesses) of a remote work setup. COVID caused many companies to shoehorn work-from-home into their existing business strategy on short notice, while I’ve been most pleased with companies who have remote collaboration as part of their DNA.

That being said, I’m looking for a senior SRE position which has a focus on operational development. This can also take the form of a development role which has a focus on design for infrastructure. In many ways, these are one and the same. Open source development and contribution is strongly preferred; I do most of my work in the open, and value companies which do the same.

I’ve had experience with startups and am not opposed to them, but would prefer a mid-sized established company or a late-stage startup. Industry is not as important as the people and the teams. I am based on the US west coast and have extensive experience working with geographically distributed teams.

Let’s talk

If you’re excited, here’s my resume, here’s my GitHub profile, go give Finnix a try, etc, then send me an email. I’d love to talk with you.

(Spinning) Rust begone!

33 hard drives arranged in a pyramid pattern

You could say I have a few computers… 63 as of this writing (I made a spreadsheet), though about half of those are SBCs (single board computers; Raspberry Pis and similar). However, many of the other computers are old, and include mechanical hard drives in various states of failure. About a year ago, I began a quest to eliminate as many mechanical hard drives from my collection as possible.

2.5” SATA

This one’s simple: just replace the 2.5” SATA HDD with a 2.5” SATA SSD.

3.5” SATA

3D printed 2.5 inch to 3.5 inch drive adapter with case-specific standoffs and an SSD installed

Almost as simple as 2.5” SATA: just replace with a 2.5” SATA SSD and a 2.5” to 3.5” adapter bracket. However, it took me a while to find the ideal bracket, a 3D printed universal-ish adapter (specifically the “minimalist sunk screwholes” variant). It can mount the underside of an SSD in several positions, and has 3.5” holes for side or underside mounting.

Note that while the SATA/power ports are close to a normal 3.5” drive’s location, it’s not exact. For a situation where you need to plug into a backplane (like a server or NAS), you’ll need a caddy adapter which re-routes the ports.

3.5” IDE

SD to IDE adapter in a 3D printed PCI/ISA bracket
SD to IDE adapter in a 3D printed 3.5 inch mount

These became the majority of my conversions. Since we’re talking about older IDE devices, the speed of a modern SSD isn’t needed, so SD to IDE adapters work extremely well. (The linked product is only an example; it’s a common design sold by many sellers under similar names.) I’ve had success across a range of devices, from a 486 desktop to a Sun Blade 100 workstation to a Power Mac G4 desktop.

The preferred mounting option is in a PCI / ISA slot, so you can remove the SD card and mount it on a more modern computer if needed. I designed a reinforced bracket for this purpose.

Alternatively, you can mount it in a 3.5” bay, either as an external 3.5” device (i.e. a floppy bay) or internal, if all else fails. Again, I have designed a 3D-printed mount for these situations. The same mount can either be used internally or externally, with side or underside 3.5” mounting. Note that the adapter’s IDE and power ports are nowhere near a normal hard drive’s positioning, and the top-mounted Molex port is rarely in a convenient location, so you’ll also want to get a short 40-pin extension ribbon cable, and a short floppy power to Molex adapter.

3.5” IDE (for stubborn computers)

The only computer I found which doesn’t like the SD to IDE adapter is the Power Mac G3 Blue & White. Mac OS 8.6 saw the drive, but would somehow encounter I/O timeouts any time it tried to use it. For this machine, I went with a CompactFlash card and a CF to IDE adapter. CF is a subset of IDE, so you want the adapter to be completely passive. This worked, but as CF cards are getting harder (and more expensive) to find, I wanted to use this only as a last resort.

2.5” IDE

M.2 to 2.5 inch IDE adapter with M.2 drive installed

This one’s a bit weird. For older laptops and machines like the G4 Mac Mini, the replacement needs to be the same form factor as a 2.5” IDE drive, so we’ll want to get… M.2 drives. Okay, SATA, not NVMe, but it still feels odd buying a brand new gumstick-sized card for a 15+ year old device. The secret is this M.2 SATA to 2.5” IDE converter (again, sold by multiple sellers); beyond that, you can pick any cheap M.2 SATA 2242 drive, though I recommend 120GB drives, as the target device is probably old enough to lack 48-bit LBA support and therefore be limited to 137GB (the 28-bit LBA limit).

SCSI

I have several machines which have both SCSI and IDE interfaces, and for those I just used the methods above for IDE. But I do have one SCSI-only machine: the SGI Challenge S, a server variant of the SGI Indy. For this, there is the SCSI2SD v6. Until now, the converters/adapters have all been in the $10 to $20 range, but the SCSI2SD is significantly more expensive at about $100. But it’s full-featured, and has options for just about any classic computer situation. It can divide up an SD card to emulate multiple SCSI devices, even CD/tape drives, and has software-configurable termination options. I love it, but at its price, I’m glad I only needed it once.

Again, you’ll probably need a bracket to physically mount it within the computer, but I found someone else’s SCSI2SD universal mount to work fine with the Challenge S without modification.

Gotek floppy emulator with OLED mod and 3D printed case, mounted in a Packard Bell desktop

Floppy drives and disks also fail, and for this I recommend the Gotek SFR1M44-U100, which lets you use a USB thumb drive to emulate floppy disks. There are many mods you can do to this drive, but at the very least I recommend replacing the stock firmware, which requires a proprietary utility to write images to the USB drive. Replace it with FlashFloppy, which has many more emulation options than the stock firmware, and also lets you directly drop image files onto the USB drive.

What’s left?

Surely I didn’t completely eliminate spinning hard drives from my home, did I? Something I found was the closer I got to zero, the more the remaining ones stood out, and the stronger the desire to address them.

  • Some Cheap Laptop from Best Buy was a laptop I used for a few months back in 2015. I consulted the repair manual for the laptop, and getting to the 500GB HDD would have been a monumental task. The laptop itself was not useful to me, nor did I have any sentimental attachment (beyond the joke review I made 6 years ago), so I solved the problem by reinstalling Windows 8.1 on it and giving it away.
  • My primary home server has 28TB of raw HDDs, in the form of 7x 4TB drives. It would be prohibitively expensive to replace it all with SSDs. One drive is an external backup, one is a Purple drive for security camera recordings, and the other 5 are in a RAID 6 setup, backed by a 500GB SSD bcache, so I’m not worried about reliability or performance.
  • My Windows gaming machine has 3 tiers of storage: 1TB of boot NVMe, 2TB of SATA SSD, and a 6TB HDD. Again, I’m not concerned.
  • My remote colocation server has two primary boot 250GB SATA SSDs, and two WD RE4 2 TB “enterprise” HDDs, each in RAID 1. I’ll likely do pure solid-state with my next colocation server, but considering the current server is only 3 years old and the one before that lasted nearly a decade, it’s probably not going to be for quite a while.

The "perfect" IP hashing algorithm

If you want to anonymize IP addresses, for example in HTTP logs, here’s the method I recommend:

#!/usr/bin/env python3

import hashlib
import ipaddress
import sys


KEY = b"example"


def hash_ip(ip, key=b""):
    last_len = -8 if ip.version == 6 else -1
    base_bytes = ip.packed[:last_len]
    last_bytes = ip.packed[last_len:]
    base_bhash = hashlib.shake_256(key + base_bytes).digest(len(base_bytes))
    last_bhash = hashlib.shake_256(key + ip.packed).digest(len(last_bytes))
    return ipaddress.ip_address(base_bhash + last_bhash)


for line in sys.stdin:
    line_ip, line_rest = line.split(" ", 1)
    ip_hashed = hash_ip(ipaddress.ip_address(line_ip), KEY)
    sys.stdout.write("{} {}".format(ip_hashed, line_rest))

Notes:

  • IPv4 and IPv6 addresses are both supported.
  • Hashed IPv4 addresses return IPv4, and hashed IPv6 addresses return IPv6. However, that doesn’t mean the hashed addresses are globally valid (they could be within multicast, for example).
  • The network portion (the IPv4 /24 or IPv6 /64) hashes consistently, allowing you to pick out related hashed addresses. For example, if you see both 230.10.134.107 and 230.10.134.226 as hashed addresses, you know the real addresses are part of the same /24, even though you know nothing else about them.
  • However, the host portion (the last IPv4 octet or last 64 IPv6 bits) is hashed against the full address rather than on its own, so it does not leak information about itself. If you happen to know hashed 230.10.134.107 corresponds to real 10.9.8.4, hashed 178.59.91.107 does not mean the real IP ends in .4.
  • A hash key is optional but recommended. Reuse the key if you want to be able to correlate hashed IPs over time, or be able to hash a known real IP to search for it in the past. Use a random key and throw it away afterwards if you want it to be truly anonymous.
  • SHAKE-256 is used because it’s modern (SHA-3) and supports arbitrary-length output. If you can’t use SHA-3, use something like SHA-512, HMAC it with a key, and use a portion of the output; see the sketch after this list. (SHA-3 digests can securely use key prepending; other digests require HMAC.)
  • Hashed IPv6 addresses get quite unwieldy. Get used to addresses like 20f1:b413:7f5d:7720:1fe2:1bf3:38fb:620. (Exactly one hex digit out of 32 was compressible in that random example; most examples have none.)
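
As a rough illustration of that HMAC fallback, here’s what an HMAC-SHA-512 variant might look like. This is a sketch only; hash_ip_hmac is a hypothetical name, and the short check at the end demonstrates the /24 correlation property described above.

#!/usr/bin/env python3

import hashlib
import hmac
import ipaddress


def hash_ip_hmac(ip, key=b""):
    # Same split as hash_ip: network prefix (/24 or /64) and host portion.
    last_len = -8 if ip.version == 6 else -1
    base_bytes = ip.packed[:last_len]
    last_bytes = ip.packed[last_len:]
    # HMAC the prefix and the full address separately, truncating each
    # digest to the length of the bytes being replaced.
    base_bhash = hmac.new(key, base_bytes, hashlib.sha512).digest()[:len(base_bytes)]
    last_bhash = hmac.new(key, ip.packed, hashlib.sha512).digest()[:len(last_bytes)]
    return ipaddress.ip_address(base_bhash + last_bhash)


a = hash_ip_hmac(ipaddress.ip_address("10.9.8.4"), b"example")
b = hash_ip_hmac(ipaddress.ip_address("10.9.8.57"), b"example")
# Same real /24, so the hashed /24 matches; the hashed last octets are
# derived from the full addresses and say nothing about .4 or .57.
assert a.packed[:3] == b.packed[:3]
print(a, b)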

That Time I Bought Classified Military Intelligence Off eBay (NOT CLICKBAIT)

Yesterday, “Alex” posted a hilarious account of getting former Australian prime minister Tony Abbott’s passport information. The lesson of the story boils down to: how do you report a vulnerability at a national scale? Do you just call 1-800-NATIONAL-SECURITY and let them know what happened? (Actually, Australia does have a literal hotline like that, but it turned out to be useless. Seriously, before you read further here, go read “Alex”’s story.)

This story reminds me of a similar problem I had 13 years ago. But be warned, this story is completely from memory and some things may be wrong. It will also not be nearly as entertaining as “Alex”’s story.

Cobalt RaQ 2, redacted

In 2007, I bought a Cobalt RaQ 2 off eBay. Introduced in 1999, the RaQ 2 was the rackmount server variant of the more well known Cobalt Qube 2. It featured a MIPS R5000 CPU and was technically a general purpose server platform, but was most often used as a file server or proxy server. I bought it used (and long after its useful life) because I like to collect non-x86 hardware which can run Linux.

When it arrived, I powered it on, intending to look around Cobalt’s bespoke Linux distribution before wiping the drive and installing Debian. I was expecting a server restored to factory settings, but was secretly hoping for some random company’s fileserver data. Likely an out-of-business company since these sorts of sales usually come from liquidators.

Through the front panel, I was able to reset the administrator password and retrieve the configured IP information, which was… a public IP address. Huh, weird. Was it being used as a web server? Doing a whois on it revealed it was in a block owned by the DoD Network Information Center. Yes, that DoD.

By attaching a crossover Ethernet cable, I was able to log into the RaQ’s administrative web interface. I could see it was configured as a caching LAN web proxy, and its hostname ended in smil.mil. Yes, that military.

Without looking at the contents, I could tell that the drive was full of cached content, with the usual extensions (.html, .jpg, etc), though if I remember correctly, the base filenames were randomly assigned so there was no indication of the subject of each file’s content. The file dates were mostly through 2003.

That’s as far as I dug. I turned off the RaQ and pulled the hard drive. As I would come to learn, smil.mil is a domain on SIPRNet, the US government’s classified equivalent to the Internet. SIPRNet is supposed to be air-gapped; there should be no physical way to reach a machine on SIPRNet from the Internet or vice versa, and it was definitely not accessible to civilians. The fact that I had a caching proxy server used on SIPRNet was inconceivable.

(This also explains why it had a “public” IP address. If you look at visual representations of IPv4 space (buy this poster, hint hint), you’ll see DoD spaces but they don’t appear to be in use. They are in use, just not on the Internet. RFC 1918? Never heard of it. Similarly, smil.mil and sgov.gov look like Internet domains, but only resolve on SIPRnet. I’ll give the government a pass here; the IANA has no organization-specific private namespace, like .internal or .lan, and it annoys me that domains like that are not explicitly reserved. What were we talking about? Oh right, national security.)

So what should I do with this information? I could go to the press, and it would be quite an embarrassment to the administration, but I didn’t want to draw political attention to myself. Ideally I just wanted to go to someone official and say “someone messed up, here’s the server’s hard drive and who I bought it from”. But again, there isn’t really a 1-800-CLASSIFIED-LEAK.

A co-worker suggested I contact my congressman’s office to see if they could help me navigate to someone proper in the federal government. I called, they took down my information, and never called me back. I sent an email, no response. I’m sure I sounded like a crackpot, but it’s not my fault it was all true.

In the end, I gave up. The hard drive went into my document safe, just in case I got a call later (perhaps if the FBI got a list of buyers from the eBay seller). But that never happened, and about 5 years later I destroyed the hard drive without loading any of its cached contents. Somehow I lost my email archives from earlier than 2010, and eBay sale records only go back about 2 years, so I can’t even find out who sold me the server back in 2007.

I still have the Cobalt RaQ 2 hardware today (minus the hard drive), and it currently runs Debian off an SD to IDE adapter. There was nothing otherwise special about the hardware, nor any asset stickers or similar, so now it’s just a weird server from an old generation.
