Ryan Finnie

ChatGPT unsettled me

Tom Scott recently put out a video where he had a “minor existential crisis” after giving ChatGPT a coding task. His conclusion was basically, this works better than it should, and that’s unsettling. After watching this, I had my own minor coding task which I decided to give to ChatGPT, and, spoiler alert, I am also unsettled.

The problem I needed to solve was I have an old Twitter bot which had automatically followed a bunch of people over the years, and I wanted to clear out those follows. As of this writing, Twitter’s API service seems to inexplicably still exist, but the single-purpose OAauth “app” associated with that account was for API v1.1, not v2, so I needed to use API v1.1 calls.

I’d done a lot of Twitter API work over the years, and a lot of that was through Python, so I was ready to kitbash something together using existing code snippets. But let’s see what ChatGPT would do if given the opportunity:

Write a script in Python to use the Twitter API v1.1 to get a list of all friends and then unsubscribe from them

And yeah, it created a correctly-formatted, roughly 25 line Python script to do this exactly. It even gave a warning that API access requires authentication, and, amusingly, that unsubscribing from all friends would affect the account’s “social reach”.

(I’m summarizing its responses here; a full chat log, including code at every step, is available at the end of this post.)

One drawback to the specific situation was it wrote the script to use the tweepy library, which I had never heard of and wasn’t sure if it was using API v1.1 (though I suspected it was, from the library function destroy_friendship(); “friendships” are verbs in v1.1 but not v2). Nonetheless, I was more familiar with requests_oauthlib and the direct API endpoints, so I just asked ChatGPT to rewrite it to use that.

Can we use the requests_oauthlib library instead of tweepy?

Sure enough, it produced exactly what I wanted, and I ended up using it for my task.

Everything beyond this was “what-ifs” to poke at ChatGPT. The first thing I noticed was it was using a less efficient API endpoint. Thinking back to Tom’s video where he realized he could simply ask ChatGPT why it did something a certain way, I realized I could simply say:

That works, but the friends/ids.json endpoint allows for 5000 results per request, versus 200 on friends/list.json as you pointed out. Let’s use friends/ids.json instead.

ChatGPT’s response was basically “yep, I agree that’s more efficient; here’s an updated script!”, utilizing the new endpoint and specifying the new 5000 user limit.

This was a subtle test for it since the endpoint I suggested is very similar to the old one, but not a drop-in replacement. You need to make a few minor changes elsewhere in the script to utilize it. ChatGPT passed this test and updated both the endpoint name and the required changes.

I’m using Python 3.5 and can’t use f-strings. Can you rewrite the code to use string format() instead?

I’m not actually using Python 3.5, but I wanted to ask this as f-strings and format() are very different looking ways to format strings. It rewrote the script correctly.

Can you package this up to run in a GitHub Actions workflow?

It gave me the Python script it had been working on up until this point, and a workflow YAML file, along with instructions on where to put them. The workflow format was correct, and, impressively, remembered my off-hand requirement of Python 3.5 from earlier (even though the context of that was about a specific feature, not Python 3.5 specifically), and made sure to use 3.5 when setting up actions/setup-python@v2. It also knew that requests and requests_oauthlib were not stdlib, so it had the workflow install them through pip.

The workflow did use the GitHub Actions secret store to correctly use the secrets and pass them as environment variable, but up until this moment, the script itself had hardcoded placeholders. I asked ChatGPT to actually use the environment variables.

The workflow file is correctly passing CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET secrets as environment variables to twitter_unfollow.py, but twitter_unfollow.py still has the placeholder variables. Can you modify twitter_unfollow.py to use the environment variables?

Again, it did this without a problem.

In conclusion, I am impressed by ChatGPT’s performance, to the extent of being unsettled by the experience. There are many logical, legal and ethical problems with AI generation and I don’t want to minimize those concerns, but I just wanted to document my limited experience with using it for a coding exercise.

(It’s also not lost on me the irony of using current-generation AI to fiddle with the account of X11R5, an old MegaHAL Markov bot which is decidedly more ancient AI.)


Complete chat log

Here is the entire session log between ChatGPT and myself:

Write a script in Python to use the Twitter API v1.1 to get a list of all friends and then unsubscribe from them

Before proceeding with the code, please note that the Twitter API v1.1 requires authentication, which means you need to have valid credentials to access the API. Also, please be aware that unsubscribing from all friends can have significant consequences on your account’s social reach, and it should not be done lightly.

To use the Twitter API v1.1 in Python, we need to first install the tweepy library. We can use the following code to install it:

pip install tweepy

Once tweepy is installed, we can use the following script to get a list of all friends and then unsubscribe from them:

import tweepy

# set up Twitter API credentials
consumer_key = "your_consumer_key"
consumer_secret = "your_consumer_secret"
access_token = "your_access_token"
access_token_secret = "your_access_token_secret"

# authenticate with the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# create an API object
api = tweepy.API(auth)

# get a list of all friends
friends = api.friends_ids()

# loop through the list of friends and unsubscribe from them
for friend in friends:
    api.destroy_friendship(friend)

In this script, we first set up our Twitter API credentials, then authenticate with the Twitter API using tweepy. We then create an API object and use the friends_ids() method to get a list of all friends.

Finally, we loop through the list of friends and use the destroy_friendship() method to unsubscribe from each friend. Note that this is a permanent action, so use it with caution.

Can we use the requests_oauthlib library instead of tweepy?

Yes, we can use the requests_oauthlib library instead of tweepy to interact with the Twitter API v1.1 in Python. Assuming you have valid credentials and have installed the requests_oauthlib library, here’s an example script to get a list of all friends and then unsubscribe from them using requests_oauthlib:

import requests
from requests_oauthlib import OAuth1Session

# replace with your own credentials
consumer_key = 'your consumer key'
consumer_secret = 'your consumer secret'
access_token = 'your access token'
access_token_secret = 'your access token secret'

# set up OAuth1Session
twitter = OAuth1Session(consumer_key, client_secret=consumer_secret,
                        resource_owner_key=access_token, resource_owner_secret=access_token_secret)

# get a list of all friends
url = 'https://api.twitter.com/1.1/friends/list.json'
params = {'count': 200}
response = twitter.get(url, params=params)
friend_list = response.json()['users']

# unsubscribe from each friend
for friend in friend_list:
    url = f'https://api.twitter.com/1.1/friendships/destroy.json?user_id={friend["id"]}'
    response = twitter.post(url)
    if response.status_code == 200:
        print(f'Unsubscribed from {friend["screen_name"]}')
    else:
        print(f'Failed to unsubscribe from {friend["screen_name"]}')

In this script, we first set up an OAuth1Session object with our Twitter API credentials. We then use the get method to retrieve a list of all friends (limited to 200 per request, as per the API’s limitations). We extract the list of friends from the JSON response using the json method.

We then loop through each friend in the list and use the post method to unsubscribe from each friend by making a request to the friendships/destroy.json endpoint with the friend’s ID as a parameter. We print a message indicating whether the unsubscribe request was successful or not.

That works, but the friends/ids.json endpoint allows for 5000 results per request, versus 200 on friends/list.json as you pointed out. Let’s use friends/ids.json instead.

You’re right, using the friends/ids.json endpoint would allow us to retrieve up to 5000 friend IDs per request, which would be more efficient than using the friends/list.json endpoint. Here’s an updated script that uses the friends/ids.json endpoint:

import requests
from requests_oauthlib import OAuth1Session

# replace with your own credentials
consumer_key = 'your consumer key'
consumer_secret = 'your consumer secret'
access_token = 'your access token'
access_token_secret = 'your access token secret'

# set up OAuth1Session
twitter = OAuth1Session(consumer_key, client_secret=consumer_secret,
                        resource_owner_key=access_token, resource_owner_secret=access_token_secret)

# get a list of all friend IDs
url = 'https://api.twitter.com/1.1/friends/ids.json'
params = {'count': 5000}
response = twitter.get(url, params=params)
friend_ids = response.json()['ids']

# unsubscribe from each friend
for friend_id in friend_ids:
    url = f'https://api.twitter.com/1.1/friendships/destroy.json?user_id={friend_id}'
    response = twitter.post(url)
    if response.status_code == 200:
        print(f'Unsubscribed from friend ID {friend_id}')
    else:
        print(f'Failed to unsubscribe from friend ID {friend_id}')

In this updated script, we use the friends/ids.json endpoint to retrieve a list of all friend IDs, with a maximum count of 5000 per request. We extract the list of friend IDs from the JSON response using the json method.

We then loop through each friend ID in the list and use the post method to unsubscribe from each friend by making a request to the friendships/destroy.json endpoint with the friend’s ID as a parameter. We print a message indicating whether the unsubscribe request was successful or not.

I’m using Python 3.5 and can’t use f-strings. Can you rewrite the code to use string format() instead?

Sure! Here’s the same code using string format() instead of f-strings:

import requests
from requests_oauthlib import OAuth1Session

# replace with your own credentials
consumer_key = 'your consumer key'
consumer_secret = 'your consumer secret'
access_token = 'your access token'
access_token_secret = 'your access token secret'

# set up OAuth1Session
twitter = OAuth1Session(consumer_key, client_secret=consumer_secret,
                        resource_owner_key=access_token, resource_owner_secret=access_token_secret)

# get a list of all friend IDs
url = 'https://api.twitter.com/1.1/friends/ids.json'
params = {'count': 5000}
response = twitter.get(url, params=params)
friend_ids = response.json()['ids']

# unsubscribe from each friend
for friend_id in friend_ids:
    url = 'https://api.twitter.com/1.1/friendships/destroy.json?user_id={}'.format(friend_id)
    response = twitter.post(url)
    if response.status_code == 200:
        print('Unsubscribed from friend ID {}'.format(friend_id))
    else:
        print('Failed to unsubscribe from friend ID {}'.format(friend_id))

In this updated code, we use format() to insert the friend ID into the string url and the message strings.

Can you package this up to run in a GitHub Actions workflow?

Sure, here’s how you could package the code in a GitHub Actions workflow:

  1. Create a new Python file in your repository called twitter_unfollow.py with the following content:
import requests
from requests_oauthlib import OAuth1Session

# replace with your own credentials
consumer_key = 'your consumer key'
consumer_secret = 'your consumer secret'
access_token = 'your access token'
access_token_secret = 'your access token secret'

# set up OAuth1Session
twitter = OAuth1Session(consumer_key, client_secret=consumer_secret,
                        resource_owner_key=access_token, resource_owner_secret=access_token_secret)

# get a list of all friend IDs
url = 'https://api.twitter.com/1.1/friends/ids.json'
params = {'count': 5000}
response = twitter.get(url, params=params)
friend_ids = response.json()['ids']

# unsubscribe from each friend
for friend_id in friend_ids:
    url = 'https://api.twitter.com/1.1/friendships/destroy.json?user_id={}'.format(friend_id)
    response = twitter.post(url)
    if response.status_code == 200:
        print('Unsubscribed from friend ID {}'.format(friend_id))
    else:
        print('Failed to unsubscribe from friend ID {}'.format(friend_id))
  1. Create a new file in your repository called .github/workflows/twitter_unfollow.yml with the following content:
name: Twitter Unfollow

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * *'  # run every day at midnight

jobs:
  unfollow:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2

    - name: Set up Python 3.5
      uses: actions/setup-python@v2
      with:
        python-version: 3.5

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install requests requests_oauthlib

    - name: Unfollow Twitter friends
      run: python twitter_unfollow.py
      env:
        CONSUMER_KEY: $
        CONSUMER_SECRET: $
        ACCESS_TOKEN: $
        ACCESS_TOKEN_SECRET: $

This workflow runs the twitter_unfollow.py script every day at midnight, using Python 3.5. It installs the requests and requests_oauthlib packages using pip, and sets the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET environment variables using GitHub Secrets. Be sure to replace the placeholder strings in the workflow file with the actual values of your credentials.

The workflow file is correctly passing CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET secrets as environment variables to twitter_unfollow.py, but twitter_unfollow.py still has the placeholder variables. Can you modify twitter_unfollow.py to use the environment variables?

Sure, you can modify twitter_unfollow.py to use the environment variables like this:

import os
import requests
from requests_oauthlib import OAuth1Session

# get credentials from environment variables
consumer_key = os.environ['CONSUMER_KEY']
consumer_secret = os.environ['CONSUMER_SECRET']
access_token = os.environ['ACCESS_TOKEN']
access_token_secret = os.environ['ACCESS_TOKEN_SECRET']

# set up OAuth1Session
twitter = OAuth1Session(consumer_key, client_secret=consumer_secret,
                        resource_owner_key=access_token, resource_owner_secret=access_token_secret)

# get a list of all friend IDs
url = 'https://api.twitter.com/1.1/friends/ids.json'
params = {'count': 5000}
response = twitter.get(url, params=params)
friend_ids = response.json()['ids']

# unsubscribe from each friend
for friend_id in friend_ids:
    url = 'https://api.twitter.com/1.1/friendships/destroy.json?user_id={}'.format(friend_id)
    response = twitter.post(url)
    if response.status_code == 200:
        print('Unsubscribed from friend ID {}'.format(friend_id))
    else:
        print('Failed to unsubscribe from friend ID {}'.format(friend_id))

In this updated code, we use the os.environ dictionary to get the values of the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET environment variables.

IPv6 Prefix Delegation on Spectrum with dhclient

A few years ago, Spectrum (a US cable company formed from the combination of Charter and Time Warner Cable) started offering IPv6 Prefix Delegation (PD) /64s. The device connected to the cable modem will normally get an individual global dynamic IPv6 address via Router Advertisement, but Prefix Delegation is essentially the ability to request an entire network to be routed to you.

I used to live in Reno in a formerly Charter network, but recently moved to Southern California in a formerly Time Warner network, so I’m confident this information applies to all Spectrum regions. The dhclient invocation should work for any provider which supports Prefix Delegation, but the lease behavior I describe is probably not universal.

Here’s the systemd dhclient6-pd.service file on my router, a Raspberry Pi 4 connected directly to the cable modem. Replace eext0 with your external interface name.

[Unit]
Description=IPv6 PD lease reservation
Wants=network-online.target
After=network-online.target
StartLimitIntervalSec=0

[Service]
Restart=always
RestartSec=30
ExecStart=/sbin/dhclient -d -6 -P -v -lf /var/lib/dhcp/dhclient6-pd.leases eext0

[Install]
WantedBy=multi-user.target

Once running, dhclient6-pd.leases should give you something like this:

default-duid "\000\001\000\001#\225\311g\000\006%\243\332{";
lease6 {
  interface "eext0";
  ia-pd 25:a3:da:7b {
    starts 1628288112;
    renew 1800;
    rebind 2880;
    iaprefix 2600:6c51:4d00:ff::/64 {
      starts 1628288112;
      preferred-life 3600;
      max-life 3600;
    }
  }
  option dhcp6.client-id 0:1:0:1:23:95:c9:67:0:6:25:a3:da:7b;
  option dhcp6.server-id 0:1:0:1:4b:73:43:3a:0:14:4f:c3:f6:90;
  option dhcp6.name-servers 2607:f428:ffff:ffff::1,2607:f428:ffff:ffff::2;
}

So now I can see that 2600:6c51:4d00:ff::/64 is routable to me, and can set up network addresses and services. dhclient could be set up to run scripts on trigger events, but in this current state it just keeps the PD reservation.

But… max-life 3600? Does that mean I’ll lose the PD if dhclient doesn’t check in within an hour? What if I have a power outage? Yes, you will lose the PD after an hour if dhclient isn’t running… for now. After a few renewals, the far end will trust that your initial PD request wasn’t a drive-by, and will up the period from 1 hour to 7 days, and dhclient6-pd.leases will look like this:

default-duid "\000\001\000\001#\225\311g\000\006%\243\332{";
lease6 {
  interface "eext0";
  ia-pd 25:a3:da:7b {
    starts 1628288112;
    renew 1800;
    rebind 2880;
    iaprefix 2600:6c51:4d00:ff::/64 {
      starts 1628288112;
      preferred-life 3600;
      max-life 3600;
    }
  }
  option dhcp6.client-id 0:1:0:1:23:95:c9:67:0:6:25:a3:da:7b;
  option dhcp6.server-id 0:1:0:1:4b:73:43:3a:0:14:4f:c3:f6:90;
  option dhcp6.name-servers 2607:f428:ffff:ffff::1,2607:f428:ffff:ffff::2;
}
lease6 {
  interface "eext0";
  ia-pd 25:a3:da:7b {
    starts 1628291743;
    renew 300568;
    rebind 482008;
    iaprefix 2600:6c51:4d00:ff::/64 {
      starts 1628291743;
      preferred-life 602968;
      max-life 602968;
    }
  }
  option dhcp6.client-id 0:1:0:1:23:95:c9:67:0:6:25:a3:da:7b;
  option dhcp6.server-id 0:1:0:1:4b:73:43:3a:0:14:4f:c3:f6:90;
  option dhcp6.name-servers 2607:f428:ffff:ffff::1,2607:f428:ffff:ffff::2;
}

(The last lease6 is the most recent lease received.)

As far as I can tell, this 7 day PD can be renewed indefinitely; I was using the same network for nearly 2 years. But be warned: max-life is final. If you have a misconfiguration and dhclient doesn’t check in for a week, Spectrum will release your PD immediately after 7 days and your client will receive a completely new /64.


Since this is fresh in my mind from setting up my new home, here are a few things to set up on your core router, but this is not meant to be an exhaustive IPv6 Linux router guide.

The external interface automatically gets a global dynamic v6 address; as for the internal interface, while you technically don’t need a static address thanks to link-local routing, in practice you should give it one. Here’s my /etc/systemd/network/10-eint0.network:

[Match]
Name=eint0

[Network]
Address=10.9.8.1/21
Address=2600:6c51:4d00:ff::1/64
Address=fe80::1/128
IPv6AcceptRA=false
IPForward=true

You’ll also want an RA daemon for the internal network. My /etc/radvd.conf:

interface eint0 {
  IgnoreIfMissing on;
  MaxRtrAdvInterval 2;
  MinRtrAdvInterval 1.5;
  AdvDefaultLifetime 9000;
  AdvSendAdvert on;
  AdvManagedFlag on;
  AdvOtherConfigFlag on;
  AdvHomeAgentFlag off;
  AdvDefaultPreference high;
  prefix 2600:6c51:4d00:ff::/64 {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
    AdvValidLifetime 2592000;
    AdvPreferredLifetime 604800;
  };
  RDNSS 2600:6c51:4d00:ff::1 {
  };
};

And a DHCPv6 server. My /etc/dhcp/dhcpd6.conf, providing information about DNS and DHCP-assigned addressing (in addition to the RA autoconfiguration):

default-lease-time 2592000;
preferred-lifetime 604800;
option dhcp-renewal-time 3600;
option dhcp-rebinding-time 7200;
allow leasequery;
option dhcp6.preference 255;
option dhcp6.rapid-commit;
option dhcp6.info-refresh-time 21600;
option dhcp6.name-servers 2600:6c51:4d00:ff::1;
option dhcp6.domain-search "snowman.lan";

subnet6 2600:6c51:4d00:ff::/64 {
  range6 2600:6c51:4d00:ff::c0c0:0 2600:6c51:4d00:ff::c0c0:ffff;
}

host workstation {
  host-identifier option dhcp6.client-id 00:01:00:01:21:37:85:10:01:23:45:ab:cd:ef;
  fixed-address6 2600:6c51:4d00:ff::2;
}

The Repository Run-Parts CI Directory (RRPCID) specification

Years ago, I wrote dsari (“Do Something And Record It”), a lightweight CI system. This was prompted by administering Jenkins installations for multiple development groups at the time, each environment having increasingly specialized (and often incompatible) plugins layered onto the core functionality.

This led me to take the opposite approach. I made a CI system based on one executable (usually a script) per job, and the assumption that you, the CI job developer, know exactly what functionality you want. Want custom notifications? Write it into the script. Sub-job triggers based on the result of the run? You can totally do that. Remote agents? Bah, just tell the script to ssh to a remote system based on the concurrency group the run is currently in. dsari’s acronym was a light-hearted take on this simplicity.

Fast forward to now, and GitHub’s CI has quickly become ubiquitous. But before that, Travis essentially pushed the idea of in-repository CI definitions, as opposed to a CI job being built around the repository as in Jenkins. As an example, finnix-live-build has a GitHub workflow which makes a test build, but I also have multiple dsari instances at home, for different architectures, doing the same thing on schedule.

However, the dsari job script merely replicates the build process of the GitHub workflow. If I add new functionality to the GitHub workflow, I need to also update build scripts on 5 different machines. This would be great to move in-repository, but I quickly found there is no established general-purpose in-repository CI layout.

So let’s make one!

If the closest simplification to the Jenkins CI model is cron, the closest simplification to the GitHub CI model is run-parts. However, since run-parts has different functionality on different systems (with Debian’s implementation currently being the most versatile), the “Run-Parts” part of the RRPCID acronym is in spirit only (though you could use Debian’s run-parts --exit-on-error for the job processing part of the RRPCID logic).

Here’s the specification I came up with:

  • A workflow is a collection of jobs, and is a readable directory under .rrpcid/workflows/.
  • A job is a collection of actions, and is a readable directory under ${workflow_dir}/jobs/.
  • An action is an executable file under ${job_dir}/. In theory this can be anything, but is likely to be a shell script.
  • Actions are executed with the repository as the current working directory.
  • Actions are executed with the environment variable CI=true. Other environment variables may be passed in from the underlying CI manager.
  • Actions are executed in lexical sort order within the job directory.
  • Workflow, job and action names must only contain letters a through z and A through Z, numbers 0 through 9, and characters “-“ (dash) and “_” (underscore). Note specifically the lack of “.” (period).
  • A repository may have multiple workflows, a workflow may have multiple jobs, and a job may have multiple actions.
  • If an action exits with a status other than 0, further actions in a job are skipped.
  • All jobs in a workflow are run, regardless of whether other jobs’ actions have failed.
  • Actions within the job must not assume another job has previously run.
  • An implicit, unnamed workflow lives directly in .rrpcid/.
  • Whether all, some or none of the repository’s workflows are run is up to the CI manager; that logic is outside the scope of this specification.
  • All other files and directories are ignored. For example, a directory named .rrpcid/testdata/ is outside the scope of this specification, and would not be handled.
  • A recommended directory for generated artifacts is artifacts/ within the workflow directory, but a CI manager is not required to do anything with this.

A script layout utilizing multiple workflows (including the implicit unnamed workflow) and multiple jobs might look like this:

.rrpcid/jobs/ci/action_1
.rrpcid/jobs/ci/action_2
.rrpcid/jobs/lint/run-lint
.rrpcid/workflows/deploy/jobs/env1/deploy
.rrpcid/workflows/deploy/jobs/env2/deploy
.rrpcid/workflows/deploy/jobs/archive/01tar
.rrpcid/workflows/deploy/jobs/archive/02upload

The following shell code would satisfy the above requirements, assuming it’s being run from dash/bash (both sort “*” matches which are needed for the job directory; other Bourne shells may not). It satisfies the requirements, but is by no means the only way to implement an RRPCID processor.

export CI=true
run_workflow() {
    for job_dir in "${1}/jobs"/*; do
        [ -d "${job_dir}" ] || continue
        [ -x "${job_dir}" ] || continue
        [ -z "$(basename "${job_dir}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
        for action in "${job_dir}"/*; do
            [ -x "${action}" ] || continue
            [ -z "$(basename "${action}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
            "${action}" || break
        done
    done
}

run_workflow .rrpcid
for workflow_dir in .rrpcid/workflows/*; do
    [ -d "${workflow_dir}" ] || continue
    [ -x "${workflow_dir}" ] || continue
    [ -z "$(basename "${workflow_dir}" | sed 's/[a-zA-Z0-9_-]//g')" ] || continue
    run_workflow "${workflow_dir}"
done

finnix-live-build now has a simple RRPCID job, though as of this writing I have not yet switched over the home dsari jobs to utilize it.


Side note / rant: I went back and forth on whether to allow “.” as part of the names, specifically the action script names. Historically, ignoring “.” has been traditional for run-parts, cron.d, etc, because it ignores automatically-created files such as foo.bak, foo.swp, etc. However, I acknowledge using extensions for executable scripts within a project (.sh, .py, etc) is currently popular.

My answer to this? Stop Doing That. I can’t tell you how many times I’ve seen someone (including me) put do_cool_thing.sh into a repository, and over time it gets expanded to the point it’s too complicated to be effectively managed as a shell script, and is rewritten in, say, Python. The problem is references to do_cool_thing.sh are now too entrenched, and you now have do_cool_thing.sh which is actually a Python script (!!!), or do_cool_thing.sh which is just a wrapper call to do_cool_thing (if I learned my lesson) or do_cool_thing.py (if I didn’t).

Just drop the extension when creating a script. Shebangs exist for a reason. 😉

By the way, if you do find yourself in this migration situation, here’s a general-purpose redirect script for the old location:

#!/bin/sh

exec "$(dirname "$0")/$(basename "$0" .sh)" "$@"

Want to hire me? Let's talk careers.

Summary

Hello! I am a highly experienced Linux systems engineer, looking to work with the right team. If you are in need of a senior SRE with a focus on operational development, or a developer with a focus on design for infrastructure, here’s my resume; I’d love to talk with you!

Background

I left my previous employer last year, having planned to take a several month sabbatical. In a stroke of… interesting timing, my last day was the first week of March 2020. With COVID and lockdowns and the world in turmoil, I decided to extend my sabbatical and work on a bunch of personal projects.

A year has passed and I’m ready to join the career world again.

About me

My resume contains the essential details, but the nice thing about a blog post is it allows me to be more fluid. So let’s be fluid!

My immediate previous work experience was 8 years as a Site Reliability Engineer at Canonical, the company behind Ubuntu. Canonical was an 80% remote work company (and since became 100% since COVID), and I worked within a group of about 20 SREs, supporting the company’s operations, as well as interacting directly with the open source community. While I have experience with many Linux distributions, suffice it to say, I know Ubuntu inside and out.

I’ve been doing Linux systems engineering for over 20 years, and have worked with many people across the open source world. My claim to fame is Finnix, a bootable utility Linux distribution (LiveCD), geared toward system administration, rescue and recovery, etc. I am a Debian maintainer, an Ubuntu technical member, and additionally have packaging experience with Fedora and Homebrew.

While I can pick up nearly any programming language, I describe myself as a prolific Python programmer. For a portfolio example of my current Python ability, see rf-pymods, a collection of standalone helper modules. Docstrings on each function, 100% code coverage on unit tests, tox, GitHub workflows. For a more realistic example, see 2ping, a network investigation utility which was developed in 2010 and has been updated and maintained since. Tox, CI, test framework (but not (yet) 100% coverage), reasonable code and functional documentation.

Public cloud (I wrote the caching proxy software running the per-region Ubuntu mirrors on AWS, Azure and GCE). Private cloud (nearly a decade of OpenStack experience). Containers. Continuous integration (I even have my own lightweight CI system called dsari). The list goes on. And yes, I know Git.

About you

The top consideration I have with a potential employer is a healthy remote work lifestyle. I’ve been working from home for 10 years now, and recognize the strengths (and weaknesses) of a remote work setup. COVID caused many companies to shoehorn in work-from-home into their existing business strategy on short notice, while I’ve been most pleased with companies who have remote collaboration as part of their DNA.

That being said, I’m looking for a senior SRE position which has a focus on operational development. This can also take the form of a development role which has a focus on design for infrastructure. In many ways, these are one in the same. Open source development and contribution is strongly preferred; I do most of my work in the open, and value companies which do the same.

I’ve had experience with startups and are not opposed to them, but would prefer a mid-sized established company or a late-stage startup. Industry is not as important as the people and the teams. I am based on the US west coast and have extensive experience working with geographically distributed teams.

Let’s talk

If you’re excited, here’s my resume, here’s my GitHub profile, go give Finnix a try, etc, then send me an email. I’d love to talk with you.

(Spinning) Rust begone!

33 hard drives arranged in a pyramid pattern

You could say I have a few computers… 63 as of this writing (I made a spreadsheet), though about half of those are SBCs (single board computers; Raspberry Pis and similar). However, many of the other computers are old, and include mechanical hard drives in various states of failure. About a year ago, I began a quest to eliminate as many mechanical hard drives from my collection as possible.

2.5” SATA

This one’s simple, just replace the 2.5” SATA HDD with a 2.5” SATA SSD.

3.5” SATA

3D printed 2.5 inch to 3.5 inch drive adapter with case-specific standoffs and an SSD installedAlmost as simple as 2.5” SATA, just replace with a 2.5” SATA SSD and a 2.5” to 3.5” adapter bracket. However, it took me awhile to find the ideal bracket, a 3D printed universal-ish adapter (specifically the “minimalist sunk screwholes” variant). It can mount the underside of an SSD in several positions, and has 3.5” holes for side or underside mounting.

Note that while the SATA/power ports are close to a normal 3.5” drive’s location, it’s not exact. For a situation where you need to plug into a backplane (like a server or NAS), you’ll need a caddy adapter which re-routes the ports.

3.5” IDE

SD to IDE adapter in a 3D printed PCI/ISA bracketSD to IDE adapter in a 3D printed 3.5 inch mount

These became the majority of my conversions. Since we’re talking about older IDE devices, the speed of a modern SSD isn’t needed, so SD to IDE adapters work extremely well. (The linked product is only an example; it’s a common design sold by many sellers under similar names.) I’ve had success across a range of devices, from a 486 desktop to a Sun Blade 100 workstation to a Power Mac G4 desktop.

The preferred mounting option is in a PCI / ISA slot, so you can remove the SD card and mount it on a more modern computer if needed. I designed a reinforced bracket for this purpose.

Alternatively, you can mount it in a 3.5” bay, either as an external 3.5” device (i.e. a floppy bay) or internal, if all else fails. Again, I have designed a 3D-printed mount for these situations. The same mount can either be used internally or externally, with side or underside 3.5” mounting. Note that the adapter’s IDE and power ports are nowhere near a normal hard drive’s positioning, and the top-mounted Molex port is rarely in a convenient location, so you’ll also want to get a short 40-pin extension ribbon cable, and a short floppy power to Molex adapter.

3.5” IDE (for stubborn computers)

The only computer I found which doesn’t like the SD to IDE adapter is the Power Mac G3 Blue & White. Mac OS 8.6 sees the drive, but would somehow encounter I/O timeouts any time it tried to use it. For this machine, I went with a CompactFlash card and a CF to IDE adapter. CF is a subset of IDE, so you want the adapter to be completely passive. This worked, but as CF cards are getting harder (and more expensive) to find, I wanted to use this as a last resort.

2.5” IDE

M.2 to 2.5 inch IDE adapter with M.2 drive installedThis one’s a bit weird. For older laptops and machines like the G4 Mac Mini, it needs to be the same form factor as a 2.5” IDE drive, so we’ll want to get… M.2 drives. Okay, SATA, not NVMe, but it still feels odd buying a brand new gumstick-sized card for a 15+ year old device. The secret is this M.2 SATA to 2.5” IDE converter (again, sold by multiple sellers); beyond that, you can pick any cheap M.2 SATA 2242 drive, though I recommend 120GB drives as the target device is probably old enough to be affected by the 48-bit LBA limit of 137GB.

SCSI

I have several machines which have both SCSI and IDE interfaces, and for those I just used the methods above for IDE. But I do have one SCSI-only machine: the SGI Challenge S, a server variant of the SGI Indy. For this, there is the SCSI2SD v6. Until now, the converters/adapters have all been in the $10 to $20 range, but the SCSI2SD is significantly more expensive at about $100. But it’s full-featured, and has options for just about any classic computer situation. It can divide up an SD card to emulate multiple SCSI devices, even CD/tape drives, and has software-configurable termination options. I love it, but at its price, I’m glad I only needed it once.

Again, you’ll probably need a bracket to physically mount it within the computer, but I found someone else’s SCSI2SD universal mount to work fine with the Challenge S without modification.

Gotek floppy emulator with OLED mod and 3D printed case, mounted in a Packard Bell desktopFloppy drives and disks also fail, and for this I recommend the Gotek SFR1M44-U100, which lets you use a USB thumb drive to emulate floppy disks. There are many mods you can do to this drive, but at the very least, I recommend replacing the stock firmware, which requires a proprietary utility to write images to the USB drive. Replace the firmware with FlashFloppy, which has many more emulation options than the stock firmware, and also lets you directly drop image files onto the USB drive.

What’s left?

Surely I didn’t completely eliminate spinning hard drives from my home, did I? Something I found was the closer I got to zero, the more the remaining ones stood out, and the stronger the desire to address them.

  • Some Cheap Laptop from Best Buy was a laptop I used for a few months back in 2015. I consulted the repair manual for the laptop, and getting to the 500GB HDD would have been a monumental task. The laptop itself was not useful to me, nor did I have any sentimental attachment (beyond the joke review I made 6 years ago), so I solved the problem by reinstalling Windows 8.1 on it and giving it away.
  • My primary home server has 28TB of raw HDDs, in the form of 7x 4TB drives. It would be prohibitively expensive to replace it all with SSDs. One drive is external backup and one is a Purple drive for security camera recordings, and the other 5 are in a RAID 6 setup, backed by a 500GB SSD bcache, so I’m not worried about reliability or performance.
  • My Windows gaming machine has 3 tiers of storage: 1TB of boot NVMe, 2TB of SATA SSD, and a 6TB HDD. Again, I’m not concerned.
  • My remote colocation server has two primary boot 250GB SATA SSDs, and two WD RE4 2 TB “enterprise” HDDs, each in RAID 1. I’ll likely do pure solid-state with my next colocation server, but considering the current server is only 3 years old and the one before that lasted nearly a decade, it’s probably not going to be for quite awhile.
« All posts