Ryan Finnie

I am the Simone Giertz of high-performance computing

It all began with a tweet.

I too spent my late teens and early 20s thinking clusters were the future. I gawked at a friend who worked on an SGI Altix in college. I wanted a Beowulf, whatever those were. Itanium! Blades! Infiniband!

And yet somehow I failed to notice over time that clusters WERE the future, and that they have become my career. I build and maintain million-dollar clusters with thousands of instances running parallel scale-out workloads. I think what caught me off guard is that they’re called “clouds” now.

Anyway, the Raspberry Pi Cluster Hat allows you to cluster up to four Raspberry Pi Zeros to a regular Raspberry Pi. I can’t think of a single instance where this would be useful. Zeros are amazingly slow; a cluster of four combined has half the CPU performance of a single Pi 2, to say nothing of the 3, 3+ or 4. The original Zero didn’t have any sort of networking so this would allow for communication (via USB gadget mode), but the Zero W has built-in Wi-Fi. Even if you already had a regular Raspberry Pi and four original Zeros and wanted to combine them together, the hat costs $50, whereas a brand new Pi 4 would be much faster and only $35 (plus dongles).

The Cluster Hat is a product which should not exist.

I love that it does exist.

I ordered one.

I also ordered four Zero Ws. Problem is while they’re theoretically $10 each, they’re very hard to find individually and always have strict one per person limits. So I bought four kits at $25 each even though I didn’t have any need for the included accessories. And four $7 32GB MicroSD cards. And the hat itself. Including the original purchase of the Pi 2 several years ago, this all adds up to $170.

With the parts on the way (the hat was shipped from the UK and took about two weeks to arrive), I pondered what to do with the cluster. It had to do something; creative irrelevance needs to have a grain of relevance to be amusing. I did know I wanted to have some sort of on-device output, so I also bought a $25 2.13” e-paper hat to attach to the outermost Zero. Running total: $195.

By the time the parts arrived, I had written the code for the cluster. I now have an overcomplicated random number generator.

Raspberry Pi Cluster Hat random number generator

TrueRand is a topic I’ve written about a few times here. It’s a software-based hardware random number generation technique which relies on the unpredictable interaction between a computer’s CPU and RTC. Basically, a bit is flipped for a certain amount of time, recorded, and debiased. I used my own software, twuewand, which tries to target a certain number of flips (40,000 by default) but will rarely get exactly 40,000, so this actually makes generation scale with CPU power.
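As a rough illustration of the flip, record and debias steps, here is a minimal Python sketch. It is not twuewand's code: `raw_bit` and `von_neumann` are hypothetical names, the sketch flips for a fixed time interval instead of targeting a flip count, and von Neumann debiasing is an assumption, since the text above only says the bits are "debiased".

```python
import time


def raw_bit(interval=0.001):
    """Flip a bit as fast as possible for `interval` seconds, then return it.

    Assumption: a fixed interval stands in for twuewand's flip-count target.
    """
    bit = 0
    deadline = time.perf_counter() + interval
    while time.perf_counter() < deadline:
        bit ^= 1
    return bit


def von_neumann(bits):
    """Debias by pairs: keep the first bit of each unequal pair, drop equal pairs."""
    out = []
    it = iter(bits)
    for a, b in zip(it, it):  # consumes the input two bits at a time
        if a != b:
            out.append(a)
    return out


if __name__ == "__main__":
    raw = [raw_bit() for _ in range(64)]
    print(von_neumann(raw))
```

Debiasing throws away at least half of the raw bits (usually many more), which is part of why the final rate is measured in bits per second.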

After debiasing, the four nodes combined produce about 5 bits per second. Yes, bits. Per second. This works out to 4 bytes per 6.4 seconds, which are displayed on the e-paper display. These four nodes’ outputs are collected on the main Pi, which sends commands back to the 4th node for what to render on the e-paper display.

I now have a miniature space heater which conveniently gives me truly(-ish) random bytes at a glance.

I could have used the Raspberry Pi 2 to run twuewand (which automatically uses all cores available on a single node) to get about 12 bits per second. Or 240 bits per second on my laptop. Or I could have used the SoC’s true hardware RNG on the Pi 2, which seems to do about 1 megabit per second.

But then where’s the fun in that?

Monoprice Maker Select Plus 3D printer mods

Monoprice Maker Select Plus 3d printer

About a month ago I bought a 3D printer, the Monoprice Maker Select Plus. This is a rebrand of the Wanhao Duplicator i3 Plus, and is also rebranded by several other manufacturers, including Aldi supermarkets in Australia. Yes, really.

3D printers run a wide range, from “build the frame yourself and buy a hundred off-the-shelf parts”, to kits which include all the parts but require full assembly, to “spend a few grand, plug it in and turn it on”.

I picked the Monoprice model because it’s nearly fully assembled, requiring about 15 minutes of assembly to attach the two main components together, and it is well reviewed as producing decent prints out of the box. This is an important consideration for a first 3D printer, and I was very lucky to have my first few prints go perfectly, so I had an idea what the process should look like, to compare when things are going wrong.

And they will go wrong. No 3D printer will be completely foolproof, and all require various levels of troubleshooting. For all that I’ve learned in the last month, I feel confident that if I buy another printer, a kit would be easy and a completely-from-scratch build would be possible. (From what I’ve seen, 3D printers are like cats: people who have more than zero usually have more than one. Some even have their houses overrun by them.)

The second important factor in choosing the Monoprice is there is a lot of potential for customization, with a large community of Wanhao i3 owners. And oh boy, have I modded it in the last month. Strictly speaking, none of what I’ve done below is necessary, but this is a hobby, and all of it was fun.

  • Printed a filament guide arm just below the spool holder. This was actually my first “mod” and was done with the sample black filament which came with the printer (and I used much of it).
  • Added a Z brace, which helps avoid movement of the vertical frame and theoretically reduces the chance of ghosting on prints. It also allows you to minutely adjust the torsion flex of the frame as a whole. This is one of the most impressive cost-to-looks ratio mods, and consisted of a 1 meter threaded rod ($3) split in two, about $3 worth of nuts and bolts, and a large number of printed parts. The corners also double as a larger base to attach rubber or cork feet.
  • Printed a length extension to the spool holder arm. The spool holder which comes with the printer isn’t wide enough to fit most common spools, which I think is one of the few outright flaws of the i3 Plus (albeit a small and easily corrected flaw).
  • Printed an LCD extension panel, which tilts the viewing angle up slightly, and allows access to the LCD’s internal diagnostic MicroSD card (see below).
  • Replaced the printer’s firmware with ADVi3++. The original firmware was decently capable, but was based on an older version of Marlin. ADVi3++ is based on the latest version, has extra features such as guides for filament length adjustment, and quality of life improvements such as temperature readings on the main menu. This involved upgrading the firmware on the internal main board itself via USB, as well as upgrading the LCD’s firmware via a MicroSD slot on the side of the LCD. (Yes, the LCD has its own microcontroller.)
    • Update (2019-06-16): I am no longer recommending ADVi3++ as, while the source code is still open source, the author is now charging for the firmware binaries as well as much of the documentation, including how to compile the source. The author is within his rights to do this, but I disagree with it.
  • Replaced the cold block fan (which prevents the molten filament in the hot end from flowing back up and jamming) with a direct replacement. The original one started making a loud noise after a few weeks of use, and is a known problem. Thankfully direct replacements are a few dollars on Amazon.
  • Replaced the 40mm part cooling fan with a 50mm blower fan. There are many more efficient part cooler mods such as the DiiiCooler or the CiiiCooler which evenly distribute the air around the nozzle, but have tradeoffs such as visibility and clearance issues. I’ve found simply printing an adapter shroud for the front slot works fine for my needs.
  • Replaced the original 4-point corner bed leveling system with a 3-point bed level (two on the left corners, one on the middle of the right side). This allows for more accurate adjustment, since technically it’s impossible to adjust a flat plane using four points (you end up warping it into a 3-dimensional object).
  • Added BLTouch bed leveling. This consists of a probe mounted as close to the nozzle as possible. The probe can sense with a high level of precision when it touches the surface, and reports this data at various parts of the bed to the firmware, which can correct for different heights on the bed’s plane. (It’s still also a good idea to start with a decent attempt at manual leveling.)
  • Replaced the BuildTak-like surface with a glass print surface, with a large (but incredibly thin) thermal pad between the heated plate and the glass.
  • Added four binder clips to the edges to prevent the glass from separating from the plate. Not the most exciting mod, but it’s worth pointing out since you need to account for them when doing low-height print head travels. I’ve got some thinner clips ordered, but they haven’t arrived yet.
  • Replaced the thin Y carriage with an all-aluminum replacement, allowing for a more stable Y axis.
  • Did some of my own firmware mods based on ADVi3++. In particular, the BLTouch sensor’s hardware development is moving fast and the version I received (v3) wasn’t compatible with the support in ADVi3++ (v2), so I backported v3 support from the Marlin development branch.
  • Set up a Raspberry Pi with OctoPrint, a USB printer manager. You could just transfer GCODE to an SD card and print it directly from the printer, but OctoPrint gives you more convenience and flexibility. After I slice an object, I can tell Cura to send the GCODE directly to OctoPrint which starts the print. And I’ve got a webcam pointed at the print bed; OctoPrint shows the feed so I can monitor the print when I’m away from it, and it also captures a per-layer timelapse of the print.
  • The printer’s main board is a variation of a reference Arduino platform, and an annoying side effect is it can be powered from the USB port. I don’t want this, since the USB host is a Raspberry Pi which has its own limited power to deal with, and it also means that when I turn off the printer via the back power switch, the main board and LCD remain on (but have no control over the motors, heaters, etc). I solved this by taking a USB cable, stripping off the sheath in the middle and cutting the red wire. This turns it into a “data only” USB cable, so the printer turns off completely when the power switch is turned off. Interestingly, I have yet to find a commercially sold “data only” cable, though obviously the opposite “charging only” is common.

So you want a Stratum 1 NTP server...

(Standard warning: I’d consider myself an informed amateur in this field, so don’t take anything I say as gospel.)

MECCA GPS time receiver

The Global Positioning System is an amazing piece of technology. In very simple terms, GPS is a constellation of moving satellites which simply broadcast when they are, and rely on you to know where they can be in the sky. Picking up one satellite by itself determines your position to within a hemisphere of Earth. Two satellites reduce that down to a radial line. Three pinpoint you to a specific latitude and longitude (2D fix). And four or more let you pinpoint elevation (3D fix).

To do this with any sort of accuracy, the clocks onboard the satellites must be extremely accurate. GPS atomic clocks use cesium (or more recently, rubidium) as an oscillator, and are theoretically accurate to within 14 nanoseconds. So great, you have a free source of high-precision timekeeping! Just hook up a consumer GPS receiver to your computer and you have a Stratum 1 device, right?

There’s a small problem, in that there’s a lot of uncertainty in getting from the receiver to your computer. The receiver may say “the current time is 01:23:45.000 UTC”, but there’s a (relatively) massive amount of time between “the” and “UTC”. When was it 01:23:45.000 UTC?

The solution is PPS, or 1PPS. It’s simply a high precision pulse, once per second, at the same time (or as close as can be) every second. Usually the PPS pulse is at the top of the second, and then the receiver has the rest of the second to send its data to the computer.

Many modern GPS receiver chipsets support PPS, but almost no consumer devices support exporting it. Take apart almost any “mouse” style u-blox USB receiver, and you will often see a PPS solder pad, but nothing attached to it. The embedded USB TTL chipset just doesn’t support it.

USB GPS PPS receiver (breadboard)

But you can build your own. The FT232R USB chipset has support for emulating all RS-232-style serial control signals, including Data Carrier Detect. DCD’s classic meaning in serial communication is basically “I’m ready to begin sending you something”, so the logic maps well to the PPS concept. This post by Larry Cochrane explains how to pair a $15 GPS receiver module with a $10 FT232R-based USB TTL adapter to get a USB GPS receiver with PPS support. This was the first design I built.

This is fine for a home NTP receiver, but there is a problem. USB is laggy, in the realm of local high-precision timekeeping. (Still better than you could do from an Internet NTP source.) Worse still, it’s unpredictably laggy, i.e. jitter. USB 1 and 2 are packet-based, and it’s not guaranteed packets will arrive over the bus in the exact same amount of time. The average latency from the receiver to the software may be 200 microseconds, but the jitter may be ± 300 microseconds.

If we want to get serious, we need to go old school. RS-232 serial is interrupt-based. An electrical signal comes in, the CPU pauses whatever it’s doing to read it. (In a tiny amount of time; the amount of time lost processing the interrupt is insignificant for modern computers.) And amazingly, the beast of a home server I built late last year — a Ryzen 7 2700X with 64GB memory and 9 hard drives — still has a motherboard with an RS-232 header on it.

Let’s take a moment to discuss RS-232 and TTL. TTL is the language of nearly all modern serial components (such as the GPIO pins on your Raspberry Pi). A one is 3.3V (high), a zero is 0V (low). Fine for high-ish-speed, physically short runs of a few inches. But RS-232 is what home computers used to use for serial communication to devices such as modems. With RS-232, high is anywhere between 3V and 25V, and low is -3V to -25V. Signals between -3V and 3V are discarded as garbage. This was useful for reliable longer-length cable runs.
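To make the mismatch concrete, here is a small Python sketch that classifies a voltage using the RS-232 ranges described above. The function name `rs232_level` is hypothetical, and treating the boundary values (exactly ±3V and ±25V) as valid is an assumption made for illustration.

```python
def rs232_level(volts):
    """Classify a voltage using the RS-232 ranges described in the text.

    Assumption: the endpoints (3V, 25V, -3V, -25V) count as valid.
    """
    if 3.0 <= volts <= 25.0:
        return "high"
    if -25.0 <= volts <= -3.0:
        return "low"
    return "invalid"  # anything else, including the -3V..3V dead zone


if __name__ == "__main__":
    for v in (3.3, 0.0, -12.0):
        print(v, rs232_level(v))
```

A 3.3V TTL "one" just clears the threshold for high, but a 0V TTL "zero" lands in the dead zone and never registers as a valid low.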

So we can’t just solder our GPS receiver (TTL) to a DB9 port and plug it into a computer. We need something to actively translate between TTL and RS-232 voltages. There are plenty of TTL to RS-232 DB9 adapters on Amazon and eBay. The problem is they’re almost all based on the MAX232 chipset or similar, which only supports two drivers (transmit from the TTL device to the PC) and two receivers (receive from the PC to the TTL device). 75% of these adapters will implement TX and RX only, and the other 25% will also support CTS and RTS, but no DCD.

The MAX3238 chipset supports five drivers and three receivers, and is designed specifically for full RS-232 translation. As far as I’ve been able to find, there is exactly one manufacturer currently making breakout boards based on the MAX3238: the Pololu 23201a for about $10.

So now, let me present to you MECCA: Measuring Expensive Cesium Clocks in the Air.

MECCA GPS time receiver (open case)

The RS-232 board is the Pololu 23201a. The USB board is a random cheap micro USB header to supply 5V to the GPS receiver, which is an Adafruit Ultimate GPS Breakout. The Adafruit receiver is quite expensive, about $40, but I chose it because it has an automatic external antenna header, something I didn’t find on any of the $15 eBay modules. But if you’re in a relatively open area, the $15 modules with their built-in antennas should be fine.

The receiver has a 3.3V regulator which drives the TTL logic, but also powers the RS-232 board. It’s a pretty simple schematic; the hardest part was soldering the various VCC bridges (needed to keep DSR, CTS and RI high) on the breadboard connecting the RS-232 and USB boards to fit in the project box. (I left the GPS receiver modular since it’s the most expensive component of the setup.)

MECCA 1.0 schematic

One thing to note if you go for the $15 bare module: The “enable” pin must also be tied to VCC to enable it. On the Adafruit module, “enable” is pulled high by default and can be tied to ground to disable it.

If you wanted to go more versatile, you could build something which supports USB or RS-232:

MECCA 2.0 schematic

FT232R-based USB converters support “PWREN”, which is high when the converter has power, but goes low when the converter actually has USB communication with a host. This can switch a P-channel MOSFET to VCC on the RS-232 converter, so it’s only active when there’s simple USB power. The only reason I didn’t build this is because fitting it all in a small project box would be tight (and I didn’t happen to have a suitable MOSFET at the time).

Now that you have a PPS-capable receiver, you’ll need gpsd which will interface with ntpd. This gpsd page goes into excruciating detail about the theory and the operation, so I won’t go into the details here. My final ntpd configuration is the standard ntp.org pool config (ntpd works best with other peers to act as sanity checks), plus the following:

# GPS Serial data reference (NTP0)
server 127.127.28.0 minpoll 0 maxpoll 0 noselect
fudge 127.127.28.0 time1 0.521643 refid GPS

# GPS PPS reference (NTP1)
server 127.127.28.1 minpoll 0 maxpoll 0 prefer
fudge 127.127.28.1 refid PPS

I wouldn’t worry too much about fine-tuning the GPS reference. My offset (“time1”) is 521.643ms, but the jitter can be up to 60ms, so I’ve explicitly told ntpd to track it but not to consider it (“noselect”). Even if I didn’t explicitly specify “noselect”, it would eventually work out that the jitter is garbage and ignore it (with an “x” in “ntpq -p”). The kernel PPS feed is the high-precision component, and has an average jitter of about 7 microseconds.

Weighted file cleanup

I’ve got four security cameras streaming to my home server, and I save the most recent 3 TiB of raw recordings locally. I won’t go too much into the details (because they’re out to get me, of course), but cam2 and cam4 are the main ones, and I would like to save them locally the longest. However, cam4’s file sizes are larger than cam2’s. cam1 is in an unimportant area so I don’t want to save as much, and cam3 is only used on demand, so it has an even lower priority.

I struggled with a system for deleting old recordings, but recently came up with what I believe is a decent system. I assign file globs into per-camera collections, and then assign a weight to each collection. cam2 and cam4 each get a weight of 1.0, cam1 gets 0.2 and cam3 gets 0.1.

The script then goes through what can be multiple rounds, determining if each collection is over or under its weighted share of the total target. If a collection is under target, it is eliminated from the round, but importantly its usage is removed from the considered total target for the next round. Rounds continue until all collections in the round are over target.

cam1: 3809 files, 364.82 GiB, 0.20 weight
cam2: 8226 files, 809.57 GiB, 1.00 weight
cam3: 529 files, 83.86 GiB, 0.10 weight
cam4: 7514 files, 1817.40 GiB, 1.00 weight
Grand total: 3075.66 GiB used, 3072.00 GiB target, 3.66 GiB above target
Round 1: cam1: 364.82 GiB used, 267.13 GiB target, 8.70% round weight of 3072.00 GiB, 97.69 GiB above target
Round 1: cam2: 809.57 GiB used, 1335.65 GiB target, 43.48% round weight of 3072.00 GiB, -526.08 GiB above target (disqualifying)
Round 1: cam3: 83.86 GiB used, 133.57 GiB target, 4.35% round weight of 3072.00 GiB, -49.70 GiB above target (disqualifying)
Round 1: cam4: 1817.40 GiB used, 1335.65 GiB target, 43.48% round weight of 3072.00 GiB, 481.75 GiB above target
Round 2: cam1: 364.82 GiB used, 363.09 GiB target, 16.67% round weight of 2178.57 GiB, 1.73 GiB above target
Round 2: cam4: 1817.40 GiB used, 1815.47 GiB target, 83.33% round weight of 2178.57 GiB, 1.93 GiB above target
cam1: Freeing 1.73 GiB
Deleting /media/camera/streams/cam1/cam1_XXX.mp4 (175.85 MiB)
Deleting /media/camera/streams/cam1/cam1_XXX.mp4 (102.45 MiB)
Deleting /media/camera/streams/cam1/cam1_XXX.mp4 (103.56 MiB)
[...]
cam4: Freeing 1.93 GiB
Deleting /media/camera/streams/cam4/cam4_XXX.mp4 (844.88 MiB)
Deleting /media/camera/streams/cam4/cam4_XXX.mp4 (839.17 MiB)
Deleting /media/camera/streams/cam4/cam4_XXX.mp4 (797.06 MiB)

In this example, in round 1, both cam1 and cam4 are over their weighted share of the total 3072 GiB target. cam2 and cam3 are under so they are removed, and the total target of the next round’s considered collections is reduced to 2178.57 GiB. In round 2, cam1 and cam4 are re-evaluated according to this target, using their own weights compared to each other (not all collections). It’s actually possible a collection can be over target in one round but under the next, triggering a round 3 (cam1 used to do this until a few days ago). With the final weights determined, files from each collection are deleted until they come in under their collection target.
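The rounds described above can be sketched in a few lines of Python. This is a simplified illustration, not the full script: the function name `weighted_targets` is hypothetical, it works on per-collection totals instead of file globs, and treating a collection that sits exactly at its target as "under" is an assumption.

```python
def weighted_targets(usage, weights, total_target):
    """Return the final per-collection targets for collections still over target.

    usage and weights map collection name -> size used / weight.
    Collections under their weighted share are dropped each round, and their
    usage is subtracted from the target considered in the next round.
    """
    active = dict(usage)
    target = total_target
    while active:
        round_weight = sum(weights[name] for name in active)
        targets = {
            name: target * weights[name] / round_weight for name in active
        }
        under = [name for name in active if active[name] <= targets[name]]
        if not under:
            return targets  # everything left is over target; rounds are done
        for name in under:
            target -= active.pop(name)
    return {}  # every collection was under target; nothing to delete


if __name__ == "__main__":
    usage = {"cam1": 364.82, "cam2": 809.57, "cam3": 83.86, "cam4": 1817.40}
    weights = {"cam1": 0.2, "cam2": 1.0, "cam3": 0.1, "cam4": 1.0}
    for name, target in weighted_targets(usage, weights, 3072.0).items():
        print(f"{name}: free {usage[name] - target:.2f} GiB")
```

Fed the GiB figures from the example run, this leaves cam1 and cam4 with round 2 targets of roughly 363.09 and 1815.47 GiB, matching the log above.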

Right now, the initial round is massively skewed since this system is new, with cam4 largely above target and cam2 largely below target. But over time, this will even out to their respective weights. The nice part of this system is it’s not wasteful if a collection is below target; as I mentioned, cam3 is only used on-demand and will likely always be below its target of 133.57 GiB. But because of that, it’ll always go on to a new round which will utilize the extra space, and the global target will always be 3072 GiB utilization.

The full Python code is available here. I’m pretty sure I’ve got all the corner cases worked out, but obviously please use caution if you use this. (I’ve commented out the line used to actually do the delete.)

CMYK printer line test sheet

CMYK printer line test screenshot

This week I bought a new color laser printer, a Brother HL-3170CDW. It’s replacing my old laser printer (Brother HL-2270DW) and inkjet (Canon MX922); I got sick of the latter being, well, an inkjet printer.

I used to have a weekly test print sent to the Canon, to prevent the nozzles from seizing and to notice when the ink cartridges were empty. This was a cropped test print I found online, and mostly did the job. But with the new printer, I decided to design my own test page. It’s not comprehensive; there are better tests out there for when you suspect something is wrong (for example, it doesn’t test gradients). The one I designed is specifically for a weekly test, and will point out ink/toner problems while not using much ink/toner.

For each tested color (cyan, magenta, yellow, black, magenta/yellow, cyan/yellow and cyan/magenta), it shows a series of 15mm lines at 2.0, 1.0, 0.75, 0.5 and 0.25 points. The lines are shown horizontal, vertical and at 45 degree angles. Each tested color’s total surface area is 285.84mm² (which would be about a 16.9mm square, so roughly equal to one of the colors’ individual squares). All ancillary text is grey to make the pure tested black stand out.

Download:

I have my Ubuntu laptop run the following every Sunday morning:

lpr -P brotherc /path/to/CMYK_Line_Test_US_Letter.pdf

(Replace “brotherc” with your printer identifier.)
