Introduction

Repairability has always been a consideration, and since the dawn of technology, there has always been a dispute between those who prioritize repairability over features, and those with the opposite priority. Dual turntables sounded sweet when new or set up properly, but with their myriads of gears and levers they were holy hell to repair. BIC turntables didn't sound quite as good, but they were trivially simple machines easily kept within specification, so they often sounded better than their (misadjusted or defective) Dual counterparts.

Electric door locks and windows are nice on cars, but often fail after 60K miles. Today's linear-pull bicycle brakes stop spectacularly when perfectly adjusted, but either lose function or create continuous riding friction when things come two millimeters out of adjustment, or when a little friction comes into the picture. Microsoft Windows has enough features that you can do amazing things with it, but if you need to fix it, that fix is often a series of chants and invocations, rather than a systematic narrowing of the root cause scope.

I had a buddy who rebuilt the engine of his 1960s-era VW Bug in a parking space outside our apartment, with just normal tools and a jack. Old VW Bugs were built for repairability. That's one reason so many of them are still on the road today, and one reason so many of them were converted to dune buggies. Don't try street engine rebuilds or dune buggy conversions with most cars: They weren't designed to be repairable.

And now comes systemd, tying together PID1, PAM, the former ConsoleKit, udev, several GUI user interfaces, and who knows what else. How do you replace a part, either for testing, diagnosis, or enhancement, when the whole thing's welded together? Systemd consists of a lot of C code. How do you look inside all that stuff? With a debugger? Is that a reasonable expectation for admins?

And last but not least, now that we've paid with our repairability, what do we gain? Cgroups and never losing track of a process again? That could have been done, and in fact has been done, without all the entanglement. Parallelized service instantiation? That could have been done, and in fact has been done, without the entanglement. Fast boot? First, I've not seen boot time comparisons between systemd and the simpler init programs like s6, runit, and nosh. Second...

Second, if you want a ten-second boot to the CLI with sysvinit, just use a $125, 256GB SSD as /, and one or more Western Digital Black drives for high-quality /home, /var, and other fast-changing directories. My Debian Wheezy box is set up like that, and it's 10 seconds from the end of POST to the CLI command prompt. And if you're using RAID, you can make one RAID array for the root and another for the rest. Details depend on whether you use hardware or software RAID.

An added benefit of keeping /usr on the SSD root (/) partition is that you avoid the complications and initramfs considerations brought on by the bin and sbin merges that some are planning.

Interaction Promiscuity and Repairability

The following two graphics represent the two extremes of the spectrum, from simple interactions (first) to promiscuous interactions (second):

Both preceding graphics have 16 small circles, but from a repairability standpoint they couldn't be more different. The first, which of course is an extreme that isn't seen in practice, has few interactions: bigger modules are split into smaller modules, with only one input and one output between the bigger modules. To find the root cause, you measure (or strongarm) an interaction to see which side of that interaction the root cause resides on, and then you likewise split the smaller area. Soon the root cause is obvious.
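The split-and-measure procedure for the simple model is essentially a binary search. Here's a minimal Python sketch under loudly stated assumptions: the modules form a single chain, exactly one of them is faulty, and a hypothetical probe `signal_ok(k)` tells you whether the signal is still good at the output of module k. Under those assumptions, 16 modules take four measurements, because each measurement halves the suspect area.

```python
from __future__ import annotations

from typing import Callable


def find_faulty_module(n: int, signal_ok: Callable[[int], bool]) -> tuple[int, int]:
    """Return (index of the faulty module, number of measurements taken).

    Assumptions (illustrative only): modules 0..n-1 form a chain, exactly
    one is faulty, and signal_ok(k) is a hypothetical probe reporting whether
    the signal is still good at the output of module k.
    """
    lo, hi = 0, n - 1  # the fault lies somewhere in modules lo..hi
    measurements = 0
    while lo < hi:
        mid = (lo + hi) // 2
        measurements += 1
        if signal_ok(mid):
            lo = mid + 1  # good at mid's output: fault is downstream
        else:
            hi = mid      # bad at mid's output: fault is at or before mid
    return lo, measurements
```

For example, with module 11 faulty, `find_faulty_module(16, lambda k: k < 11)` returns `(11, 4)`. In the promiscuous model no single measurement splits the system cleanly in half, which is why this tactic stops working there.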

In the second graphic, which of course is also an extreme that isn't seen in practice, every component interacts with every other. If the number of components is N, the number of interactions is (N² - N)/2, one for each pair of components, so it increases as the square of N. Ugh! The practical effect of all these interactions is feedback loops all over the place, and to troubleshoot a feedback loop you need to break or disable it. At a certain level of interaction promiscuity, the entire thing becomes, for the purposes of repair, a monolithic black box.
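A quick Python sketch makes the growth concrete. It assumes the simple model is minimally connected (tree-shaped, so n - 1 links for n modules) and that the promiscuous model has one interaction per pair of components; both function names are illustrative.

```python
def tree_interactions(n: int) -> int:
    # A minimally connected (tree-shaped) hierarchy of n modules needs n - 1 links.
    return n - 1


def promiscuous_interactions(n: int) -> int:
    # Every module interacts with every other: one link per unordered pair.
    return n * (n - 1) // 2


for n in (4, 16, 64):
    print(n, tree_interactions(n), promiscuous_interactions(n))
```

For 16 modules that's 15 interactions versus 120; at 64 modules it's 63 versus 2016.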

So the question is, does systemd more closely resemble the simple or the promiscuous interaction model? Here's what I know: I've seen a few block diagrams of systemd, but not one of them even attempts to show interaction lines: They're just boxes inside of boxes. This strongly suggests to me that systemd is much closer to the promiscuous model.

Feedback Loops

The big problem with promiscuous interactions is feedback loops. A feedback loop is where a module's output affects its input, through at least one other module. Feedback loops usually need to be cut or strongarmed in order to troubleshoot a module. Take this block diagram of an analog amplifier as a rather simple example:

In a discrete transistor amplifier, the idiomatic troubleshooting tactic is disconnecting the feedback loop. The other way is to strongarm interactions and give a lot of thought to what the results imply.

Feedback loops involving two modules are reasonable to handle. But what about when they involve a circle of five or six?

The preceding is difficult to troubleshoot, but it can be done. Now put some promiscuous interactions into the mix:

I'm not saying it can't be diagnosed, but it's an order of magnitude harder than a system with few feedback loops. No diagnostic test yields a simple answer: Every diagnostic result must be deeply thought out. Nor is it easy to swap out one module and replace it with a diagnostic dummy: Its interface is just too complex.
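To put rough numbers on how fast feedback loops multiply, here's a Python sketch that counts the distinct feedback loops (directed simple cycles) in a module graph. The three graphs are hypothetical: a six-module ring, the same ring with two extra cross-links, and six modules that all feed each other. Each loop is counted once, anchored at its lowest-numbered module.

```python
from __future__ import annotations


def count_feedback_loops(graph: dict[int, list[int]]) -> int:
    """Count directed simple cycles; graph maps module -> modules it feeds."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    count = 0

    def walk(start: int, node: int, visited: set[int]) -> None:
        nonlocal count
        for nxt in graph.get(node, []):
            if nxt == start:
                count += 1  # closed a loop back to the anchor module
            elif nxt > start and nxt not in visited:
                # Only visit higher-numbered modules so each loop is counted once.
                visited.add(nxt)
                walk(start, nxt, visited)
                visited.remove(nxt)

    for start in sorted(nodes):
        walk(start, start, {start})
    return count


ring = {i: [(i + 1) % 6] for i in range(6)}            # 0->1->2->3->4->5->0
ring_plus = {i: list(targets) for i, targets in ring.items()}
ring_plus[3].append(0)                                  # extra link 3->0
ring_plus[5].append(2)                                  # extra link 5->2
full = {i: [j for j in range(6) if j != i] for i in range(6)}

print(count_feedback_loops(ring))       # 1
print(count_feedback_loops(ring_plus))  # 3
print(count_feedback_loops(full))       # 409
```

Two extra links triple the loop count of the ring, and when all six modules feed each other there are 409 distinct loops, each a potential source of confusing diagnostic results.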

Dependencies

When you repair your car, it slows you down to have to remove and later reinstall a bunch of smog equipment to get to the spark plugs. When you repair your computer, it slows you down to have to install and remove packages, and later do the reverse, to get to the module you need.

Dependencies are better than rewriting the exact same code over and over again. But when you pull in a trainload of code, with child and grandchild dependencies, just to use one minor part of that trainload, many of us believe that's a bad idea. Sometimes a little code redundancy is the better choice.
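Finding out what a package really pulls in is a transitive-closure walk over the dependency graph. This Python sketch uses a breadth-first search; the package names and the graph are made up for illustration and don't correspond to any real distribution.

```python
from __future__ import annotations

from collections import deque


def transitive_deps(pkg: str, deps: dict[str, list[str]]) -> set[str]:
    """Everything pkg pulls in: children, grandchildren, and so on."""
    seen: set[str] = set()
    queue = deque(deps.get(pkg, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:  # the seen set also protects against cycles
            seen.add(dep)
            queue.extend(deps.get(dep, []))
    return seen


# Hypothetical package graph, for illustration only.
deps = {
    "mytool": ["libbig"],
    "libbig": ["libgui", "libnet", "libparse"],
    "libgui": ["libfont", "libimage"],
    "libnet": ["libtls"],
    "libtls": ["libcrypto"],
}
print(sorted(transitive_deps("mytool", deps)))
```

Suppose `mytool` needs only the parsing part of `libbig`: it still drags in eight packages, including a TLS and crypto stack it never touches.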

Shellscripts vs C

A lot of what sysvinit, daemontools, and many other PID1 and daemonizer tools do with shellscripts, systemd does with config files processed by C routines. Systemd advocates correctly point out that shellscripts are harder to write and maintain, and that it's easier to screw things up if you get your shellscript wrong. I think the following three facts more than counteract those:

Shellscripts provide ideal troubleshooting test points
Shellscripts can accommodate absolutely any use case
You need a debugger to troubleshoot within a C-derived binary executable, but only an editor to troubleshoot within a shellscript. Any admin can put diagnostic prints or file writes within a shellscript.

These facts are the reason I say that config shellscripts are more repairable than C code reading config files.

Code Size

All other things being equal, the bigger the code, the harder to troubleshoot. All other things being equal, the bigger the code, the more dark places for bugs to hide. Systemd's code size is huge, or maybe even bigger.

But be careful. As systemd advocates point out, if you count the init shellscripts, sysvinit's line count balloons. However, be careful there too, because init scripts can be altered or replaced in toto for troubleshooting purposes, and many init programs use scripts about the size of a systemd unit file.

Use Case Independence

Use case independence means being able to use the product (in this case Linux) in ways the product's creators didn't foresee. One great way to do this is via scripting. Sysvinit, daemontools, nosh, and many others can use scripts for the scaffolding that launches the daemon. In myth 4 of his blog post "The Biggest Myths", Lennart Poettering appears to say that you can use systemd to launch processes with scripting in them, but it sounds like he's saying systemd can't use scripts for the scaffolding the way the others can. Having the scaffolding scripted makes troubleshooting easier via diagnostic measuring, logging, strongarming, and replacing (test jig, unit test, etc.).

Use case independence also implies the ability to replace a component with another of your own choosing or making, without having to replace large numbers of other components. Here, the other init systems fare better than systemd, which appears to lock up a great many functionalities, or at least dependencies on such functionalities, into PID1.
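Replacing a component is only practical when its interface is narrow. In this Python sketch, a hypothetical supervisor depends on a one-method `Launcher` interface, so a diagnostic dummy that strongarms every start (or fails on demand) takes only a dozen lines. `Launcher`, `Supervisor`, and `DummyLauncher` are invented for illustration and aren't part of any real init system.

```python
from __future__ import annotations

from typing import Protocol


class Launcher(Protocol):
    """Hypothetical one-method interface: start a service, report success."""

    def start(self, service: str) -> bool: ...


class Supervisor:
    """Hypothetical supervisor: starts services in order, stops at the first failure."""

    def __init__(self, launcher: Launcher) -> None:
        self.launcher = launcher

    def boot(self, services: list[str]) -> list[str]:
        started: list[str] = []
        for name in services:
            if not self.launcher.start(name):
                break
            started.append(name)
        return started


class DummyLauncher:
    """Diagnostic stand-in: strongarms every start, failing only on request."""

    def __init__(self, fail_on: set[str] | None = None) -> None:
        self.fail_on = fail_on or set()
        self.calls: list[str] = []  # record of what the supervisor asked for

    def start(self, service: str) -> bool:
        self.calls.append(service)
        return service not in self.fail_on
```

With `DummyLauncher(fail_on={"net"})`, booting `["udev", "net", "sshd"]` starts only `udev`, and the dummy's `calls` list shows the supervisor stopped right after the `net` failure. The wider and more entangled the interface, the bigger and less trustworthy such a dummy becomes.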

Last but not least, use case independence implies reasonably granular packaging, to avoid dependency hell. Here again, the other PID1 programs have the advantage.

Wrapup

There are lots of "block diagrams" of the systemd system on the Internet, but none that I ever saw included the interaction lines between the blocks. Until proven otherwise, I think the likely reason for the lack of such diagrams is that systemd interactions are numerous and complex. Otherwise, why has nobody supplied a block diagram with both boxes and interaction lines? If somebody did, and the diagram were halfway reasonable, it would surely silence a lot of argument.

Repairability is an important priority to some people. So is the ability to substitute a more suitable component for the one that came with the system. Through interaction promiscuity, feedback loops, the substitution of compiled C for shellscripts, and (compiled C) code size, systemd's complexity makes Linux less repairable.

[ Training | Troubleshooters.Com | Email Steve Litt ]