Troubleshooters.Com, Linux Library and Init System Choices present:

User Specific Runit Supervisor

Introduction

Runit is a great init system, and also when it's connected to a different init system's PID1 it becomes a great multi-daemon supervisor. Using runit as an init system or a system-wide multi-daemon supervisor is not a topic for this document and won't be discussed further.

Another great use of runit is to run daemons for a specific user, in my case user "slitt". All examples in this document are for user slitt, so when you use this document's info to build your own multi-daemon supervisor for a specific user, be sure to change all instances of slitt to whatever user name you want to run its own daemons.

All shellscripts in this document are run with the ksh shell. You can choose a different shell, like dash, bash, /bin/sh, or anything reasonably resembling ksh, dash, and bash, but personally I never use bash in shellscripts (bash is huge and remember Shellshock?). In my opinion bash is a great interactive shell and a poor scripting shell. For shellscripts, I love ksh because it works beautifully.

Where specific-user runit excels is in the GUI environment, so this document concentrates on running runit from the ~/.xinitrc shellscript. ~/.xinitrc is the script that is run when you issue the startx command. If, instead of using startx, you boot right into your window manager (and perhaps desktop environment), then you'll need to ask for help from a mailing list or forum or whatever for your specific distribution.

In this document, actually binding the runit instance to .xinitrc is delayed until the end.

About Naming In This Document

Throughout this document, the username is represented by the string "slitt", because that's less ambiguous than things like "<user>" or even "my_name". Obviously you'll change all occurrences of "slitt" to the username of the user using this specific-user runit.

This document uses some odd, and perhaps inconsistent filenames and directory names. This is because this document is an as-built description of my user-specific runit for user slitt. I use this personal runit all the time now. Some directories, such as /scratch, are unique to my situation. Others, such as /home/slitt/mytest.sh came about because I experimented fast and then wrote the document to match. As it is, this document took a month, off and on, to write. Going back and adjusting everything for the best academic documentation would have taken more time than I could devote.

In summary, the username, directory names and filenames in this document are an as-built description. I did it this way so I wouldn't need to create and tech-edit twice.

Suggestions For DIY Learning Purposes

If you object to /scratch, or if you're not allowed to create /scratch, then I suggest you substitute /home/slitt/scratch (but use your own username). Other than that, I suggest that you just go along with the filenames and directory names in this document, for DIY learning purposes. This way you can go through it fast, with a minimum of transcription errors.

Once you have a good grasp of specific user runit, I suggest that for your real, production system, you substitute the file and directory names best suited to your situation.

Basic Theory of the Runit Multi-Daemon Supervisor

Runit can be an entire init system (including PID 1), but that capability is beyond the scope of this document. This document concerns itself exclusively with the multi-daemon supervisor part of the runit init system, and further narrows its scope to the use of this multi-daemon supervisor by and for a specific user. Throughout this document, this user is called "slitt", but unless your user is also called "slitt", you'll replace every instance of "slitt" with the username of the user for whom you're setting up your user specific runit multi-daemon supervisor. Now it's time for a little vocabulary...

Vocabulary

Tip:

Use your browser's "duplicate tab" feature, typically accessed by right clicking the current tab, to put a copy of this tab on another. By doing this, you can have one tab always open to the Vocabulary subsection to understand words and phrases as they come up.

Runit Startup Block Diagram

The following diagram is a Mental Model (block diagram) of the startup of a runit daemon supervisor:

runit Startup Mental Model

Note:

The diagram calls the individual daemon directory subdir, so this narrative does too.

Narrative of the Startup Block Diagram

Runit's multi-daemon supervisor is a program called runsvdir. For a user specific runit setup you don't need to worry about PID1, so runsvdir is the only top-level program you'll deal with. runsvdir cycles around a specific directory (the service directory), and for each specially made daemon directory (like newtest) contained within the service directory, runsvdir runs a single-daemon supervisor called runsv, which in turn runs the daemon. In this document the service directory is /home/slitt/service_gui.

runsvdir continually loops through every daemon directory (such as newtest) within the service directory. For each daemon directory, runsvdir forks an instance of runsv, which supervises a single daemon. Here's how runsv does it:

  1. runsv forks an instance of the daemon directory's run script.
  2. The run script ends with exec mydaemon arg1. There can be as many arguments as is necessary for mydaemon, which of course is just a contrived name: Your daemon program's name will be different.
  3. The process name of the run script becomes mydaemon (in this example) because of the exec on the call.

So if the daemon directory name is subdir and the daemon's filename is mydaemon, the process tree looks like the following:

runsvdir /home/slitt/service_gui ..........
     |
     |_ runsv subdir
           |
           |_ mydaemon arg1

In reality, your runsvdir command will probably have 300 to 400 dots instead of the 10 shown in the preceding tree, but for clarity's sake I only included 10 dots in the tree.

Because the fork() call returns the new child's PID to the parent, the runsv instance knows the PID of the daemon it forked. And for the same reason, runsvdir knows the PID of every runsv process it forks. So there's no need for PID files.
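The same idea can be seen from an ordinary shell, which also forks its background jobs and remembers their PIDs. The following is only a sketch in plain POSIX shell (which ksh also accepts), using sleep as a stand-in for a daemon; the variable names child_pid and child_state are made up for the illustration:

```shell
#!/bin/sh
# Start a stand-in "daemon" in the background; the shell forks it for us.
sleep 30 &
child_pid=$!          # $! holds the PID of the most recent background job

# kill -0 sends no signal; it only checks that the PID exists.
if kill -0 "$child_pid" 2>/dev/null; then
  child_state=running
else
  child_state=gone
fi
echo "child $child_pid is $child_state"

# Because the parent knows the PID, it can signal the child directly.
kill -TERM "$child_pid"
# wait reaps the child; its exit status reflects the TERM, so ignore it.
wait "$child_pid" 2>/dev/null || true
echo "child $child_pid has been stopped"
```

No PID file was needed at any point: the parent learned the PID at fork time and kept it in a variable, much as runsv and runsvdir keep track of their own children.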

Block Diagram For Controlling A Runit Daemon

The following diagram is a Mental Model (block diagram) of how a running runit daemon is controlled:

runit Control Mental Model

Narrative of the Control Diagram

Let's describe this from the daemon's runsv outward. The daemon directory (called subdir in the diagram and therefore in the narrative) contains a directory called supervise (how it got there is explained later), which contains files and named pipes including control, status and stat. stat always has a text representation of part of the daemon's status and is useful only for humans, not for the rest of the software. status has a binary representation of all status information, including the current PID (or the previous PID if currently down) and the uptime of the daemon's single-daemon supervisor (runsv). It also records whether the daemon is running: typically "run", meaning running and operating normally, or "down", meaning there is no daemon process currently running. It can take on other values besides "run" and "down".

About the PID: runsv always knows the PID of the running or stopped daemon because the PID is stored in status. Therefore runsv can send the daemon signals by means of its PID. The signals runsv sends to the daemon depend on the letter received by supervise/control. Please be aware that a single letter can cause runsv to send multiple signals to the daemon.

It's also important to understand that runsv reads each character as it comes in, not waiting for a newline. If the letter is something it understands, it sends the appropriate signals to the daemon and then removes the letter from the pipe. If it doesn't recognize the letter, it ignores the letter and removes the letter from the pipe.

runsv recognizes many different letters, although only "u" and "d" are discussed in this document. You can see all the letters, and the signals runsv sends to the daemon as a result, on the runsv man page.
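To make the control pipe concrete, here is a hedged sketch of writing a single letter to supervise/control by hand. The function name send_ctl is hypothetical, not part of runit; it assumes runsv is already running for the daemon directory, so that something is reading the pipe:

```shell
# Hypothetical helper: write one control letter to a daemon's control pipe.
# Usage: send_ctl /path/to/daemon_dir letter
send_ctl() {
  ctl="$1/supervise/control"
  # control is a named pipe made by runsv. Refuse to write if it's missing,
  # so a regular file isn't created by accident.
  if [ ! -p "$ctl" ]; then
    echo "no control pipe at $ctl (is runsv running?)" >&2
    return 1
  fi
  # No newline: runsv acts on each character as it arrives.
  printf '%s' "$2" > "$ctl"
}
```

With the names used in this document, send_ctl /home/slitt/service_gui/newtest d would ask runsv to bring the newtest daemon down, and the same call with u would bring it back up.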

How the supervise directory is created

The supervise directory is created, with files such as control, status, lock, ok, stat and pid, the first time both of the following are true:

Within the supervise directory are the following files and named pipes:

The interface described in this subsection is great and simple, but wouldn't it be nice to have a front end to it? Runit gives you such a front end: It's called sv. sv up subdir writes a "u" to supervise/control. sv down subdir writes a "d". sv restart subdir writes the letters for term, cont and up ("t", "c" and "u"). sv status subdir reads supervise/status and reports to stdout whether the daemon is running or down, the daemon directory, the daemon's PID if available, how many seconds it has been in that state, and sometimes other things, for instance whether it's paused.
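For a quick look without sv, the human-readable files can be read directly. The following sketch assumes the supervise directory contains the text files stat and pid mentioned above; the function name show_state is invented for this example:

```shell
# Hypothetical helper: print what runsv recorded for one daemon directory.
# Usage: show_state /path/to/daemon_dir
show_state() {
  sup="$1/supervise"
  if [ ! -d "$sup" ]; then
    echo "$1: no supervise directory (never started by runsv?)"
    return 1
  fi
  state=unknown
  pid=none
  # stat holds text such as "run" or "down"; pid holds the daemon's PID.
  if [ -s "$sup/stat" ]; then state=$(cat "$sup/stat"); fi
  if [ -s "$sup/pid" ]; then pid=$(cat "$sup/pid"); fi
  echo "$1: state=$state pid=$pid"
}
```

This is no substitute for sv status, which reads the binary status file and knows about uptime, but it shows how little magic is involved.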

sv For a Specific User

The preceding description of sv is a front end to the main OS instance of runsvdir, not the one run for the specific user. To run sv for specific user slitt, I made the following shellscript, called svl, located on the executable path given to user slitt:

#!/usr/bin/env ksh
# $1 is the sv command (up, down, status...), $2 is the daemon directory name
svpath=/home/slitt/service_gui
sv "$1" "$svpath/$2"

Who Needs a User Specific Runit

Joe Average doesn't need a user specific runit (actually a user specific multi-daemon supervisor). The following are some factors that tend to indicate a need for you to have a user specific runit:

Making Your User Specific Runit: An Overview

The next several sections walk you through setting up your own user specific runit. The following is a list of several sections after this that are steps in the process. Please continue reading this section until its end before going on to the steps.

  1. Make the svl Shellscript
  2. Make the Service Directory
  3. Make the "All Daemons" Directory
  4. Make the Runsvdir Runner Program
  5. Make the Daemon Program
  6. Make the Daemon's Directory Inside The "All Daemons" Directory
  7. Make the Daemon's Run Script Inside the Daemon's Directory
  8. VITAL! Make the Daemon's Run Script Executable
  9. Symlink the Daemon's Directory From Inside the Service Directory
  10. As the Normal User, Start the Runsvdir Runner Program
  11. Test and Troubleshoot

The first three steps are performed only when first setting up your user specific runit. Once you've set up your .xinitrc, or whatever program starts your GUI system, in such a way as to fork and exec runsvdir, the second to last step happens automatically when you start X11 or Wayland. The final step is done every time you add or modify a daemon.

The next few sections of this document detail each of the steps, for an incredibly simple proof of concept daemon that does nothing but write a second-by-second log file.

Remember:

Be sure to change all occurrences of "slitt" to whatever username you're using!

Make the svl Shellscript

Create the following shellscript, called svl, inside a directory on the executable path for the specific user (slitt in this example). /home/slitt/bin would be an excellent place to put it. If there's already an executable program or script called svl, just change the name. All that's necessary is that you be able to remember the shellscript's filename, and that it be short, because you'll be using it a lot.

#!/usr/bin/env ksh
# $1 is the sv command (up, down, status...), $2 is the daemon directory name
svpath=/home/slitt/service_gui
sv "$1" "$svpath/$2"

Now make it executable:

[slitt@mydesk ~]$ chmod a+x /home/slitt/bin/svl
[slitt@mydesk ~]$

Now you can use svl as an equivalent for runit's sv command, specifically for the runit belonging to user slitt.
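A slightly more defensive variant might look like the following sketch. It is written as a shell function so it can be tried without touching the real script; the argument count check and the usage message are additions for illustration, not part of the svl script above:

```shell
# Sketch of svl as a function, with a basic argument check added.
svpath=/home/slitt/service_gui

svl() {
  if [ "$#" -ne 2 ]; then
    echo "usage: svl command daemon_directory_name" >&2
    return 2
  fi
  # Same call as the two-line script: sv <command> <service dir>/<daemon>
  sv "$1" "$svpath/$2"
}
```

Typical calls would be svl status newtest, svl down newtest and svl up newtest. Without the check, forgetting the daemon name would point sv at the service directory itself, which is not a daemon directory.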

Make the Service Directory

mkdir /home/slitt/service_gui

This is the directory that runsvdir cycles through. This is the directory containing a symbolic link to each daemon directory you want supervised.

You could theoretically put regular directories here instead of symbolic links. Don't do that. Using symbolic links makes it very easy to completely disconnect a daemon from runit, and later connect it back again. This makes runit troubleshooting a lot less difficult than it would otherwise be.

Make the "All Daemons" Directory

mkdir /home/slitt/service_all

This directory contains all the daemon directories useful to the specific user (slitt in this document), whether or not the daemon is currently connected to runsvdir. You connect a daemon directory to runsvdir by making a symlink to it within the service directory (/home/slitt/service_gui in this document).

Think of it this way: The all daemons directory, /home/slitt/service_all, contains real subdirectories in the form of daemon directories. The service directory, /home/slitt/service_gui, contains symlinks to the daemon directories in /home/slitt/service_all.
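The connect and disconnect operations this layout makes possible can be sketched as two small functions. The names connect_daemon and disconnect_daemon, and the variables all_dir and svc_dir, are hypothetical; the paths are the ones used in this document:

```shell
# Sketch: connect or disconnect a daemon by adding or removing one symlink.
all_dir=/home/slitt/service_all
svc_dir=/home/slitt/service_gui

connect_daemon() {
  if [ ! -d "$all_dir/$1" ]; then
    echo "no daemon directory $all_dir/$1" >&2
    return 1
  fi
  # The service directory gets a symlink; the real directory stays put.
  ln -s "$all_dir/$1" "$svc_dir/$1"
}

disconnect_daemon() {
  # Only ever remove a symlink here, never a real directory.
  if [ ! -L "$svc_dir/$1" ]; then
    echo "$svc_dir/$1 is not a symlink" >&2
    return 1
  fi
  rm "$svc_dir/$1"
}
```

Because disconnect_daemon removes only the link, the daemon directory in the all daemons directory, including its run script, is untouched and can be connected again later.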

Ugh!

Some runit setups have symlinks to symlinks to symlinks. Part of this is to implement "runlevels", which I view as an antique from the 1990's now rendered useless. Multiply cascaded symlinks invariably lead to trouble unless in the hands of a 160 IQ green eyeshade accountant. I've purposely kept this document to a single level of symlinks so that mere mortals can use the material without getting into trouble.

Make the Runsvdir Runner Program

I call the following program /d/bats/runsvdir/runsvdir_slitt.sh. You can call it anything you want, and you can put it in any directory you want, although I suggest you don't put it on the executable path, because typically it should be run only when you first run your window manager/desktop environment.

Notice the 400 dots at the end. These dots make space for 400 characters of error logging when some daemons throw errors.

#!/bin/ksh
exec runsvdir -P /home/slitt/service_gui Log: ................................................................................................................................................................................................................................................................................................................................................................................................................

Later, when everything's set up, you can run this program manually with the following command:

[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh
[slitt@mydesk ~]$

Also, make sure the preceding works in the background:

[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh &
[slitt@mydesk ~]$
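If typing or counting 400 dots by hand is unappealing, the log argument can be built instead. The following is a sketch only; the function name make_log_arg and the "log: " prefix are choices made for this example:

```shell
# Sketch: build a runsvdir log argument of "log: " followed by N dots.
make_log_arg() {
  # printf pads an empty string out to N spaces; tr turns the spaces into dots.
  dots=$(printf "%${1}s" '' | tr ' ' '.')
  printf 'log: %s' "$dots"
}

log_arg=$(make_log_arg 400)
echo "log argument is ${#log_arg} characters long"
```

A runner script using this would end with exec runsvdir -P /home/slitt/service_gui "$log_arg", with the double quotes making sure the prefix and the dots reach runsvdir as a single argument.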

Make the Daemon Program

#!/bin/ksh
# Append a timestamp to the log once per second, forever
while true; do
  date +%H:%M:%S >> /tmp/junky.log
  sleep 1
done

Make the preceding executable with the chmod command. In my test I called the preceding program /home/slitt/mytest.sh, which is not on the executable path. You can call yours any name you want and put it anywhere owned by the specific user (slitt in this document).

When the preceding program is running, every second it appends a timestamp to file /tmp/junky.log. To test the preceding program, run two terminal emulators on your screen: One on the left and one on the right. In the right one, run /home/slitt/mytest.sh. In the left one run the following command:

tail -fn0 /tmp/junky.log

If all is good, you'll see a new timestamp every second in the left terminal. Otherwise, troubleshoot. The daemon program, which I call /home/slitt/mytest.sh, must be functional before continuing.

Once everything's running properly, Ctrl+c out of the processes running in both terminals, to prevent weird things from happening in later steps.

Make the Daemon's Directory Inside The "All Daemons" Directory

[slitt@mydesk ~]$ mkdir /home/slitt/service_all/newtest
[slitt@mydesk ~]$

In this example I'm calling the daemon directory newtest, which will later be symlinked into the service directory as /home/slitt/service_gui/newtest.

Make the Daemon's Run Script Inside the Daemon's Directory

Put the following shellscript, called run, inside directory /home/slitt/service_all/newtest:

#!/bin/ksh
exec /home/slitt/mytest.sh

The preceding simply exec's /home/slitt/mytest.sh, which means that /home/slitt/mytest.sh replaces the run script while keeping the run script's PID. The result is that supervise/pid, which held the PID of the run script, still holds the same PID, but that process now executes /home/slitt/mytest.sh, and the ps command shows it running /home/slitt/mytest.sh rather than the run script.

But none of this happens until directory /home/slitt/service_all/newtest is symlinked into the service directory as /home/slitt/service_gui/newtest.
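The claim that exec keeps the PID is easy to check from any shell. The following is a small self-contained demonstration, not part of the runit setup; the variable names are made up for the example:

```shell
#!/bin/sh
# The first shell prints its PID, then exec's a second shell that prints its own.
# The \$\$ is escaped so that the second shell expands it, after the exec.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

pid_before=$(printf '%s\n' "$pids" | sed -n 1p)
pid_after=$(printf '%s\n' "$pids" | sed -n 2p)

if [ "$pid_before" = "$pid_after" ]; then
  echo "exec kept PID $pid_before"
else
  echo "PIDs differ: $pid_before vs $pid_after"
fi
```

If the run script called the daemon without exec, the daemon would instead be a child of the run script with a different PID, and runsv would be supervising the run script's shell rather than the daemon itself.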

VITAL! Make the Daemon's Run Script Executable

DANGER!

Be absolutely careful to perform this step. If you fail to make the daemon's run script executable, later you'll be forced to do some intricate, frustrating and lengthy debugging.

[slitt@mydesk ~]$ chmod a+x /home/slitt/service_all/newtest/run
[slitt@mydesk ~]$

Symlink the Daemon's Directory From Inside the Service Directory

[slitt@mydesk ~]$ ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest
[slitt@mydesk ~]$

The preceding command enables the daemon to be run by the multi-daemon supervisor, runsvdir. But of course that doesn't happen until the multi-daemon supervisor is running. Read on...

As the Normal User, Start the Runsvdir Runner Program

[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh &
[slitt@mydesk ~]$

The preceding command calls /d/bats/runsvdir/runsvdir_slitt.sh, which runs runit's runsvdir command on the service directory, /home/slitt/service_gui, thereby getting the whole runit supervision system to work.

Test

If you've done everything absolutely correctly, the system should work absolutely correctly. This usually doesn't happen in real life, so you need to test. First, test and see if everything's perfect:

  1. In a terminal emulator on the left side of your screen, issue command tail -fn0 /tmp/junky.log. If it prints a new timestamp every second, things are looking up.
  2. In a terminal emulator on the right side of your screen, issue command svl down newtest. If the terminal emulator on the left side stops printing new timestamps, you can be almost certain that your user specific runit is working correctly with your simple daemon, whose directory is /home/slitt/service_gui/newtest. If it doesn't work (and, unless you're a very careful person, that's the more likely outcome), continue on to the Troubleshoot section of this document, and every time you think it's working, test with this section.

Troubleshoot

Note:

This section uses "slitt" for the username, newtest for the daemon directory name, /home/slitt/mytest.sh for the actual daemon, /home/slitt/service_gui for the service directory, and /home/slitt/service_all for the "all daemons" directory.

Some of these are obviously not optimal, especially an executable contained in /home/slitt, but they work for the purpose of demonstration.

The runit multi-daemon supervisor system isn't easy to troubleshoot, but you can do it.

Read-Only Diagnostic Tests

First, before performing any diagnostic tests that change anything, get all the info you can. Start with the following command:

[slitt@mydesk supervise]$ svl status newtest
down: /home/slitt/service_gui/newtest: 11s, normally up
[slitt@mydesk supervise]$

The preceding output tells you the following, in order from left to right:

Next, look at the output for runsvdir with the following command:

ps ax | grep runsvdir  | grep service_gui

Remember those 400 dots in arg2 to runsvdir? Those are called the "log" because any runsvdir error messages take the place of some of the dots. Theoretically, the above mentioned ps command should show nothing but dots. If there are error messages in this command, copy them into a file so you can later use them to troubleshoot.

If there's a log file for this daemon, (not for runsvdir, but for the daemon appearing to cause the error message written over the dots), look at it. The following is also a helpful info source, run as user root:

tail -fn40 /var/log/messages

Look over the listings the preceding command gives you, trying to find any that might be relevant to your runit problem and be observant of new lines that appear.

A special problem that occurs quite frequently is that your svl status command usually says "up" for the state, the seconds up is 0, 1, or a very small number of seconds, and the PID keeps changing. This is an indication that your daemon either terminates quickly or is not being run at all. This is powerful information when you start doing diagnostic tests with which you manipulate the daemon and its directory and runsvdir. This condition usually means that the daemon's runsv can't fork the daemon directory's run script, or that the run script is terminating quickly. Make a note of how often it seems to reset, because it will come in handy later. Your job then becomes to discover which of these two possibilities is the root cause, and then narrow down to the root cause.
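One way to put a number on "the PID keeps changing" is to sample supervise/pid a few times and count how often it differs. This is a sketch under the assumption that supervise/pid is a plain text file holding the current PID; the function name count_pid_changes is invented here:

```shell
# Hypothetical helper: sample a daemon's PID file and count how often it changes.
# Usage: count_pid_changes /path/to/daemon_dir samples seconds_between_samples
count_pid_changes() {
  pidfile="$1/supervise/pid"
  changes=0
  last=$(cat "$pidfile" 2>/dev/null || true)
  i=1
  while [ "$i" -lt "$2" ]; do
    sleep "$3"
    now=$(cat "$pidfile" 2>/dev/null || true)
    # A different PID means the daemon was restarted between samples.
    if [ "$now" != "$last" ]; then
      changes=$((changes + 1))
      last=$now
    fi
    i=$((i + 1))
  done
  echo "$changes"
}
```

For example, count_pid_changes /home/slitt/service_gui/newtest 10 1 takes ten samples one second apart. A healthy daemon should report 0, while a result near 9 suggests the run script is dying almost as soon as it starts.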

There's a certain class of problems that point an accusing finger at the supervise tree inside the daemon directory. The following error messages point this accusing finger:

If you get the "unable to start ./run: access denied" message within the dots, your ./run script and/or the daemon it exec's probably is not chmoded executable for the specific user (slitt in this document). Chmod both to executable for the user (or for everybody if you like doing things that way) and try again with svl restart newtest. If you're lucky, things start to work. However, the typical result of non-executability is that even after it's corrected, there's been consequential damage or misalignment to something in the supervise tree, causing a problem with state, so you need to bring out the big guns: brute force troubleshooting.
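Since a non-executable run script is such a common cause, it may be worth checking for it before anything else. The following sketch checks only the run script, not the daemon it exec's; the function name check_run_script is hypothetical:

```shell
# Hypothetical pre-flight check for one daemon directory.
# Usage: check_run_script /path/to/daemon_dir
check_run_script() {
  run="$1/run"
  if [ ! -f "$run" ]; then
    echo "missing: $run"
    return 1
  fi
  # -x tests whether the current user may execute the file.
  if [ ! -x "$run" ]; then
    echo "not executable: $run (try chmod a+x)"
    return 1
  fi
  echo "ok: $run"
}
```

Run it as the specific user, for instance check_run_script /home/slitt/service_all/newtest, because executability is judged for whoever runs the check.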

Brute Force Troubleshooting

Before doing brute force troubleshooting, perform the non-destructive tests described in the preceding subsection. Then attempt to fix any root causes the non-destructive tests make you suspect.

Runit supervisor problems often involve a problem with state, so troubleshooting can get brutal. Because runit daemon supervisors are so simple, brute force troubleshooting is often the quickest and easiest route to a solution.

I call the following outline based recipe brute force troubleshooting:

  1. cd /home/slitt/service_gui/newtest
  2. sv down . to stop the newtest daemon. Please notice the use of sv rather than svl. You don't need to, and shouldn't, prepend a path to the daemon directory, because you're already there.
  3. cd .. to go up one directory.
  4. rm newtest to delete the symlink. Now you've completely disconnected the daemon run by directory newtest from the supervisor, runsvdir.
  5. cd /home/slitt/service_all/newtest to get into the non-symlink copy of the daemon directory.
  6. ./run. Either it works correctly or it doesn't.
  7. Make a change, which serves as a diagnostic test.
  8. ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest in order to clean-start the newtest daemon.
  9. If there are still problems, go back to step 1. Otherwise, you're done troubleshooting this daemon.

Supervising Pulseaudio With Your Specific User Runit

Pulseaudio is sound software enabling several sound sources to access your sound system independently, at the same time. You can often accomplish the same thing just with ALSA, an older and in my opinion more robust sound software. But for certain things you need Pulseaudio. On some Linux distributions Pulseaudio works pretty much right out of the box. On other distributions it can turn into a holy mess. It doesn't help that there's a lot of contradictory advice out there telling you how to set up and use Pulseaudio. This section gives you a distribution agnostic, "just works" method for controlling Pulseaudio using a user specific runit.

Note:

Pulseaudio must not be run as root.

Everything's the same as in the newtest daemon directory detailed earlier. The only exception is the contents of the run file. I call my daemon directory for Pulseaudio pulse_slitt. The run script inside pulse_slitt contains the following:

#!/usr/bin/env ksh
pkill pavucontrol
pavucontrol &
exec /usr/bin/pulseaudio --daemonize=no --exit-idle-time=-1

The preceding run script has the following line by line explanation:

  1. #!/usr/bin/env ksh: The script's shebang line, telling it to evaluate its contents in the ksh shell.
  2. pkill pavucontrol: Kill all instances of pavucontrol, the mixer/level program for Pulseaudio. For some reason, every time you stop and restart Pulseaudio, additional redundant and confusing sliders get added.
  3. pavucontrol &: Run a new copy of pavucontrol in the background. Because it's a GUI program, even though its process is in the background and its text output is invisible, its GUI interface is visible and ready for manipulation.
  4. exec /usr/bin/pulseaudio --daemonize=no --exit-idle-time=-1: exec pulseaudio in the foreground (--daemonize=no) and stay running through idle periods (--exit-idle-time=-1)

If you're having trouble with Pulseaudio, consider putting it under a user specific runit, even if it's your only reason for a user specific runit. With your Pulseaudio supervised by your personal runit, everything pretty much falls into place.

Note:

You probably need to add yourself to the "audio" group in order to have Pulseaudio work correctly. Also, you'll need to set autospawn = no in either the user Pulseaudio client configuration file or the system-wide Pulseaudio client configuration file. You can find details on these files, including their names and locations, from Google's AI Overview.

What I've Moved To My Personal Runit

I have migrated the following to my personal runit:

All four perform much better now that they're supervised by my personal runit. In fact, moving them to runit fixed some longstanding bugs in the pager loop and the reminder loop. I think you'll find a similar improvement.

My email pager loop

This was probably the easiest to move over, because I wrote it only a year or so ago, and I was willing to live with some compromises in order to get it into my personal runit. My pager daemon spins around listening for email to a certain email address. When it receives a *qualifying* email, it rings some bells, throws up a very noticeable window on my screen, and of course puts the email in the proper mailbox so I can read it. The purpose is so if I'm giving a presentation or class on Jitsi and my Jitsi connection gets disconnected, people can email that address and tell me the connection is lost. When you give online presentations and classes, this is a must!

My pager is a very over-engineered Python program that moves heaven and earth to go down only after all pager emails have been acted on. Considering how rarely this program is brought down, this might be overkill. So for now I just made the following pager run script:

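#!/usr/bin/env ksh

# Work from the pager's own directory; give up if it's missing
cd /d/pager || exit 1
mkdir -p /tmp/pager/completed
# Remove stale control files; -f keeps rm quiet if they're already gone
rm -f reload.int
rm -f /tmp/pager/pager.pid
exec ./loops.py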

In the preceding, the two rm commands remove files that would have operated on the Python program but are no longer necessary or even correct. A heck of a lot of loops.py addresses looping over and over again through the newly received pager emails. Once managed by my personal runit, life would be much better if the loop part were handled by a shellscript that calls the Python program to go through just once and then sleeps, but for the time being I'll leave it the way it is because it works.

Once the loop is run by a shellscript calling a one-pass Python program, I can accomplish the ideal of going down only at the beginning or end of the sleep using a USR1 signal that sets a variable that's tested at both ends of the sleep, and if the variable is True, send a HUP to the Python program.

My fetchmail loop

My fetchmail shellscript was given me by a mentor over 20 years ago. It used fetchmail in daemon mode. I guess it made sense then, but it sure doesn't now. So when I moved fetchmail over to my personal runit, I made my daemon just a looping shellscript that calls fetchmail to do one check, and then sleeps for X number of seconds. By having each call to fetchmail do one pass and putting the loop in a shellscript supervised by my personal runit, I got rid of a huge, nasty shellscript I'd been using for over 20 years. Not only that, but I'll later be able to make sure it goes down only just before or just after the sleep command, rather than while pulling emails from an IMAP server and maybe losing some email. The following is my run script for my fetchmail:

#!/usr/bin/env ksh
exec /d/at/fetchmail/fetchmail_loop

And the following is /d/at/fetchmail/fetchmail_loop

#!/usr/bin/env ksh

waittime=180

logfile=/d/at/fetchmail/logs/fetchmail.log
while true; do
   date +%Y%m%d_%H%M%S >> "$logfile"
   # Log both stdout and stderr; 2>&1 must come after the >> redirection
   fetchmail -f /wouldnt/you/like/to/know/.fetchmailrc >> "$logfile" 2>&1
   echo "" >> "$logfile"
   echo "" >> "$logfile"
   sleep "$waittime"
done

As I said, very soon I'll make fetchmail_loop honor signal USR1 in order to wait until a complete pass before downing the daemon, so there's no chance of losing email. Pretty cool, right? It all begins when you create your own personal runit.
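As a rough idea of what honoring USR1 could look like, here is a sketch of a looping shellscript that checks a flag before and after each sleep. It is not the author's fetchmail_loop; the names stop_requested, one_pass and run_loop are hypothetical, and one_pass stands in for the real one-shot work:

```shell
#!/bin/sh
# Sketch: finish the current pass, then leave the loop once USR1 has arrived.
stop_requested=0
trap 'stop_requested=1' USR1

one_pass() {
  # Placeholder for the real one-shot job, such as a single fetchmail run.
  echo "pass done"
}

run_loop() {
  while true; do
    # Checked just before a pass starts...
    if [ "$stop_requested" -eq 1 ]; then break; fi
    one_pass
    # ...and again just after it, so a pass is never cut short by USR1.
    if [ "$stop_requested" -eq 1 ]; then break; fi
    sleep "$1"
  done
}
```

A real script would end with a call such as run_loop 180. A POSIX shell holds a trapped signal until the foreground command finishes, so a USR1 that arrives during a pass or during the sleep only sets the flag, and the loop ends at the next check. With sv, the signal can be sent with sv 1 followed by the daemon directory. Keep in mind that runsv restarts a daemon that exits while it's wanted up, so this would presumably be combined with something like sv once.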

Why Not Use the Existing Fetchmail Daemon?

In a response to this document, somebody brought up the very real fact that fetchmail is designed from the bottom up to be a daemon, and a very good and complete daemon, so why do I need a looping shellscript to control a single shot fetchmail?

As an answer, consider the following points:

  1. Fetchmail itself cannot loop except in daemon (background) mode.
  2. Runit cannot reliably supervise a process running in the background.

Bottom line: From runit you can only start or supervise a one-shot fetchmail run, so if you want to supervise fetchmail from runit you'll need to write a looping shellscript. So the obvious question is, why use runit to supervise fetchmail? I can think of these reasons:

  1. Without runit, fetchmail won't restart if it crashes or is terminated.
  2. With runit, you have a more familiar up/down/restart/status interface than you would using the fetchmail daemon's unique interface.

Fetchmail is a spectacular program. I've used it almost every minute of every day for 12 years. Fetchmail has very few flaws, but one flaw it does have is to make the background/foreground flag also control whether you want fetchmail to loop or not. If these two factors were controlled separately, then I'd be more likely to use fetchmail's loop instead of calling it, one-shot, from a looping shellscript.

My calendar/reminder loop

My Reminder program pops up a GUI message every few hours reminding me of upcoming appointments. This is the way I work best: I'm not a calendar kind of guy. I wrote the Reminder program some time in the 00's, in Python. It's an albatross I'm scared to work on. It's reminders.py, with the task to be performed as arg1. The following is a list of those tasks:

The loop task has argument loop_display. I had been running it from my UMENU software, where it had at least two long standing bugs:

  1. Each time it forks off a GUI reminder window, after closing that forked off window, that forked off window becomes a zombie.
  2. It often forks off two or three reminder windows for the same clock time (9AM, for instance).

Now that my personal runit is supervising it, the zombies appear to be gone, and the multiple reminder windows happen much less frequently: so far a maximum of one duplicate, not two duplicates like when it was run from UMENU. The following is the run script for the reminder loop:

#!/usr/bin/env ksh
cd /d/at/python/reminders
exec ./reminders.py loop_display

My pulseaudio loop

This was already covered in the Supervising Pulseaudio With Your Specific User Runit section. The only other thing I can say is this: For years I've tried unsuccessfully to get Pulseaudio to work on Void Linux. Once I made my daemon and supervised it with my personal runit, Pulseaudio worked consistently, productively and unsurprisingly.

My Personal Runit Has Made Things Easier

So it appears that supervising my daemons from my personal runit sometimes solves problems and sometimes makes things less complex. What I've learned is that the next time I want to make a daemon, I'll put the looping part in a simple shellscript and do the single pass part in Python or C or whatever. Now that I have my personal runit, life is going to get much simpler.

Proof of Concept: Shutting Down With USR1

As mentioned in an earlier section, when your daemon is a looping shellscript calling a 1 pass process to iterate once each cycle, it's sometimes important to make sure the daemon doesn't go down during the process, but only before or after it. For me, this is especially true of fetchmail, because I don't want to take any chance of losing email.

Note:

My experimentation tells me that simply using sv down allows the separate program that does the 1 pass process to finish. Probably somewhere the documentation says this too. But with fetchmail I want to be absolutely certain. Hence the USR1 solution, even if theoretically it might not be needed.

When designing this method, the last thing I want to do is fiddle around with my running postgres loop code to find how to do it. So instead, I fiddled around with my newtest daemon directory and the shellscript it runs. The following Ascii diagram describes the new process, starting from newtest's run script:

run ==>mytest.sh ==>count2ten.sh

The run script doesn't change a bit: It still runs mytest.sh. Both mytest.sh and count2ten.sh are in directory /home/slitt. count2ten.sh writes a start message to the log (/tmp/junky.log), then for "one" to "ten" logs the number, one per second, and finally logs a finish message. Under normal operation mytest.sh simply loops forever, with each iteration running count2ten.sh and then sleeping 10 seconds.

When mytest.sh receives a USR1 signal, it changes a variable from false to true. After the return of count2ten.sh and before the sleep, a function is called that checks that variable, and if that variable is true, then it performs the following command:

/d/bats/svl down newtest

The following is mytest.sh, the script that the run script calls:

#!/usr/bin/env ksh

stop_when_safe=/bin/false

on_usr1() {
	stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
	if $stop_when_safe; then
		echo "Stopping now!" >> /tmp/junky.log
		echo "" >> /tmp/junky.log
		echo "" >> /tmp/junky.log
		/d/bats/svl down newtest
	fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
	echo -n "PID is " >> /tmp/junky.log
	echo $$ >> /tmp/junky.log
	./count2ten.sh
	sleep 1
	test_and_handle_safe_stop
	sleep 10
	test_and_handle_safe_stop
done

The trap on_usr1 USR1 line runs function on_usr1(), and does nothing else, when the program receives a USR1 signal. Note that receiving a USR1 signal is asynchronous with respect to the loop. Function on_usr1() changes variable $stop_when_safe from a false status to a true status, and nothing more. Function test_and_handle_safe_stop() checks variable $stop_when_safe, and if true, uses the sv down newtest command to stop the daemon itself.

What this code has done is turned the asynchronous receipt of USR1 to a synchronous shutdown that can occur only 1 second after ./count2ten.sh finishes, or immediately before it starts, thus preventing a stop while actual work is being done.
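To watch that conversion happen in isolation, here's a minimal sketch. It is not my daemon: it's written in portable sh, it sends the USR1 to itself instead of receiving it from svl, and the function names work_pass and safe_point are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: the script signals itself, standing in for "svl 1 newtest".
stop_when_safe=false
log=""

on_usr1() {
	stop_when_safe=true        # record the request and do nothing else
}

work_pass() {                      # hypothetical stand-in for count2ten.sh
	log="$log work-start"
	kill -USR1 $$              # USR1 arrives in the middle of the work
	log="$log work-end"
}

safe_point() {                     # only ever called between passes
	if $stop_when_safe; then
		log="$log stop"
		return 0
	fi
	return 1
}

trap on_usr1 USR1
while :; do
	work_pass
	safe_point && break        # the real daemon runs "svl down" here
done
echo "$log"
```

Even though the signal lands in the middle of work_pass, the stop is acted on only at the safe point, after the pass has finished. The final echo should show work-start, work-end and stop, in that order.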

Note:

This also could have been done by creating a sentinel file. In that case, test_and_handle_safe_stop() would be used to test for that file's existence, and if it exists, delete it and down the daemon. This would have eliminated the trap statement and the on_usr1() function, but in my opinion using the signal is more Unixy and probably a better idea.
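A minimal sketch of that sentinel-file variant follows. The sentinel filename and the stop_daemon helper are assumptions made for the illustration. In a real daemon, stop_daemon would run /d/bats/svl down newtest instead of just printing a message.

```shell
#!/bin/sh
# Sketch of the sentinel-file variant: no trap, no signal handler.
# The sentinel path is a hypothetical name, not part of my setup.
sentinel=${SENTINEL:-${TMPDIR:-/tmp}/newtest.stop}

stop_daemon() {
	# Placeholder for: /d/bats/svl down newtest
	echo "would run: svl down newtest"
}

test_and_handle_safe_stop() {
	if [ -e "$sentinel" ]; then
		rm -f "$sentinel"  # consume the request so it fires only once
		stop_daemon
	fi
}

# Demo: with no sentinel nothing happens; create it and the stop fires.
rm -f "$sentinel"
test_and_handle_safe_stop
touch "$sentinel"
test_and_handle_safe_stop
```

With this approach, requesting a safe stop is just a touch of the sentinel file from any terminal.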

mytest.sh calls count2ten.sh, which serves as a simple stand-in for a program that really does something (like a single-run, foreground invocation of fetchmail). The following is the straightforward code for count2ten.sh:

#!/usr/bin/env ksh
echo "" >> /tmp/junky.log
echo "Begin count2ten.sh" >> /tmp/junky.log
for num in one two three four five six seven eight nine ten; do
	echo $num >> /tmp/junky.log
	sleep 1
done
echo "End count2ten.sh" >> /tmp/junky.log
echo "" >> /tmp/junky.log

This setup works perfectly. I'm glad I did it before attempting the concept with fetchmail, because it took me about an hour to get it to work perfectly. Armed with this experience, I'll soon do the same thing for fetchmail.

That same function is called immediately after the sleep, so that if a USR1 was received during the sleep, the daemon goes down before running another iteration of count2ten.sh.

The sv program sends a USR1 when its arg1 is 1.

You might wonder how I send a USR1 to the daemon. I simply use the following command:

svl 1 newtest

The arg1 of "1" sends a USR1.

A USR1 Aware Fetchmail Loop

I want to be absolutely certain that my downing the loop calling fetchmail never stops the fetchmail download midstream. I don't want to take my experimentation's word for it. I don't want to take the documentation's word for it. I don't want to take any expert's word for it. I don't even want to take Runit's author's word for it. I want certainty, even if I have to write 45 extra lines of shellscripting to gain that certainty. This section details how I gain that certainty.

The following is the code of /home/slitt/service_gui/run:

#!/usr/bin/env ksh
exec /d/at/fetchmail/fetchmail_loop

The run script is unchanged from the simple setup showcased earlier in this document. The following is the code of the /d/at/fetchmail/fetchmail_loop program, which changed quite a bit but is almost the same as the daemon program in the earlier Proof of Concept: Shutting Down With USR1 section:

#!/usr/bin/env ksh

waittime=180
logfile=/d/at/fetchmail/logs/fetchmail.log

stop_when_safe=/bin/false

on_usr1() {
   stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
   if $stop_when_safe; then
      printf "Stopping now!\n\n" >> $logfile
      /d/bats/svl down fetchmail
   fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
   echo "" >> "$logfile"
   echo "" >> "$logfile"
   echo "Starting next fetchmail run" >> "$logfile"
   date +%Y%m%d_%H%M%S >> "$logfile"
   date "+New fetchmail run: %H:%M:%S" >> "$logfile"
   echo -n "PID is " >> $logfile
   echo $$ >> $logfile
   fetchmail -f /home/slitt/mail/fm/.fetchmailrc >> "$logfile" 2>&1
   sleep 1
   count=$waittime
   echo "Fetchmail run finished, sleeping $waittime seconds" >> "$logfile"
   while (( count >= 0 )); do
      test_and_handle_safe_stop
      sleep 5
      (( count -= 5))
   done
   echo "End of sleep $waittime seconds" >> "$logfile"
   test_and_handle_safe_stop
done

The preceding is basically the same as the daemon script (not the run script) from Proof of Concept: Shutting Down With USR1, except:

Log Files

Runit creates and handles log files very well, including limiting log file size, rotating logs, and deleting older rotated logs. It also prepends a very readable, precise and accurate timestamp to each log entry. Runit's logging facility simply logs everything the daemon sends to stdout, so when you make your own log you needn't write to a special log file. Runit does it all for you, and does it cleanly and in good taste. It does log rotation according to simple specifications you give it.
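As far as I can tell from the svlogd man page, those rotation specifications live in a file named config inside the log directory, read when svlogd starts. The following is a minimal sketch that writes such a file. The values shown are what the man page lists as defaults, and the throwaway directory is an assumption so the sketch is safe to run anywhere. On my system the directory would be /scratch/rlogs/newtest.

```shell
#!/bin/sh
# Sketch: write an svlogd "config" file into a log directory.
# logdir defaults to a throwaway directory made by mktemp.
logdir=${LOGDIR:-$(mktemp -d)}
mkdir -p "$logdir"

# s<bytes>: rotate "current" once it grows to this many bytes
# n<count>: keep at most this many rotated log files
printf '%s\n' 's1000000' 'n10' > "$logdir/config"

cat "$logdir/config"
```

Smaller numbers mean less disk space used and less history kept. Pick them per daemon.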

Before getting into the mechanics of making runit write log files for a daemon, it's important to discuss some alternatives of where the log files should go.

Log File Locations

Obviously, don't place your log files on a temporary or ram disk partition, because you don't want them disappearing from time to time. For most systemd, this absolutely rules out the following trees: and trees.

Note that /run is specified to start empty with every reboot.

Your logs must reside on a partition big enough to handle them, and then some. If you want to use the /var tree, which is pretty standard for log files (/var/log), be sure it has enough room to handle logs for all your new daemons.

If you want them in your /home tree, be sure that partition has enough space. The df -h command is your friend.
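If you'd rather have a number than eyeball the df -h output, the following sketch wraps df in a small function. The function name free_kb is made up for this example, and the directories checked are just the two trees discussed above.

```shell
#!/bin/sh
# free_kb DIR: print the free space, in kilobytes, on DIR's partition.
free_kb() {
	# -P forces one output line per filesystem, -k reports 1024-byte
	# blocks; column 4 of the second line is the "Available" figure.
	df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

for dir in /var /home; do
	if [ -d "$dir" ]; then
		echo "$dir: $(free_kb "$dir") KB free"
	fi
done
```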

You probably want your logs on the local machine rather than a remote-loaded partition from NFS or Samba or sshfs mount. Don't get me wrong, for your specific user runit log files would work on a remote mount because the remote mount is mounted before the user specific runit is started, but if the connection went down, so would your logs.

Danger!

The preceding paragraph concerned your user specific Runit only. For the system wide Runit, putting the logs on a mount to a remote system is absolutely inappropriate because logs should start early enough to capture everything; even events happening before the remote mount is completed.

If you believe that constantly creating and deleting files on an SSD hastens the SSD's demise, then you don't want to put rotating log files, like those created by Runit, on an SSD or NVMe drive. Once all these requirements are taken care of, your choice of log locations is pretty much whatever you think is best.

Where's the best place for your daemon's log files? The answer is "it depends". It depends on whether you consider your log files to be data to be mixed with your other data. It depends on whether you want your log files backed up. It depends on which customs and standards you believe to be the most credible. It depends on your personal preferences, or your boss's, or your CIO's. There's no "one size fits all" answer.

There's an even more basic question that impinges on the "where do they go" question: Should your personal runit simply toss its log lines into the system wide logger, or should it build individual logs for each daemon? Each has benefits and drawbacks:

I can't tell you anything about the system wide logger alternative because it's distro dependent. Also, I'm not going to use the system wide logger for this. Therefore, I won't say anything more about using Runit logging with system wide loggers.

An excellent location, certainly the most Unixy location, and the location creating the least resistance and criticism is:

/var/log/users/slitt

In the preceding, of course you'd change "slitt" to the name of the user running the personal runit. /var/log/users should be owned by user root, group root, and permissioned chmod 755. Within that directory, slitt should be owned by slitt, and permissioned chmod 700 unless there's some reason you want people from one of slitt's groups or someone with no relationship at all to read the logs. Inside /var/log/users/slitt should be subdirectories like pulse_slitt, pager, fetchmail, etc. Each of those should be owned by slitt and chmod 700 unless there's a good reason to do otherwise.
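The layout just described can be sketched as follows. This is an illustration under assumptions: base defaults to a throwaway directory so you can run it without root, whereas on a real system base would be /var/log and you'd run it as root. The chown step is shown only as a comment for the same reason.

```shell
#!/bin/sh
# Sketch of the /var/log/users/slitt layout, built under $base.
base=${LOGBASE:-$(mktemp -d)}
user=${LOGUSER:-slitt}

mkdir -p "$base/users/$user"
chmod 755 "$base/users"            # container directory, readable by all
chmod 700 "$base/users/$user"      # private to the user

for daemon in pulse_slitt pager fetchmail; do
	mkdir -p "$base/users/$user/$daemon"
	chmod 700 "$base/users/$user/$daemon"
done

# On a real system, as root, you'd also hand the user's tree to the user:
#   chown -R slitt /var/log/users/slitt
ls -ld "$base/users" "$base/users/$user" "$base/users/$user"/*
```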

You might wonder why I didn't suggest eliminating the users level and just placing pager etc. right in /var/log. The answer is there just might already be a user called "pager", causing a name clash. By using "users", only if a user is named "users" will there be a problem, and if that happens, you could call the directory "users_whatever" (substitute what you want for "whatever"), so that there's no clash, and then just forbid somebody from having name "users_whatever" (or whatever).

If you view these logs as the user's data and want them backed up with the user's data, then they belong in the /home tree. There are a million ways to do this: Let me present two...

First, you can do it the djb (Daniel J Bernstein) way. Assuming the user to be "slitt" and the daemon directory to be /home/slitt/supervise_gui/pager, djb would put the logs in /home/slitt/supervise_gui/pager/log/main. This has the huge advantage that everything for the daemon you created for pager is contained within the /home/slitt/supervise_gui/pager tree, and it's really convenient that the run file within the pager/log directory can be written relatively, as follows:

#!/usr/bin/env ksh
exec svlogd -tt ./main

Everything in a nice, tidy, encapsulated package.

Perhaps the preceding doesn't appeal to you. Truth be told, for some reason I can't put my finger on, I don't like it myself, although there's no rational reason why not. But if you want the logs as the user's data to be backed up with the user's data, another alternative is to put them under ~/rlogs, as follows (example for the pager daemon for user "slitt"):

$HOME/rlogs/pager

The Freedesktop Connection

There's a whole specification concerning where the Freedesktop organization thinks things should go, at https://specifications.freedesktop.org/basedir-spec/latest/. Different people assign different importance to the Freedesktop specs, ranging from people who pay no attention to them to people who take the Freedesktop specs more seriously than they take the POSIX customs. I fall in the former category, so I know almost nothing about Freedesktop, meaning that I'm the wrong guy to ask about Freedesktop specs and customs.

Based on the Freedesktop specs linked to in the preceding paragraph, it seems to me that those wanting to comply with Freedesktop would prefer to put their log files in a tree headed by $XDG_STATE_HOME.
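For what it's worth, the following is a minimal sketch of how a script might compute such a location. The spec says that when XDG_STATE_HOME is unset or empty, $HOME/.local/state is the default. The function name state_log_dir and the rlogs level are my assumptions, borrowed from the ~/rlogs example, and are not something the spec asks for.

```shell
#!/bin/sh
# state_log_dir DAEMON: print a log directory under $XDG_STATE_HOME,
# falling back to ~/.local/state when the variable is unset or empty.
state_log_dir() {
	printf '%s\n' "${XDG_STATE_HOME:-$HOME/.local/state}/rlogs/$1"
}

state_log_dir pager
```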

My Personal Choice

For my purposes, I don't want my log files mixed in with my data, so my log files will not go anywhere in the /home tree. From my personal perspective, one of the big benefits of having a personal Runit is having everything seamlessly owned by user slitt, so I won't be putting them anywhere in /var/log or anywhere else where I'd need to do special steps to make them writeable by slitt. Furthermore, I don't want my log files backed up, so I'll be putting them within my /scratch/rlogs tree. /scratch doesn't get backed up, and it's already owned by slitt, and the mount partition for /scratch is huge, so it can accommodate a lot of big log files.

Note:

If you don't like making the /scratch directory directly off of the root directory, or if you're not allowed to create it, just substitute /home/slitt/scratch for /scratch, once again remembering to substitute the user's login name for "slitt".

So for my "newtest" daemon, the log files will reside in /scratch/rlogs/newtest. If you're wondering why there isn't a username in the preceding directory, it's because in my use case nobody except user "slitt" uses a personal Runit, so the user is implied.

Important!

My situation is unusual, so my log locations are unusual. Your log locations will almost certainly be different from mine. There's no "right" place for log files. Just examine your situation and place them accordingly. Think about it for a while before going on.

Runit Sends Daemon's stdout To the Log

In this document's previous examples, the daemon has written directly to a file. Many daemons were designed to write directly to a file. Runit does all the work for you, by propelling the daemon's stdout to the log file. Therefore, your daemon should write nothing to stdout except what it wants logged. If you also want to capture the daemon's stderr in your log, that's possible too.
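Capturing stderr is usually done by merging it into stdout. In a run script the customary way is a line reading exec 2>&1 placed before the final exec of the daemon. The following sketch shows the effect of that redirection on a single call. The noisy_daemon function is a hypothetical stand-in for a real daemon.

```shell
#!/bin/sh
# Sketch: merge stderr into stdout so both would reach the logger.
noisy_daemon() {
	echo "normal message"
	echo "error message" >&2
}

# Without 2>&1 only "normal message" would be captured. With it,
# both lines travel down stdout together.
merged=$(noisy_daemon 2>&1)
printf '%s\n' "$merged"
```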

Runit Log File Overview

So now you know where you want to put your log files. This subsection shows you how to put the logs for user "slitt" invocation of daemon "newtest" into /scratch/rlogs/newtest. This procedure is designed so that your existing "newtest" daemon, which is now running without a log, stays running as long as possible and has as brief a down period as possible.

Your first step is to make a copy of /home/slitt/mytest.sh. Copy it to /tmp/mytest.sh. Within that copy, you'll be adding three lines. In the following code they're the three echo statements that don't redirect to /tmp/junky.log:

#!/usr/bin/env ksh
echo "Starting the newtest daemon."
stop_when_safe=/bin/false

on_usr1() {
	stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
	if $stop_when_safe; then
		echo "Stopping now!" >> /tmp/junky.log
		echo "" >> /tmp/junky.log
		echo "" >> /tmp/junky.log
		echo "Stopping the newtest daemon."
		/d/bats/svl down newtest
	fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
	echo -n "PID is " >> /tmp/junky.log
	echo $$ >> /tmp/junky.log
	echo "Starting ./count2ten.sh."
	./count2ten.sh
	sleep 1
	test_and_handle_safe_stop
	sleep 10
	test_and_handle_safe_stop
done

Instead of going to /tmp/junky.log, these three new statements output to stdout, where they're picked up by Runit and sent to Runit's log file for this daemon. Please remember, this new file is called /tmp/mytest.sh.

There is no need to change anything in count2ten.sh, which continues to output to /tmp/junky.log as always. Only the three new lines end up in the Runit log for the daemon.

The following is the recipe for setting up logging for the newtest daemon:

  1. mkdir -p /tmp/temp/log to create a temporary directory for assembling the log directory. You can put it anywhere.
  2. cd /tmp/temp/log to get into that directory.
  3. Within that directory, create the following file, called run:
    #!/usr/bin/env ksh
    exec svlogd -tt /scratch/rlogs/newtest/
    
  4. chmod u+x run This step is absolutely vital, because if you forget to do it, you'll have all sorts of nasty, time consuming debugging to do.
  5. svl down newtest
  6. cd /home/slitt/service_gui
  7. rm newtest : This completely disconnects newtest from your personal Runit, enabling you to do what you need to do without creating a confusing and time consuming state problem.
  8. mv /tmp/temp/log ~/service_all/newtest : This installs a ready to go logging facility in your newtest daemon.
  9. cp /tmp/mytest.sh /home/slitt
  10. chmod u+x /home/slitt/mytest.sh
  11. ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest
  12. tail -f /tmp/junky.log in order to test that the daemon is doing its job.
  13. tail -f /scratch/rlogs/newtest/current to test that the log is being written to. The log should be written to about every ten seconds. Don't stop this tail -f process.
  14. svl down newtest to bring the daemon down. Verify that "
  15. After downing the daemon, verify that "Stopping the newtest daemon." was written to the log, /scratch/rlogs/newtest/current.
  16. svl up newtest to bring the daemon back up.
  17. Verify that "Starting the newtest daemon." was written to the log, /scratch/rlogs/newtest/current.

If both /tmp/junky.log and /scratch/rlogs/newtest/current were written to as specified, congratulations, you've just implemented Runit with logging. If not, go to the section on Troubleshooting, earlier in this document.
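Steps 1 through 4 of the recipe can be sketched as a single script. This is an illustration rather than a replacement for the recipe: the stage and logdest variables are my assumptions, added so the sketch can be run without touching a live setup, and stage defaults to a throwaway directory instead of /tmp/temp.

```shell
#!/bin/sh
# Sketch of recipe steps 1-4: assemble a log directory in a staging area.
stage=${STAGE:-$(mktemp -d)}
logdest=${LOGDEST:-/scratch/rlogs/newtest}

mkdir -p "$stage/log"

# The here-document is unquoted on purpose, so $logdest is expanded
# while the run file is being written.
cat > "$stage/log/run" <<EOF
#!/usr/bin/env ksh
exec svlogd -tt $logdest
EOF

chmod u+x "$stage/log/run"         # the step that's so easy to forget

if [ -x "$stage/log/run" ]; then
	echo "log/run is ready in $stage/log"
fi
```

The remaining steps, which take the daemon down, move the log directory into place and relink the daemon, are best done by hand the first time so you can watch each one work.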

The Beauty of Simplicity

The simplicity of runit's multi-daemon supervisor system is breathtakingly beautiful. As shown earlier in this document, reasonably complete understanding of this system is gained from two simple diagrams, one for startup and one for control. A basic specific user setup uses only three programs: runsvdir, runsv, and sv. Making things simpler is the fact that you can achieve the control functionalities of sv primarily by echoing characters into supervise/control.

In fact, conceptually speaking you could call sv a front end to supervise/control and supervise/status. The sv source code shows this to be an oversimplification, but conceptually it's a great starting point.
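To make that concrete, the following is a minimal sketch of a do-it-yourself control function. The characters come from the runsv man page (u for up, d for down, 1 for USR1, k for KILL). The function name ctl is made up, and the demo writes to a plain file in a throwaway directory standing in for the named pipe that runsv creates, so it can be run without a live runsv.

```shell
#!/bin/sh
# ctl SERVICEDIR CHAR: write one control character to supervise/control.
ctl() {
	printf '%s' "$2" > "$1/supervise/control"
}

# Demo against a stand-in directory. With a real service directory,
# "ctl /home/slitt/service_gui/newtest 1" would request a USR1.
svc=$(mktemp -d)
mkdir -p "$svc/supervise"
ctl "$svc" 1
cat "$svc/supervise/control"
echo
```

One caution: against a real named pipe, a plain shell redirect like this can block if no runsv is reading the other end, which is one of the details the real sv takes care of.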

Another beauty of the runit multi-daemon supervisor is that it uses the Unix/POSIX functionalities to accomplish its magic. Unix symlinks. Unix cooperative file locking. Unix named pipes. Unix directory/file hierarchies. Half of runit was already written before line one of runit or its ancestor, daemontools, was written. Speaking of daemontools...

The djb Connection

Daniel J Bernstein, otherwise known as djb, wrote the multi-daemon supervisor called daemontools in 2001, in order to provide a stable control mechanism for his server software such as djbdns and qmail. People, including me, used daemontools to supervise his other software (djbdns in my case), and noticed how much better it was than the existing daemon management facilities. So many of us supervised other daemons with daemontools. daemontools could be run from all the init systems of the day (and this is still true today), so people began to substitute it for the ugly daemon management from those init systems.

A few years later several full init systems, with daemontools-inspired multi-daemon supervisors, were created. The two most widely used of those init systems were runit and s6. Both runit and s6 are simple and spectacular. runit is simpler, s6 gives the admin more control. You can't go wrong with either.

djb is a genius mathematician and security expert. I'm pretty sure he wrote daemontools as a quickie supervisor just for his own software, so instead of reinventing the wheel or incorporating all sorts of databases and other geegaws, he just quickly built it on top of Unix.

Don't Reinvent the Wheel

Every time I code anything, wise developers leapfrog each other telling me I shouldn't reinvent the wheel, and that instead I should use other people's code. djb did "don't reinvent the wheel" the right way, using Unix and the standard C libraries, and a directory tree for a database.

The Beauty of Understanding Your Software.

daemontools and runit are so conceptually simple I could have written them myself. My version would have had a text based supervise/status, so it wouldn't have been as efficient, performant, featureful or bug free, but it would have served the purpose until people smarter than I made them better. I have a feeling half the people reading this now could have written them, if they were given a proper specification.

Don't Believe Everything You Hear

The Internet being the Internet, you'll hear critics say this document is inaccurate, too long, too complicated, too simple, not useful. Some will call it propaganda, or the ravings of a graybeard stuck in the past and afraid to learn new things. Others will take great offense that its recipes don't work exactly for their favorite distro, or that I broke the rules of the Linux Filesystem Hierarchy Standard (FHS) or the Freedesktop.Org Cross-Desktop Group (XDG) standards. And of course there are the guys who don't like the document's layout and colors, calling it sooooo 1998!

I could easily refute every one of these assertions, but doing so would be a disservice to the vast majority of readers who came here to DIY, not to criticize.

And just so we're all on the same page, use of Specific User Runit has absolutely nothing to do with the init system you happen to be using.

If you find a genuine inaccuracy or an ambiguity in this document, please email me so I can fix it.

Summary

Although runit can be used as a full blown init system to replace sysvinit, systemd, OpenRC and the various init systems used on BSD, this document uses only runit's daemon supervision capabilities and therefore is init system agnostic. This document is equally valid no matter what init system you're using.

The following two Mental Models (block diagrams) spell out the workings of the runit multi-daemon supervisor:

runit Startup Mental Model


runit Control Mental Model

In other words, the runit multi-daemon supervisor is dead-bang simple and exquisitely beautiful in its simplicity.

Reading this document, it's certain that you noticed that I used a very odd set of directories for some things, especially my use of /scratch, which is a blatant violation of the FHS. There's a reason for this: I didn't make this document out of the goodness of my heart. I made it as an as-built drawing of the system I actually use, in my very particular and unique situation, to run my home-grown daemons that should be run as user slitt. Even so, for ease of learning and practice, I suggest that the first time you go through this document you use the locations I use, with one exception: The use of /scratch right off the root is a bridge too far for many people, in which case you can use /home/slitt/scratch.

Speaking of "slitt", please remember to substitute the specific user's username for every instance of "slitt" in this document.

Over half this document is a step by step recipe for making, testing and troubleshooting a specific-user runit. It looks long, but that's just because each step is a single step, so that the procedure becomes unambiguous.

In my experience, things go better when a personal runit supervises Pulseaudio, fetchmail, my reminder daemon and my pager daemon.

If your daemon consists of a looping shellscript to repeatedly call a program and then sleep a specific amount of time, you don't want the daemon to go down in the middle of the called program. The way to run such a looping shellscript is for it to trap the USR1 signal and change a variable on receipt of the USR1. Then, during the sleep phase, keep testing that variable, and if the variable has been changed, use the svl shellscript, which is a slight modification of the sv program provided by runit, to down the looping shellscript. You send the USR1 signal to the looping shellscript with the following command:

svl 1 newtest

Obviously, substitute the actual daemon directory name for newtest. But what if the called program hangs so the variable is never tested? In that case, you can take the daemon down hard with the following command:

svl down newtest

If that doesn't work, you can murder the daemon with the KILL signal, using the following command:

svl kill newtest

The preceding is just like a kill -9, with all the certainty and all the mess.

Runit is very good at keeping log files for any daemon supervised by runit. This includes log rotation and deletion of the oldest log files. The location of the log files can be defined on a per-daemon basis.

Runit's daemon supervision is beautiful in its simplicity. A person with Unix, BSD or Linux familiarity can easily understand runit. This is something that can't be said for most other software.

Creating my own user specific runit for user slitt has improved my computer's reliability and workflow, and given me a way to supervise any daemon I might want to write. And because runit doesn't require its daemons to background themselves, writing daemon software to be supervised by runit is much easier and simpler than the insane double-forks a backgrounding daemon requires.

If you feel the need to optimize the performance of your Linux, Unix or BSD computer, I recommend creating your own personal runit.