Troubleshooters.Com, Linux Library and Init System Choices present:
User Specific Runit Supervisor
Copyright 2025 by Steve Litt, all rights reserved.
See the Troubleshooters.Com Bookstore.
Runit is a great init system, and when it's connected to a different init system's PID 1 it also becomes a great multi-daemon supervisor. Using runit as an init system or a system-wide multi-daemon supervisor is not a topic for this document and won't be discussed further.
Another great use of runit is to run daemons for a specific user, in my case user "slitt". All examples in this document are for user slitt, so when you use this document's info to build your own multi-daemon supervisor for a specific user, be sure to change all instances of slitt to whatever user name you want to run its own daemons.
All shellscripts in this document are run with the ksh shell. You can choose a different shell, like dash, bash, /bin/sh, or anything reasonably resembling ksh, dash, and bash, but personally I never use bash in shellscripts (bash is huge and remember Shellshock?). In my opinion bash is a great interactive shell and a poor scripting shell. For shellscripts, I love ksh because it works beautifully.
Where specific-user runit excels is in the GUI environment, so this document concentrates on running runit from the ~/.xinitrc shellscript. ~/.xinitrc is the script that is run when you issue the startx command. If, instead of using startx, you boot right into your window manager (and perhaps desktop environment), then you'll need to ask for help from a mailing list or forum for your specific distribution.
In this document, actually binding the runit instance to .xinitrc is delayed until the end.
Throughout this document, the username is represented by the string "slitt", because that's less ambiguous than things like "<user>" or even "my_name". Obviously you'll change all occurrences of "slitt" to the username of the user using this specific-user runit.
This document uses some odd, and perhaps inconsistent, filenames and directory names. This is because this document is an as-built description of my user-specific runit for user slitt. I use this personal runit all the time now. Some directories, such as /scratch, are unique to my situation. Others, such as /home/slitt/mytest.sh, came about because I experimented fast and then wrote the document to match. As it is, this document took a month, off and on, to write. Going back and adjusting everything for the best academic documentation would have taken more time than I could devote.
In summary, the username, directory names and filenames in this document are an as-built description. I did it this way so I wouldn't need to create and tech-edit twice.
If you object to /scratch, or if you're not allowed to create /scratch, then I suggest you substitute /home/slitt/scratch (but use your own username). Other than that, I suggest that you just go along with the filenames and directory names in this document, for DIY learning purposes. This way you can go through it fast, with a minimum of transcription errors.
Once you have a good grasp of specific user runit, I suggest that for your real, production system, you substitute the file and directory names best suited to your situation.
Runit can be an entire init system (including PID 1), but that capability is beyond the scope of this document. This document concerns itself exclusively with the multi-daemon supervisor part of the runit init system, and further narrows its scope to the use of this multi-daemon supervisor by and for a specific user. Throughout this document, this user is called "slitt", but unless your user is also called "slitt", you'll replace every instance of "slitt" with the username of the user for whom you're setting up your user specific runit multi-daemon supervisor. Now it's time for a little vocabulary...
Tip:
Use your browser's "duplicate tab" feature, typically accessed by right clicking the current tab, to put a copy of this tab on another. By doing this, you can have one tab always open to the Vocabulary subsection to understand words and phrases as they come up.
- ls, and Inkscape are all processes, or perhaps multiple processes tied together.
- /etc/services file. However, with the advent of systemd, things like mounting partitions and bringing up the network are also called "services", so in 2025 the word can't be used unambiguously unless its context is defined, so I avoid this word today.
- runsv. When I say the single-daemon supervisor "supervises", I mean that:
  - sv status command whether the daemon is running or down, its daemon directory (explained later), its uptime or downtime, its PID, and sometimes other info such as whether the daemon is paused (still running and having the same PID, but not processing).
- runsvdir.
- /home/slitt/service_gui.
- run, a directory called supervise that contains files and named pipes to maintain the proper state of the daemon, and perhaps files called finish, down, or a directory called control.
The following diagram is a Mental Model (block diagram) of the startup of a runit daemon supervisor:
Note:
The diagram calls the individual daemon directory subdir, so this narrative does too.
Runit's daemon supervisor is a program called runsvdir. For the purpose of user specific runit setups, you don't need to worry about PID 1, so you'll deal only with runit's multi-daemon supervisor, whose program name is runsvdir. runsvdir cycles around a specific directory (the service directory), and for each specially made daemon directory (like newtest) contained within the service directory, runsvdir runs a single daemon supervisor called runsv, which in turn runs the daemon. In this document the service directory is /home/slitt/service_gui.
runsvdir continually loops through every daemon directory (such as newtest) within the service directory. For each daemon directory, runsvdir forks an instance of runsv, which supervises a single daemon. Here's how runsv does it:
1. runsv forks an instance of the daemon directory's run script.
2. The run script ends with exec mydaemon arg1. There can be as many arguments as are necessary for mydaemon, which of course is just a contrived name: Your daemon program's name will be different.
3. The run script becomes mydaemon (in this example) because of the exec on the call.
So if the daemon directory name is subdir and the daemon's filename is mydaemon, the process tree looks like the following:
runsvdir /home/slitt/service_gui ..........
|
|_ runsv subdir
   |
   |_ mydaemon arg1
In reality, your runsvdir command will probably have 300 to 400 dots instead of the 10 shown in the preceding tree, but for clarity's sake I only included 10 dots in the tree.
Because the fork() call returns the new child's PID to the parent, the runsv instance knows the PID of the daemon it forked. And for the same reason, runsvdir knows the PID of every runsv command it forks. So there's no need for PID files.
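The same idea can be seen from any shell: the parent learns the child's PID at the moment it starts the child, so it can signal the child later without a PID file. The following is only a minimal sketch in POSIX sh (which ksh also runs). The sleep command is a stand-in for a daemon and is not part of runit.

```shell
#!/bin/sh
# Start a stand-in "daemon" in the background; the shell forks it.
sleep 60 &
child=$!          # PID of the forked child, handed to the parent

# The parent can check on and signal the child using only that PID.
kill -0 "$child" && echo "child $child is running"
kill -TERM "$child"

# Reap the child and record how it ended (a value above 128 means a signal).
status=0
wait "$child" 2>/dev/null || status=$?
echo "child $child ended with status $status"
```

runsv does the equivalent in C, which is why it can signal its daemon at any time.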
The following diagram is a Mental Model (block diagram) of the startup of a runit daemon supervisor:
Let's describe this from the daemon's runsv outward. The daemon directory (called subdir in the diagram and therefore in the narrative) contains a directory called supervise (how it got there is explained later), which contains other files and directories including control, status and stat. stat always has a text representation of part of the daemon's status and is useful only for humans, not for the rest of the software. status has a binary representation of all status information, including the current PID (or the previous PID if currently down) and the uptime of the daemon's single daemon supervisor (runsv). It also records whether the daemon is running, typically "run" for running and operating normally, or "down", meaning there is no daemon process currently running. It can take on other values besides "run" and "down".
About the PID: runsv always knows the PID of the running or stopped daemon because the PID is stored in status. Therefore runsv can send the daemon signals by means of its PID. The signals runsv sends to the daemon depend on the letter received by supervise/control. Please be aware that a single letter can cause runsv to send multiple signals to the daemon.
It's also important to understand that runsv reads each character as it comes in, not waiting for a newline. If the letter is something it understands, it sends the appropriate signals to the daemon and then removes the letter from the pipe. If it doesn't recognize the letter, it ignores the letter and removes the letter from the pipe.
runsv recognizes many different letters, although only "u" and "d" are discussed in this document. You can see all the letters, and the signals runsv sends to the daemon as a result, on the runsv man page.
How the supervise directory is created
The supervise directory is created, with its files such as control, status, lock, ok, stat and pid, the first time both of the following are true:
- The runsvdir program is running on that same service directory.
Within the supervise directory are the following files and named pipes:
- control: A named pipe that runsv reads one character at a time. If the letter is "u", runsv starts its assigned daemon. If runsv doesn't recognize the character, it just reads it and does nothing. If runsv recognizes the character, then runsv sends signals to the running daemon, according to which letter it received. You can change the relationship between character and signals with a shellscript, named the same as the letter, in an optional control directory in the daemon directory, but this capability is seldom used. Also, you might be thinking you can put, let's say, z in the control directory to add "z" to the letters recognized by runsv. That doesn't work: I tried it. Most sv commands work by writing one or more letters to control.
- status: This is a binary file containing the state of the single daemon supervisor. It is consulted by the sv status command for the single daemon supervisor.
- lock: runsv tries to put a lock on this file before doing anything else. If the file is already locked, runsv waits until it unlocks. If it's not already locked, runsv locks it. Bottom line, you can't run a second runsv on the same daemon unless you set up the second copy as a different daemon directory. This locking prevents race conditions and other unpleasantness occurring when two copies of the same daemon run simultaneously.
- ok: A named pipe that does something important, important enough to cause error messages if it's unavailable.
- stat: A human readable, for-humans-only text representation of the run state, typically either "run" or "down".
- pid: A text file, for-humans-only, showing a text representation of the PID of the running daemon. It's empty when the daemon isn't running.
The interface described in this subsection is great and simple, but wouldn't it be nice to have a front end to it? Runit gives you such a front end: It's called sv. sv up subdir writes a "u" to supervise/control. sv down subdir sends a "d". sv restart subdir sends "t", "c" and "u" (TERM, CONT, then up, according to the sv man page). sv status subdir reads the contents of supervise/status and reports to stdout whether the daemon is running or down, the daemon directory, the daemon's PID if available, how many seconds it has been in that state, and sometimes other things, for instance whether it's paused.
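To make the relationship between sv and supervise/control concrete, the following sketch writes a single letter to the pipe directly. The function name send_control is hypothetical, and the check that the pipe exists is an addition for safety. It's written in POSIX sh so it also runs under ksh. Treat it as an illustration of the mechanism, not as a replacement for sv, which also waits for and reports the result.

```shell
#!/bin/sh
# Write one control letter straight to a daemon directory's supervise/control.
# Usage: send_control <daemon directory> <letter>
# Note: opening the pipe blocks if no runsv is reading it.
send_control() {
    pipe="$1/supervise/control"
    if [ ! -p "$pipe" ]; then
        echo "send_control: $pipe is not a named pipe; is runsv running?" >&2
        return 1
    fi
    printf '%s' "$2" > "$pipe"
}

# Example, sending the same letter that "sv down" sends:
#   send_control /home/slitt/service_gui/newtest d
```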
sv For a Specific User
As described so far, sv is a front end to the main OS instance of runsvdir, not the one run for the specific user. To run sv for specific user slitt, I made the following shellscript, called svl, located on the executable path given to user slitt:
#!/usr/bin/env ksh
svpath=/home/slitt/service_gui
sv $1 $svpath/$2
Joe average doesn't need a user specific runit (actually a user specific multi-daemon supervisor). The following are some factors that tend to indicate a need for you to have a user specific runit:
The next several sections walk you through setting up your own user specific runit. The following is a list of several sections after this that are steps in the process. Please continue reading this section until its end before going on to the steps.
- svl Shellscript
The first three steps are performed only when first setting up your user specific runit. Once you've set up your .xinitrc, or whatever program starts your GUI system, in such a way as to fork and exec runsvdir, the second to last step happens automatically when you start X11 or Wayland. The final step is done every time you add or modify a daemon.
The next few sections of this document detail each of the steps, for an incredibly simple proof of concept daemon that does nothing but write a second-by-second log file.
Remember:
Be sure to change all occurrences of "slitt" to whatever username you're using!
svl Shellscript
Create the following shellscript, called svl, inside a directory on the executable path for the specific user (slitt in this example). /home/slitt/bin would be an excellent place to put it. If there's already an executable program or script called svl, just change the name. All that's necessary is that you be able to remember the shellscript's filename, and that it be short, because you'll be using it a lot.
#!/usr/bin/env ksh
svpath=/home/slitt/service_gui
sv $1 $svpath/$2
Now make it executable:
[slitt@mydesk ~]$ chmod a+x /home/slitt/bin/svl
[slitt@mydesk ~]$
Now you can use svl as an equivalent for runit's sv command, specifically for the runit belonging to user slitt.
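If you'd like svl to complain when it's called with the wrong number of arguments, a slightly more defensive variant could look like the following. This is only a sketch and not the script used elsewhere in this document: the usage check and the quoting are additions, and it's written in POSIX sh so it runs under ksh, dash or /bin/sh.

```shell
#!/bin/sh
# Defensive variant of svl: same idea, plus quoting and an argument check.
svpath=/home/slitt/service_gui    # change slitt to your own username

svl() {
    if [ "$#" -ne 2 ]; then
        echo "usage: svl <sv command> <daemon name>" >&2
        return 2
    fi
    sv "$1" "$svpath/$2"
}

# Only act when the script was given arguments, e.g.: svl status newtest
if [ "$#" -gt 0 ]; then
    svl "$@"
fi
```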
mkdir /home/slitt/service_gui
This is the directory that runsvdir cycles through. This is the directory containing a symbolic link to each daemon directory you want supervised.
You could theoretically put regular directories here instead of symbolic links. Don't do that. Using symbolic links makes it very easy to completely disconnect a daemon from runit, and later connect it back again. This makes runit troubleshooting a lot less difficult than it would otherwise be.
mkdir /home/slitt/service_all
This directory contains all the daemon directories useful to the specific user (slitt in this document), whether or not the daemon is currently connected to runsvdir. You connect a daemon directory to runsvdir by making a symlink to it within the service directory, /home/slitt/service_gui in this document.
Think of it this way: The all daemons directory, /home/slitt/service_all, contains real subdirectories in the form of daemon directories. The service directory, /home/slitt/service_gui, contains symlinks to the daemon directories in /home/slitt/service_all.
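The connect and disconnect operations described above can be sketched as two small shell functions. The function names enable_daemon and disable_daemon are hypothetical, and the two variables simply default to the directory names used in this document. Treat this as an illustration of the single level of symlinks, not as a finished tool.

```shell
#!/bin/sh
# Assumed layout from this document; override the variables for your own user.
service_all=${service_all:-/home/slitt/service_all}   # real daemon directories
service_gui=${service_gui:-/home/slitt/service_gui}   # symlinks watched by runsvdir

# Connect a daemon directory to runsvdir by symlinking it into the service directory.
enable_daemon() {
    [ -d "$service_all/$1" ] || { echo "no such daemon directory: $1" >&2; return 1; }
    ln -s "$service_all/$1" "$service_gui/$1"
}

# Disconnect it again by removing only the symlink; the real directory is untouched.
disable_daemon() {
    [ -L "$service_gui/$1" ] || { echo "$1 is not linked" >&2; return 1; }
    rm "$service_gui/$1"
}
```

For example, enable_daemon newtest would create /home/slitt/service_gui/newtest as a symlink, and disable_daemon newtest would remove it again. It's probably wise to stop the daemon before disconnecting it.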
Ugh!
Some runit setups have symlinks to symlinks to symlinks. Part of this is to implement "runlevels", which I view as an antique from the 1990's now rendered useless. Multiply cascaded symlinks invariably lead to trouble unless in the hands of a 160 IQ green eyeshade accountant. I've purposely kept this document to a single level of symlinks so that mere mortals can use the material without getting into trouble.
I call the following program /d/bats/runsvdir/runsvdir_slitt.sh
. You can call it anything you want, and you can put it in any directory you want, although I suggest you don't put it on the executable path, because typically it should be run only when you first run your window manager/desktop environment.
Notice the 400 dots at the end. These dots make space for 400 characters of error logging when some daemons throw errors.
#!/bin/ksh exec runsvdir -P /home/slitt/service_gui Log: ................................................................................................................................................................................................................................................................................................................................................................................................................
Later, when everything's set up, you can run this program manually with the following command:
[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh
[slitt@mydesk ~]$
Also, make sure the preceding works in the background:
[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh &
[slitt@mydesk ~]$
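Typing 400 dots by hand is error prone, so here is a sketch that generates them instead. The make_dots helper and the logarg variable are hypothetical names. The final exec line is left as a comment: it passes the log text as one quoted argument, which is an assumption based on the runsvdir man page describing a single optional log argument, so compare it against what ps shows on your own system before relying on it.

```shell
#!/bin/sh
# Build a run of N dots without typing them by hand.
make_dots() {
    printf "%${1}s" '' | tr ' ' '.'
}

logarg="log: $(make_dots 400)"
echo "log argument is ${#logarg} characters long"

# In a real start script the last line might then be:
#   exec runsvdir -P /home/slitt/service_gui "$logarg"
```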
#!/bin/ksh
while /bin/true; do
  date +%H:%M:%S >> /tmp/junky.log
  sleep 1
done
Make the preceding executable with the chmod command. In my test I called the preceding program /home/slitt/mytest.sh, which is not on the executable path. You can call yours any name you want and put it anywhere owned by the specific user (slitt in this document).
When the preceding program is running, every second it appends a timestamp to file /tmp/junky.log. To test the preceding program, run two terminal emulators on your screen: One on the left and one on the right. In the right one, run /home/slitt/mytest.sh. In the left one run the following command:
tail -fn0 /tmp/junky.log
If all is good, you'll see a new timestamp every second in the left terminal. Otherwise, troubleshoot. The daemon program, which I call /home/slitt/mytest.sh, must be functional before continuing.
Once everything's running properly, Ctrl+c out of the processes running in both terminals, to prevent weird things from happening in later steps.
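If you'd rather have a scripted check than watch a terminal, the same test can be sketched as a function that compares the log's line count before and after a short wait. The name log_grows is hypothetical, and the three second default is an arbitrary choice.

```shell
#!/bin/sh
# Succeed if FILE gained at least one line during SECONDS (default 3).
# Usage: log_grows <file> [seconds]
log_grows() {
    file=$1
    seconds=${2:-3}
    before=$(wc -l < "$file")
    sleep "$seconds"
    after=$(wc -l < "$file")
    [ "$after" -gt "$before" ]
}

# Example, while the daemon program is running:
#   log_grows /tmp/junky.log && echo "daemon is logging"
```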
[slitt@mydesk ~]$ mkdir /home/slitt/service_all/newtest
[slitt@mydesk ~]$
In this exercise example I'm calling the daemon directory newtest, which will later be symlinked into the service directory as /home/slitt/service_gui/newtest.
Put the following shellscript, called run, inside directory /home/slitt/service_all/newtest:
#!/bin/ksh
exec /home/slitt/mytest.sh
The preceding simply exec's /home/slitt/mytest.sh, which means that the run script runs /home/slitt/mytest.sh under the run script's own PID. The result is that supervise/pid, which held the PID of the run script, still holds the same PID, but that process now executes /home/slitt/mytest.sh and is revealed by the ps command to be named /home/slitt/mytest.sh, even though it has the same PID the run script had before the exec command at the bottom of the run script.
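You can watch exec keep the PID with a throwaway experiment. In this sketch the outer sh -c plays the part of the run script and the inner sh plays the part of the daemon. Both are stand-ins invented for the demonstration, and it's POSIX sh so it also runs under ksh.

```shell
#!/bin/sh
# The outer sh -c plays the run script: it prints its PID, then exec's a second
# shell (playing the daemon) that prints its own PID.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

before=$(printf '%s\n' "$pids" | sed -n 1p)   # PID while still the "run script"
after=$(printf '%s\n' "$pids" | sed -n 2p)    # PID after exec replaced it

if [ "$before" = "$after" ]; then
    echo "same PID before and after exec: $before"
fi
```

Without the exec, the inner shell would be a child with a different PID, and runsv would be supervising the wrong process.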
But none of this happens until directory /home/slitt/service_all/newtest is symlinked inside the service directory as /home/slitt/service_gui/newtest.
DANGER!
Be absolutely careful to perform this step. If you fail to make the daemon's run script executable, later you'll be forced to do some intricate, frustrating and lengthy debugging.
[slitt@mydesk ~]$ chmod a+x /home/slitt/service_all/newtest/run
[slitt@mydesk ~]$
[slitt@mydesk ~]$ ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest
[slitt@mydesk ~]$
The preceding command enables the daemon to be run by the multi-daemon supervisor, runsvdir. But of course that doesn't happen until the multi-daemon supervisor is running. Read on...
[slitt@mydesk ~]$ /d/bats/runsvdir/runsvdir_slitt.sh &
[slitt@mydesk ~]$
The preceding command calls /d/bats/runsvdir/runsvdir_slitt.sh, which runs runit's runsvdir command on the service directory, /home/slitt/service_gui, thereby getting the whole runit supervision system to work.
If you've done everything absolutely correctly, the system should work absolutely correctly. This usually doesn't happen in real life, so you need to test. First, test and see if everything's perfect:
- Run tail -fn0 /tmp/junky.log. If it prints a new timestamp every second, things are looking up.
- Run svl down newtest. If the terminal emulator on the left side stops printing new timestamps, you can be almost certain that your user specific runit is working correctly with your simple daemon, whose directory is /home/slitt/service_gui/newtest.
If (and more likely when, unless you're a very careful person) it doesn't work, continue on to the Troubleshooting section of this document, and every time you think it's working, test with this section.
Note:
This section uses "slitt" for the username, newtest for the daemon directory name, /home/slitt/mytest.sh as the actual daemon, /home/slitt/service_gui as the service directory, and /home/slitt/service_all as the "all daemons" directory.
Some of these are obviously not optimal, especially an executable contained in /home/slitt, but they work for the purpose of demonstration.
The runit multi-daemon supervisor system isn't easy to troubleshoot, but you can do it.
First, before performing any diagnostic tests that change anything, get all the info you can. Start with the following command:
[slitt@mydesk supervise]$ svl status newtest
down: /home/slitt/service_gui/newtest: 11s, normally up
[slitt@mydesk supervise]$
The preceding output tells you the following, in order from left to right:
- /home/slitt/service_gui/newtest is the daemon directory the info is applicable to.
- down.
Next, look at the output for runsvdir with the following command:
ps ax | grep runsvdir | grep service_gui
Remember those 400 dots in arg2 to runsvdir? Those are called the "log" because any runsvdir error messages take the place of some of the dots. Theoretically, the above mentioned ps command should show nothing but dots. If there are error messages in this command's output, copy them into a file so you can later use them to troubleshoot.
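Picking error messages out of several hundred dots by eye is tedious. As a sketch, a small filter can delete every run of three or more dots so that only the command line and any messages remain. The name show_log_text is hypothetical, and the three dot threshold is a guess chosen so that text like ./run survives.

```shell
#!/bin/sh
# Remove runs of three or more dots from each line read on stdin,
# leaving the runsvdir command line and any messages written over the dots.
show_log_text() {
    sed 's/\.\.\.\.*//g'
}

# Example ([r]unsvdir keeps grep from matching itself):
#   ps -e -o args | grep '[r]unsvdir' | grep service_gui | show_log_text
```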
If there's a log file for this daemon (not for runsvdir, but for the daemon appearing to cause the error message written over the dots), look at it. The following is also a helpful info source, run as user root:
tail -fn40 /var/log/messages
Look over the listings the preceding command gives you, trying to find any that might be relevant to your runit problem and be observant of new lines that appear.
A special problem that occurs quite frequently is that your svl status command usually says "up" for the state, the seconds up is 0, 1, or some other very small number, and the PID keeps changing. This is an indication that your daemon either terminates quickly or is not being run at all. This is powerful information when you start doing diagnostic tests in which you manipulate the daemon, its directory and runsvdir. This condition usually means that the daemon's runsv can't fork the daemon directory's run script, or that the run script is terminating quickly. Make a note of how often it seems to reset, because it will come in handy later. Your job then becomes to discover which of these two possibilities is the case, and then narrow down to the root cause.
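Spotting a changing PID can be automated. The sketch below assumes that a running daemon's status line contains a "(pid N)" part, as in "run: /home/slitt/service_gui/newtest: (pid 1234) 5s", and that the svl script from earlier in this document is on the path. The function names status_pid and pid_changes are hypothetical.

```shell
#!/bin/sh
# Extract the PID from one line of status output.
# Prints nothing if the line has no "(pid N)" part.
status_pid() {
    printf '%s\n' "$1" | sed -n 's/^[^(]*(pid \([0-9][0-9]*\)).*/\1/p'
}

# Sample the status several times and count how often the PID changed.
# A nonzero count suggests the daemon keeps dying and being restarted.
# Usage: pid_changes <daemon name> [samples] [seconds between samples]
pid_changes() {
    last=""
    changes=0
    i=0
    while [ "$i" -lt "${2:-5}" ]; do
        pid=$(status_pid "$(svl status "$1")")
        if [ -n "$last" ] && [ "$pid" != "$last" ]; then
            changes=$((changes + 1))
        fi
        last=$pid
        i=$((i + 1))
        sleep "${3:-2}"
    done
    echo "$changes"
}
```

For example, pid_changes newtest 5 2 samples the status five times, two seconds apart, and prints the number of PID changes it saw.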
There's a certain class of problems that point an accusing finger at the supervise tree inside the daemon directory. The following error messages point this accusing finger:
If you get "unable to start ./run: access denied" within the dots, your ./run script and/or the daemon it exec's probably is not chmoded executable for the specific user (slitt in this document). Chmod both to executable for the user (or for everybody if you like doing things that way) and try again with svl restart. If you're lucky, things start to work. However, the typical result of non-executability is that even after it's corrected, there's been consequential damage or misalignment to something in the supervise tree, causing a problem with state, so you need to bring out the big guns: brute force troubleshooting.
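Before reaching for brute force, a quick executability check on both files can save time. This is a minimal sketch. The function name check_executable is hypothetical, and it tests only existence and execute permission for the user running it.

```shell
#!/bin/sh
# Report whether a file exists and is executable by the current user.
# Usage: check_executable <file>
check_executable() {
    if [ ! -e "$1" ]; then
        echo "$1: does not exist"
        return 1
    fi
    if [ ! -x "$1" ]; then
        echo "$1: not executable by $(id -un); try chmod u+x"
        return 1
    fi
    echo "$1: ok"
}

# Example, checking the run script and the daemon it exec's:
#   check_executable /home/slitt/service_all/newtest/run
#   check_executable /home/slitt/mytest.sh
```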
Before doing brute force troubleshooting, perform the non-destructive tests described in the preceding subsection. Then attempt to fix any root causes the non-destructive tests make you suspect.
Runit supervisor problems often involve a problem with state, so troubleshooting can get brutal. Because runit daemon supervisors are so simple, brute force troubleshooting is often the quickest and easiest route to a solution.
I call the following outline based recipe brute force troubleshooting:
1. cd /home/slitt/service_gui/newtest
2. sv down . to stop the newtest daemon. Please notice the use of sv rather than svl. You don't need to, and shouldn't, prepend a path to the daemon directory, because you're already there.
3. cd .. to go up one directory.
4. rm newtest to delete the symlink. Now you've completely disconnected the daemon run by directory newtest from the supervisor, runsvdir.
5. cd /home/slitt/service_all/newtest to get into the non-symlink copy of the daemon directory.
6. ./run. Either it works correctly or it doesn't.
   - If ./run works: ./run isn't getting run, or something's wrong in supervise.
     - Check that the owner of /home/slitt/service_gui/newtest is the specific user (slitt in this document) with command ls -ldF ../newtest, and if not use the proper chown command to make the specific user the owner.
     - Check that the owner of run is the specific user (slitt in this document), and if not use the proper chown command to make the specific user the owner.
     - Check that run is executable by the specific user (slitt in this document), and if not use the proper chmod command to make it so.
     - rm -rf supervise to get rid of all potential state problems.
     - ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest to install the symlink, attach the daemon to runsvdir, and start the daemon. If it doesn't work, try some more things.
   - If ./run doesn't work:
     - Run ksh to duplicate the shebang in your run script and return you to your starting point once the exec happens.
     - Paste each line of the run script onto the terminal running ksh. If any commands error out, investigate.
     - At the exec line, either it works properly or it doesn't. If it doesn't work properly, investigate it as you would any other malfunctioning program. If it does work correctly, there's some kind of strangeness with run or the daemon executable.
     - ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest in order to clean-start the newtest daemon.
Pulseaudio is sound software enabling several sound sources to access your sound system independently, at the same time. You can often accomplish the same thing just with ALSA, an older and in my opinion more robust sound software. But for certain things you need Pulseaudio. On some Linux distributions Pulseaudio works pretty much right out of the box. On other distributions it can turn into a holy mess. It doesn't help that there's a lot of contradictory advice out there telling you how to set up and use Pulseaudio. This section gives you a distribution agnostic, "just works" method for controlling Pulseaudio using a user specific runit.
Note:
Pulseaudio must not be run as root.
Everything's the same as in the newtest daemon directory detailed earlier. The only exception is the contents of the run file. I call my daemon directory for Pulseaudio pulse_slitt. The run script inside pulse_slitt contains the following:
#!/usr/bin/env ksh
pkill pavucontrol
pavucontrol &
exec /usr/bin/pulseaudio --daemonize=no --exit-idle-time=-1
The preceding run script has the following line by line explanation:
- #!/usr/bin/env ksh: The script's shebang line, telling it to evaluate its contents in the ksh shell.
- pkill pavucontrol: Kill all instances of pavucontrol, the mixer/level program for Pulseaudio. For some reason, every time you stop and restart Pulseaudio, additional redundant and confusing sliders get added.
- pavucontrol &: Run a new copy of pavucontrol in the background. Because it's a GUI program, even though its process is in the background and its text output is invisible, its GUI interface is visible and ready for manipulation.
- exec /usr/bin/pulseaudio --daemonize=no --exit-idle-time=-1: exec pulseaudio in the foreground (--daemonize=no) and stay running through idle periods (--exit-idle-time=-1).
If you're having trouble with Pulseaudio, consider putting it in a user specific runit, even if it's your only reason for a user specific runit. With your Pulseaudio supervised by your personal runit, everything pretty much falls into place.
Note:
You probably need to add yourself to the "audio" group in order to have Pulseaudio work correctly. Also, you'll need to set autospawn = no in either the user Pulseaudio client configuration file or the system-wide Pulseaudio client configuration file. You can find details on these files, including their names and locations, from Google's AI Overview.
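As a sketch of the autospawn change, the following function appends the setting to a client configuration file only if the file doesn't already set autospawn. The function name set_autospawn_no is hypothetical. The per-user path shown in the example comment, ~/.config/pulse/client.conf, is the one named in the pulse-client.conf man page, but check your distribution's documentation before trusting it.

```shell
#!/bin/sh
# Make sure a Pulseaudio client.conf contains an uncommented autospawn setting.
# Usage: set_autospawn_no <path to client.conf>
set_autospawn_no() {
    conf=$1
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # If the file already sets autospawn, show the line and change nothing.
    if grep -E '^[[:space:]]*autospawn[[:space:]]*=' "$conf"; then
        return 0
    fi
    printf 'autospawn = no\n' >> "$conf"
}

# Example:
#   set_autospawn_no "$HOME/.config/pulse/client.conf"
```

If the function prints an existing autospawn line that says yes, edit the file by hand.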
I have migrated the following to my personal runit:
All four perform much better now that they're supervised by my personal runit. In fact, moving them to runit fixed some longstanding bugs in the pager loop and the reminder loop. I think you'll find a similar improvement.
This was probably the easiest to move over, because I wrote it only a year or so ago, and I was willing to live with some compromises in order to get it into my personal runit. My pager daemon spins around listening for email to a certain email address. When it receives a *qualifying* email, it rings some bells, throws up a very noticeable window on my screen, and of course puts the email in the proper mailbox so I can read it. The purpose is so if I'm giving a presentation or class on Jitsi and my Jitsi connection gets disconnected, people can email that address and tell me the connection is lost. When you give online presentations and classes, this is a must!
My pager is a very over-engineered Python program that moves heaven and earth to go down only after all pager emails have been acted on. Considering how rarely this program is brought down, this might be overkill. So for now I just made the following pager run script:
#!/usr/bin/env ksh
cd /d/pager
mkdir -p /tmp/pager/completed
# -f keeps rm quiet when a leftover file is already gone
rm -f reload.int
rm -f /tmp/pager/pager.pid
exec ./loops.py
In the preceding, the two rm commands remove files that would have operated on the Python program but are no longer necessary or even correct. A heck of a lot of loops.py addresses looping over and over again through the newly received pager emails. Once managed by my personal runit, life is much better if the loop part is handled by a shellscript that calls the Python program to go through just once and then sleep, but for the time being I'll leave it the way it is because it works.
Once the loop is run by a shellscript calling a one-pass Python program, I can accomplish the ideal of going down only at the beginning or end of the sleep using a USR1 signal that sets a variable that's tested at both ends of the sleep, and if the variable is True, send a HUP to the Python program.
My fetchmail shellscript was given to me by a mentor over 20 years ago. It used fetchmail in daemon mode. I guess it made sense then, but it sure doesn't now. So when I moved fetchmail over to my personal runit, I made my daemon just a looping shellscript that calls fetchmail to do one check, and then sleeps for X number of seconds. By having each call to fetchmail do one pass and putting the loop in a shellscript supervised by my personal runit, I got rid of a huge, nasty shellscript I'd been using for over 20 years. Not only that, but I'll later be able to make sure it goes down only just before or just after the sleep command, rather than while pulling emails from an IMAP server and maybe losing some email. The following is my run script for my fetchmail:
#!/usr/bin/env ksh
exec /d/at/fetchmail/fetchmail_loop
And the following is /d/at/fetchmail/fetchmail_loop:
#!/usr/bin/env ksh
waittime=180
logfile=/d/at/fetchmail/logs/fetchmail.log
while /bin/true; do
  date +%Y%m%d_%H%M%S >> "$logfile"
  # send both stdout and stderr of fetchmail to the log
  fetchmail -f /wouldnt/you/like/to/know/.fetchmailrc >> "$logfile" 2>&1
  echo "" >> "$logfile"
  echo "" >> "$logfile"
  sleep "$waittime"
done
As I said, very soon I'll make fetchmail_loop honor signal USR1 in order to wait until a complete pass before downing the daemon, so there's no chance of losing email. Pretty cool, right? It all begins when you create your own personal runit.
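One way such a loop might honor USR1 is sketched below. It is not the finished fetchmail_loop: the one_pass function is a hypothetical stand-in for a single fetchmail run, and the stop_requested flag and run_loop function are names invented for the sketch. The trap only sets a flag, and the flag is tested on both sides of the sleep, so the loop never stops in the middle of a pass. The shell runs the trap only after the current foreground command finishes, so a signal arriving during a pass or during the sleep takes effect at the next test.

```shell
#!/bin/sh
# Sketch: a supervised loop that stops only between passes.
waittime=${waittime:-180}      # seconds to sleep between passes
stop_requested=0
trap 'stop_requested=1' USR1   # USR1 only sets a flag; nothing is interrupted

# Hypothetical stand-in for the real one-pass job (for example one fetchmail run).
one_pass() {
    date +%Y%m%d_%H%M%S
}

run_loop() {
    while [ "$stop_requested" -eq 0 ]; do
        one_pass
        # Test the flag on both sides of the sleep.
        [ "$stop_requested" -eq 0 ] || break
        sleep "$waittime"
    done
}

# A real daemon script would end by calling: run_loop
```

The sv man page lists a 1 command that sends USR1, so with the svl script from this document the signal could presumably be delivered with svl 1 followed by the daemon name.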
In a response to this document, somebody brought up the very real fact that fetchmail is designed from the bottom up to be a daemon, and a very good and complete daemon, so why do I need a looping shellscript to control a single shot fetchmail?
As an answer, consider the following points:
Bottom line: From runit you can only start or supervise a one-shot fetchmail run, so if you want to supervise fetchmail from runit you'll need to write a looping shellscript. So the obvious question is, why use runit to supervise fetchmail? I can think of these reasons:
Fetchmail is a spectacular program. I've used it almost every minute of every day for 12 years. Fetchmail has very few flaws, but one flaw it does have is to make the background/foreground flag also control whether you want fetchmail to loop or not. If these two factors were controlled separately, then I'd be more likely to use fetchmail's loop instead of calling it, one-shot, from a looping shellscript.
My Reminder program pops up a GUI message every few hours reminding me of upcoming appointments. This is the way I work best: I'm not a calendar kind of guy. I wrote the Reminder program some time in the 00's, in Python. It's an albatross I'm scared to work on. It's reminders.py
, with the task to be performed as arg1. The following is a list of those tasks:
The loop task has argument loop_display
. I had been running it from my UMENU software, where it had at least two long standing bugs:
Now that my personal runit is supervising it, the zombies appear to be gone, and the multiple reminder windows happen much less frequently: so far I've seen at most one duplicate, not the two duplicates I got when it was run from UMENU. The following is the run
script for the reminder loop:
#!/usr/bin/env ksh
cd /d/at/python/reminders
exec ./reminders.py loop_display
This was already covered in the Supervising Pulseaudio With Your Specific User Runit section. The only other thing I can say is this: For years I tried unsuccessfully to get Pulseaudio to work on Void Linux. Once I made my daemon and supervised it with my personal runit, Pulseaudio worked consistently, productively and unsurprisingly.
So it appears that supervising my daemons from my personal runit sometimes solves problems and sometimes makes things less complex. What I've learned is that the next time I want to make a daemon, I'll put the looping part in a simple shellscript and do the single pass part in Python or C or whatever. Now that I have my personal runit, life is going to get much simpler.
As mentioned in an earlier section, when your daemon is a looping shellscript calling a 1 pass process to iterate once each cycle, it's sometimes important to make sure the daemon doesn't go down during the process, but only before or after it. For me, this is especially true of fetchmail
, because I don't want to take any chance of losing email.
Note:
My experimentation tells me that simply using sv down
allows the separate program that does the 1 pass process to finish. Probably somewhere the documentation says this too. But with fetchmail
I want to be absolutely certain. Hence the USR1 solution, even if theoretically it might not be needed.
When designing this method, the last thing I want to do is fiddle around with my running postgres
loop code to find how to do it. So instead, I fiddled around with my newtest
daemon directory and the shellscript it runs. The following Ascii diagram describes the new process, starting from newtest
's run
script:
run ==>mytest.sh ==>count2ten.sh
The run
script doesn't change a bit: It still runs mytest.sh
. Both mytest.sh
and count2ten.sh
are in directory /home/slitt
. count2ten.sh
writes a start message to the log (/tmp/junky.log
), then logs each number from "one" to "ten", one per second, and finally logs a finish message. Under normal operation mytest.sh simply loops forever, with each iteration running count2ten.sh
and then sleeping 10 seconds.
When mytest.sh
receives a USR1 signal, it sets a variable from "0" to "1". After the return of count2ten.sh
and before the sleep, a function is called that checks that variable, and if that variable is "1", then it performs the following command:
/d/bats/svl down newtest
The following is the code of mytest.sh
:
#!/usr/bin/env ksh
stop_when_safe=/bin/false

on_usr1() {
  stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
  if $stop_when_safe; then
    echo "Stopping now!" >> /tmp/junky.log
    echo "" >> /tmp/junky.log
    echo "" >> /tmp/junky.log
    /d/bats/svl down newtest
  fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
  echo -n "PID is " >> /tmp/junky.log
  echo $$ >> /tmp/junky.log
  ./count2ten.sh
  sleep 1
  test_and_handle_safe_stop
  sleep 10
  test_and_handle_safe_stop
done
The trap on_usr1 USR1
line runs function on_usr1()
, and does nothing else, when the program receives a USR1 signal. Note that receiving a USR1 signal is asynchronous with respect to the loop. Function on_usr1()
changes variable $stop_when_safe from a false status to a true status, and nothing more. Function test_and_handle_safe_stop() checks variable $stop_when_safe, and if true, uses the sv down newtest
command to stop the daemon itself.
What this code has done is turn the asynchronous receipt of USR1 into a synchronous shutdown that can occur only 1 second after ./count2ten.sh
finishes, or immediately before it starts, thus preventing a stop while actual work is being done.
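The asynchronous-to-synchronous conversion can be seen in isolation with a minimal sketch like the following. It is an illustration only: the function name demo_usr1_flag is made up, and the demonstration runs inside a child sh so the signal never touches the shell you type it into. The child sends USR1 to itself, the trap does nothing except set a flag, and the flag is acted on only when the script reaches its checkpoint.

```shell
# Hypothetical demonstration, written in portable shell so it should
# behave the same under ksh or dash.
demo_usr1_flag() {
    sh -c '
        stop_when_safe=no
        trap "stop_when_safe=yes" USR1   # the trap only sets a flag
        kill -USR1 "$$"                  # the asynchronous stop request
        echo "work finishes first"       # stands in for the real work
        if [ "$stop_when_safe" = yes ]; then
            echo "stopping at the safe point"
        fi
    '
}
```

Calling demo_usr1_flag prints "work finishes first" and then "stopping at the safe point", showing that the request is remembered rather than acted on immediately.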
Note:
This also could have been done by creating a sentinel file. In that case, test_and_handle_safe_stop() would be used to test for that file's existence, and if it exists, delete it and down the daemon. This would have eliminated the trap
statement and the on_usr1()
function, but in my opinion using the signal is more Unixy and probably a better idea.
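For comparison, the sentinel file alternative mentioned in the preceding note might look something like the following sketch. The sentinel path and the function name stop_requested are assumptions chosen for illustration, and the code is untested against a real runit setup.

```shell
# Hypothetical sentinel path; any location writable by the user would do.
sentinel=${SENTINEL:-/tmp/newtest.stop}

# Succeeds, and removes the sentinel, only when a stop has been requested.
stop_requested() {
    if [ -e "$sentinel" ]; then
        rm -f "$sentinel"
        return 0
    fi
    return 1
}

# Inside the loop, this check would replace the test of $stop_when_safe:
#   if stop_requested; then /d/bats/svl down newtest; fi
```

With this variant, a stop is requested by creating the file (for instance with touch /tmp/newtest.stop) instead of sending a signal.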
mytest.sh
calls count2ten.sh
, which serves as a simple standin for a program that really does something (like a single-run, foreground invocation of fetchmail
). The following is the straightforward code for count2ten.sh
:
#!/usr/bin/env ksh
echo "" >> /tmp/junky.log
echo "Begin count2ten.sh" >> /tmp/junky.log
for num in one two three four five six seven eight nine ten; do
  echo $num >> /tmp/junky.log
  sleep 1
done
echo "End count2ten.sh" >> /tmp/junky.log
echo "" >> /tmp/junky.log
This setup works perfectly. I'm glad I did it before attempting the concept with fetchmail
, because it took me about an hour to get it to work perfectly. Armed with this experience, I'll soon do the same thing for fetchmail
.
That same function is called immediately after the sleep, so that if a USR1 was received during the sleep, the daemon goes down before running another iteration of the loop.
The sv program sends a USR1 when its arg1 is 1
.
You might wonder how I send a USR1 to the daemon. I simply use the following command:
svl 1 newtest
The arg1 of "1" sends a USR1.
I want to be absolutely certain that my downing the loop calling fetchmail
never stops the fetchmail
download midstream. I don't want to take my experimentation's word for it. I don't want to take the documentation's word for it. I don't want to take any expert's word for it. I don't even want to take Runit's author's word for it. I want certainty, even if I have to write 45 extra lines of shellscripting to gain that certainty. This section details how I gain that certainty.
The following is the code of /home/slitt/service_gui/run
:
#!/usr/bin/env ksh
exec /d/at/fetchmail/fetchmail_loop
The run
script is unchanged from the simple setup showcased earlier in this document. The following is the code of the /d/at/fetchmail/fetchmail_loop
program, which changed quite a bit but is almost the same as the daemon program in the earlier Proof of Concept: Shutting Down With USR1 section:
#!/usr/bin/env ksh
waittime=180
logfile=/d/at/fetchmail/logs/fetchmail.log
stop_when_safe=/bin/false

on_usr1() {
  stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
  if $stop_when_safe; then
    printf "Stopping now!\n\n" >> "$logfile"
    /d/bats/svl down fetchmail
  fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
  echo "" >> "$logfile"
  echo "" >> "$logfile"
  echo "Starting next fetchmail run" >> "$logfile"
  date +%Y%m%d_%H%M%S >> "$logfile"
  # The format contains spaces, so it must be quoted
  date "+New fetchmail run: %H:%M:%S" >> "$logfile"
  echo -n "PID is " >> "$logfile"
  echo $$ >> "$logfile"
  # Append both stdout and stderr of the fetchmail pass to the log
  fetchmail -f /home/slitt/mail/fm/.fetchmailrc >> "$logfile" 2>&1
  sleep 1
  count=$waittime
  echo "Fetchmail run finished, sleeping $waittime seconds" >> "$logfile"
  # Sleep in 5 second slices, checking for a stop request before each slice
  while (( count >= 0 )); do
    test_and_handle_safe_stop
    sleep 5
    (( count -= 5 ))
  done
  echo "End of sleep $waittime seconds" >> "$logfile"
  test_and_handle_safe_stop
done
The preceding is basically the same as the daemon script (not the run script) from Proof of Concept: Shutting Down With USR1, except:
It runs fetchmail
instead of a shellscript.
It calls test_and_handle_safe_stop()
every five seconds during the sleep, instead of once before and once after it.
Runit creates and handles log files very well, including limiting log file size, rotating logs, and deleting older rotated logs. It also prepends a very readable, precise and accurate timestamp to each log entry. Runit's logging facility simply logs everything the daemon sends to stdout, so when you make your own log you needn't write to a special log file. Runit does it all for you, and does it cleanly and in good taste. It does log rotation according to simple command line specifications you give it.
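As an illustration of how little specification rotation needs: according to the svlogd man page, svlogd reads an optional file named config in the log directory when it starts, and in that file a line beginning with s sets the maximum size of the current log in bytes while a line beginning with n sets how many rotated logs to keep. The following is a minimal sketch that writes such a file. The function name and the example numbers are assumptions, not part of the setup described in this document, so check them against the man page on your system.

```shell
# write_svlogd_config LOGDIR SIZE_BYTES NUM_FILES
# Hypothetical helper: creates LOGDIR if needed and writes LOGDIR/config.
write_svlogd_config() {
    logdir=$1 size=$2 num=$3
    mkdir -p "$logdir" || return 1
    printf 's%s\nn%s\n' "$size" "$num" > "$logdir/config"
}

# Example with made-up values: rotate at about 1 MB, keep 10 old logs.
#   write_svlogd_config /scratch/rlogs/newtest 1000000 10
```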
Before getting into the mechanics of making runit write log files for a daemon, it's important to discuss some alternatives of where the log files should go.
Obviously, don't place your log files on a temporary or RAM disk partition, because you don't want them disappearing from time to time. For most systems, this absolutely rules out the following trees:
/tmp
/run
/dev
/sys
Note that /run
is specified to start empty with every reboot.
Your logs must reside on a partition big enough to handle them, and then some. If you want to use the /var
tree, which is pretty standard for log files (/var/log
), be sure it has enough room to handle logs for all your new daemons.
If you instead use the /home
tree, be sure that partition has enough space. The df -h
command is your friend.
You probably want your logs on the local machine rather than a remote-loaded partition from NFS or Samba or sshfs mount. Don't get me wrong, for your specific user runit log files would work on a remote mount because the remote mount is mounted before the user specific runit is started, but if the connection went down, so would your logs.
Danger!
The preceding paragraph concerned your user specific Runit only. For the system wide Runit, putting the logs on a mount to a remote system is absolutely inappropriate because logs should start early enough to capture everything; even events happening before the remote mount is completed.
If you believe that constantly creating and deleting files on an SSD hastens the SSD's demise, then you don't want to put rotating log files like those created by Runit on an SSD or NVMe drive. Once all these requirements are taken care of, your choice of log locations is pretty much whatever you think is best.
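The space requirement above can be checked from a script as well as by eye. The following is a minimal sketch using df with the POSIX -P and -k options; the function names free_kb and has_room are hypothetical, and the threshold you pass in is your own estimate of how much log space you need.

```shell
# free_kb DIR: print the free space, in kilobytes, on the filesystem holding DIR.
free_kb() {
    # With -P the data is always on the second line; field 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEEDED_KB: succeed only if at least NEEDED_KB kilobytes are free.
has_room() {
    avail=$(free_kb "$1") || return 1
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example: has_room /scratch 500000 || echo "not enough room for logs"
```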
Where's the best place for your daemon's log files? The answer is "it depends". It depends on whether you consider your log files to be data to be mixed with your other data. It depends on whether you want your log files backed up. It depends on which customs and standards you believe to be the most credible. It depends on your personal preferences, or your boss's, or your CIO's. There's no "one size fits all" answer.
There's an even more basic question that impinges on the "where do they go" question: Should your personal runit simply toss its log lines into the system wide logger, or should it build individual logs for each daemon? Each has benefits and drawbacks:
tail -f
them.
I can't tell you anything about the system wide logger alternative because it's distro dependent. Also, I'm not going to use the system wide logger for this. Therefore, I won't say anything more about using Runit logging with system wide loggers.
An excellent location, certainly the most Unixy location, and the location creating the least resistance and criticism is:
/var/log/users/slitt
In the preceding, of course you'd change "slitt" to the name of the user running the personal runit. /var/log/users
should be owned by user root, group root, and permissioned chmod 755
. Within that directory, slitt
should be owned by slitt, and permissioned chmod 700
unless there's some reason you want people from one of slitt's groups or someone with no relationship at all to read the logs. Inside /var/log/users/slitt
should be subdirectories like pulse_slitt
, pager
, fetchmail
, etc. Each of those should be owned by slitt and chmod 700
unless there's a good reason to do otherwise.
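The directory layout and permissions just described could be built along these lines. This is a minimal sketch: the function name make_user_logs is hypothetical, and it only creates directories and sets modes, so the ownership changes still have to be done separately by root.

```shell
# make_user_logs ROOT USER DAEMON...
# Creates ROOT/users (755), ROOT/users/USER (700) and one 700 directory
# per daemon name given on the command line.
make_user_logs() {
    root=$1 user=$2
    shift 2
    mkdir -p "$root/users/$user" || return 1
    chmod 755 "$root/users" || return 1
    chmod 700 "$root/users/$user" || return 1
    for daemon in "$@"; do
        mkdir -p "$root/users/$user/$daemon" || return 1
        chmod 700 "$root/users/$user/$daemon" || return 1
    done
}

# Real use would be as root, for example:
#   make_user_logs /var/log slitt pulse_slitt pager fetchmail
```

After running it as root against /var/log, a chown -R slitt /var/log/users/slitt would hand the user's subtree over to slitt, leaving /var/log/users itself owned by root.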
You might wonder why I didn't suggest eliminating the users
level and just placing pager
etc. right in /var/log
. The answer is there just might already be a user called "pager", causing a name clash. By using "users", only if a user is named "users" will there be a problem, and if that happens, you could call the directory "users_whatever" (substitute what you want for "whatever"), so that there's no clash, and then just forbid somebody from having name "users_whatever" (or whatever).
If you view these logs as the user's data and want them backed up with the user's data, then they belong in the /home
tree. There are a million ways to do this: Let me present two...
First, you can do it the djb (Daniel J Bernstein) way. Assuming the user to be "slitt" and the daemon directory to be /home/slitt/supervise_gui/pager
, djb would put the logs in /home/slitt/supervise_gui/pager/log/main
. This has the huge advantage that everything for the daemon you created for pager
is contained within the /home/slitt/supervise_gui/pager
tree, and it's really convenient that the run
file within the pager/log
directory can be written relatively, as follows:
#!/usr/bin/env ksh
exec svlogd -tt ./main
Everything in a nice, tidy, encapsulated package.
Perhaps the preceding doesn't appeal to you. Truth be told, for some reason I can't put my finger on, I don't like it myself, although there's no rational reason why not. But if you want the logs as the user's data to be backed up with the user's data, another alternative is to put it under ~/rlogs
, as follows (example for the pager
daemon for user "slitt"):
$HOME/rlogs/pager
There's a whole specification concerning where the Freedesktop organization thinks things should go, at https://specifications.freedesktop.org/basedir-spec/latest/. Different people assign different importance to the Freedesktop specs, ranging from people who pay no attention to them to people who take the Freedesktop specs more seriously than they take the POSIX customs. I fall in the former category, so I know almost nothing about Freedesktop, meaning that I'm the wrong guy to ask about Freedesktop specs and customs.
Based on the Freedesktop specs linked to in the preceding paragraph, it seems to me that those wanting to comply with Freedesktop would prefer to put their log files in a tree headed by $XDG_STATE_HOME
.
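For those who do follow the Freedesktop spec, the log directory could be computed rather than hard coded. The spec gives $HOME/.local/state as the default when $XDG_STATE_HOME is unset or empty. The following is a minimal sketch; the function name xdg_log_dir and the rlogs subdirectory name are assumptions chosen to match the ~/rlogs example, not anything the spec prescribes.

```shell
# xdg_log_dir DAEMON: print a per-daemon log directory under the XDG state tree.
# Falls back to $HOME/.local/state when XDG_STATE_HOME is unset or empty.
xdg_log_dir() {
    printf '%s/rlogs/%s\n' "${XDG_STATE_HOME:-$HOME/.local/state}" "$1"
}

# Example: mkdir -p "$(xdg_log_dir pager)"
```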
For my purposes, I don't want my log files mixed in with my data, so my log files will not go anywhere in the /home
tree. From my personal perspective, one of the big benefits of having a personal Runit is having everything seamlessly owned by user slitt
, so I won't be putting them anywhere in /var/log
or anywhere else where I'd need to do special steps to make them writeable by slitt
. Furthermore, I don't want my log files backed up, so I'll be putting them within my /scratch/rlogs
tree. /scratch
doesn't get backed up, and it's already owned by slitt
, and the mount partition for /scratch
is huge, so it can accommodate a lot of big log files.
Note:
If you don't like making the /scratch
directory directly off of the root directory, or if you're not allowed to create it, just substitute /home/slitt/scratch
for /scratch
, once again remembering to substitute the user's login name for "slitt".
So for my "newtest" daemon, the log files will reside in /scratch/rlogs/newtest
. If you're wondering why there isn't a username in the preceding directory, it's because in my use case nobody except user "slitt" uses a personal Runit, so the user is implied.
Important!
My situation is unusual, so my log locations are unusual. Your log locations will almost certainly be different from mine. There's no "right" place for log files. Just examine your situation and place them accordingly. Think about it for awhile before going on.
In this document's previous examples, the daemon has written directly to a file. Many daemons were designed to write directly to a file. Runit does all the work for you, by propelling the daemon's stdout to the log file. Therefore, your daemon should write nothing to stdout except what it wants logged. If you also want to capture the daemon's stderr in your log, that's possible too.
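The usual way to get stderr into the same log is to point file descriptor 2 at stdout before the daemon starts, which in a run script means a line reading exec 2>&1 placed ahead of the final exec of the daemon. The following minimal sketch wraps that idea in a hypothetical helper, with_stderr_logged, so it can be tried without runit.

```shell
# with_stderr_logged CMD [ARGS...]: run CMD with stderr merged into stdout.
# In a real run script the equivalent is the line "exec 2>&1" placed
# before the final "exec /path/to/daemon".
with_stderr_logged() {
    (
        exec 2>&1
        exec "$@"
    )
}

# Example: both lines arrive on stdout, where a logger would pick them up.
#   with_stderr_logged sh -c 'echo to-stdout; echo to-stderr >&2'
```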
So now you know where you want to put your log files. This subsection shows you how to put the logs for user "slitt"'s invocation of daemon "newtest" into /scratch/rlogs/newtest
. This procedure is designed so that your existing "newtest" daemon, which is now running without a log, stays running as long as possible and has as brief a down period as possible.
Your first step is to make a copy of /home/slitt/mytest.sh
. Copy it to /tmp/mytest.sh
. Within that copy, you'll be adding three lines, each marked with a "# new line" comment in the following code:
#!/usr/bin/env ksh
echo "Starting the newtest daemon."          # new line
stop_when_safe=/bin/false

on_usr1() {
  stop_when_safe=/bin/true
}

test_and_handle_safe_stop() {
  if $stop_when_safe; then
    echo "Stopping now!" >> /tmp/junky.log
    echo "" >> /tmp/junky.log
    echo "" >> /tmp/junky.log
    echo "Stopping the newtest daemon."      # new line
    /d/bats/svl down newtest
  fi
}

trap on_usr1 USR1
cd /home/slitt || exit 1
while /bin/true; do
  echo -n "PID is " >> /tmp/junky.log
  echo $$ >> /tmp/junky.log
  echo "Starting ./count2ten.sh."            # new line
  ./count2ten.sh
  sleep 1
  test_and_handle_safe_stop
  sleep 10
  test_and_handle_safe_stop
done
Instead of going to /tmp/junky.log
, these three new statements output to stdout, where they're picked up by Runit and sent to Runit's log file for this daemon. Please remember, this new file is called /tmp/mytest.sh
. There is no need to change anything in count2ten.sh
, which continues to output to /tmp/junky.log
as always. Only the three new lines end up in the Runit log for the daemon.
The following is the recipe for setting up logging for the newtest
daemon:
1. mkdir -p /tmp/temp/log : Create a temporary directory for assembling the log directory. You can put it anywhere.
2. cd /tmp/temp/log : Get into that directory.
3. Create a file called run with the following content:
   #!/usr/bin/env ksh
   exec svlogd -tt /scratch/rlogs/newtest/
4. chmod u+x run : This step is absolutely vital, because if you forget to do it, you'll have all sorts of nasty, time consuming debugging to do.
5. svl down newtest
6. cd /home/slitt/service_gui
7. rm newtest : This completely disconnects newtest from your personal Runit, enabling you to do what you need to do without creating a confusing and time consuming state problem.
8. mv /tmp/temp/log ~/service_all/newtest : This installs a ready to go logging facility in your newtest daemon.
9. cp /tmp/mytest.sh /home/slitt
10. chmod u+x /home/slitt/mytest.sh
11. ln -s /home/slitt/service_all/newtest /home/slitt/service_gui/newtest
12. tail -f /tmp/junky.log : Test that the daemon is doing its job.
13. tail -f /scratch/rlogs/newtest/current : Test that the log is being written to. The log should be written to about every ten seconds. Don't stop this tail -f process.
14. svl down newtest : Bring the daemon down. Verify that "/scratch/rlogs/newtest/current.
15. svl newtest : Bring the daemon back up.
16. /scratch/rlogs/newtest/current.
If both /tmp/junky.log and /scratch/rlogs/newtest/current were written to as specified, congratulations, you've just implemented Runit with logging. If not, go to the section on Troubleshooting, earlier in this document.
The simplicity of runit's multi-daemon supervisor system is breathtakingly beautiful. As shown earlier in this document, reasonably complete understanding of this system is gained from two simple diagrams, one for startup and one for control. A basic specific user setup uses only three programs: runsvdir
, runsv
, and sv
. Making things simpler is the fact that you can achieve the control functionalities of sv
primarily by echoing characters into supervise/control
.
In fact, conceptually speaking you could call sv
a front end to supervise/control
and supervise/status
. The sv
source code shows this to be an oversimplification, but conceptually it's a great starting point.
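To make the "front end" idea concrete, here is a minimal sketch of writing a control character directly. According to the runsv man page, supervise/control is a named pipe that accepts single characters such as u (up), d (down), 1 (USR1) and k (KILL). The function name sv_ctl is hypothetical, and the sketch does none of the status checking or waiting that sv does.

```shell
# sv_ctl SERVICE_DIR CHAR: write one control character to runsv's control pipe.
sv_ctl() {
    printf '%s' "$2" > "$1/supervise/control"
}

# Examples, using the newtest daemon directory from this document:
#   sv_ctl /home/slitt/service_gui/newtest d    # down
#   sv_ctl /home/slitt/service_gui/newtest u    # up
#   sv_ctl /home/slitt/service_gui/newtest 1    # send USR1
```

Because the real control file is a named pipe, the write only completes when runsv is running and has the pipe open, which is one of the details sv takes care of for you.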
Another beauty of the runit multi-daemon supervisor is that it uses the Unix/POSIX functionalities to accomplish its magic. Unix symlinks. Unix cooperative file locking. Unix named pipes. Unix directory/file hierarchies. Half of runit was already written before line one of runit or its ancestor, daemontools, was written. Speaking of daemontools...
Daniel J Bernstein, otherwise known as djb, wrote the multi-daemon supervisor called daemontools in 2001, in order to provide a stable control mechanism for his server software such as djbdns and qmail. People, including me, used daemontools to supervise his other software (djbdns in my case), and noticed how much better it was than the existing daemon management facilities. So many of us supervised other daemons with daemontools. daemontools could be run from all the init systems of the day (and this is still true today), so people began to substitute it for the ugly daemon management from those init systems.
A few years later several full init systems, with daemontools-inspired multi-daemon supervisors, were created. The two most widely used of those init systems were runit and s6. Both runit and s6 are simple and spectacular. runit is simpler, s6 gives the admin more control. You can't go wrong with either.
djb is a genius mathematician and security expert. I'm pretty sure he wrote daemontools as a quickie supervisor just for his own software, so instead of reinventing the wheel or incorporating all sorts of databases and other geegaws, he just quickly built it on top of Unix.
Every time I code anything, wise developers leapfrog each other telling me I shouldn't reinvent the wheel; instead I should use other people's code. djb did "don't reinvent the wheel" the right way, using Unix and the standard C libraries, and a directory tree for a database.
daemontools and runit are so conceptually simple I could have written them myself. My version would have had a text based supervise/status
, so it wouldn't have been as efficient, performant, featureful or bug free, but it would have served the purpose until people smarter than I made them better. I have a feeling half the people reading this now could have written them, if they were given a proper specification.
The Internet being the Internet, you'll hear critics say this document is inaccurate, too long, too complicated, too simple, not useful. Some will call it propaganda, or the ravings of a graybeard stuck in the past and afraid to learn new things. Others will take great offense that its recipes don't work exactly for their favorite distro, or that I broke the rules of the Filesystem Hierarchy Standard (FHS) or the Freedesktop.Org Cross-Desktop Group (XDG) standards. And of course there are the guys who don't like the document's layout and colors, calling it sooooo 1998!
I could easily refute every one of these assertions, but doing so would be a disservice to the vast majority of readers who came here to DIY, not to criticize.
And just so we're all on the same page, use of Specific User Runit has absolutely nothing to do with the init system you happen to be using.
If you find a genuine inaccuracy or an ambiguity in this document, please email me so I can fix it.
Although runit can be used as a full blown init system to replace sysvinit, systemd, OpenRC and the various init systems used on BSD, this document uses only runit's daemon supervision capabilities and therefore is init system agnostic. This document is equally valid no matter what init system you're using.
The following two Mental Models (block diagrams) spell out the workings of the runit multi-daemon supervisor:
In other words, the runit multi-daemon supervisor is dead-bang simple and exquisitely beautiful in its simplicity.
Reading this document, it's certain that you noticed that I used a very odd set of directories for some things, especially my use of /scratch
, which is a blatant violation of the FHS. There's a reason for this: I didn't make this document out of the goodness of my heart. I made it as an as-built drawing of the system I actually use, in my very particular and unique situation, to run my home-grown daemons that should be run as user slitt. Even so, for ease of learning and practice, I suggest that the first time you go through this document you use the locations I use, with one exception: The use of /scratch
right off the root is a bridge too far for many people, in which case you can use /home/slitt/scratch
.
Speaking of "slitt", please remember to substitute the specific user's username for every instance of "slitt" in this document.
Over half this document is a step by step recipe for making, testing and troubleshooting a specific-user runit. It looks long, but that's just because each step is a single step, so that the procedure becomes unambiguous.
In my experience, things go better when a personal runit supervises Pulseaudio, fetchmail, my reminder daemon and my pager daemon.
If your daemon consists of a looping shellscript that repeatedly calls a program and then sleeps a specific amount of time, you don't want the daemon to go down in the middle of the called program. The way to run such a looping shellscript is for it to trap the USR1 signal and change a variable on receipt of the USR1. Then, during the sleep phase, keep testing that variable, and if the variable has been changed, use the svl
shellscript, which is a slight modification of the sv
program provided by runit, to down the looping shellscript. You send the USR1 signal to the looping shellscript with the following command:
svl 1 newtest
Obviously, substitute the actual daemon directory name for newtest
. But what if the called program hangs so the variable is never tested? In that case, you can take the daemon down hard with the following command:
svl down newtest
If that doesn't work, you can murder the daemon with the KILL signal, using the following command:
svl kill newtest
The preceding is just like a kill -9
, with all the certainty and all the mess.
Runit is very good at keeping log files for any daemon supervised by runit. This includes log rotation and deletion of the oldest log files. The location of the log files can be defined on a per-daemon basis.
Runit's daemon supervision is beautiful in its simplicity. A person with Unix, BSD or Linux familiarity can easily understand runit. This is something that can't be said for most other software.
Creating my own user specific runit for user slitt has improved my computer's reliability and workflow, and given me a way to supervise any daemon I might want to write. And because runit doesn't require its daemons to background themselves, writing daemon software to be supervised by runit is much easier and simpler than the insane double-forks a backgrounding daemon requires.
If you feel the need to optimize the performance of your Linux, Unix or BSD computer, I recommend creating your own personal runit.