Troubleshooters.Com®, Linux Library Present:

The Troubleshooters.Com xargs Guide

Contents:

Introduction
Source Code Formatting In This Document
The Scope of This Document
Looking Inside the Black Box
Controlling Xargs Argument Mapping
Handling Different Numbers of Args On Each Line
Fast Processing
Ignoring Empty Strings
xargs With Blank Filenames
Fancier Commands Using the -I Option
The Unix Philosophy
Summary

Introduction

xargs is an adapter. Its function is very similar to an SVGA to HDMI adapter, a PS/2 to USB adapter, or a traveler's voltage adapter.

In xargs' case, it adapts from a command that outputs lines of text via its stdout, to a command whose input is received via command line arguments. An example of the former is ls, and an example of the latter is echo.

Note:

If you don't understand the meaning of stdin and stdout, read this Wikipedia article. Also read about Unix pipelines. If you still don't understand these concepts, ask your friends. You should not use xargs until you understand stdin, stdout, and pipelines.

You can't just pipe from ls to echo. The following shows what happens if you try:

slitt@mydesq2:~$ ls /etc/ssh | \
 echo

slitt@mydesq2:~$

The reason there's no output is the lines of text in stdout of ls never made it to the command line arguments of echo, and echo receives all its input from its command line arguments. The output of ls is not compatible with the input of echo. What you need is an adapter to translate from stdout to command line args, and that adapter is xargs. The following is what happens when you insert the xargs adapter between ls and echo:

slitt@mydesq2:~$ ls /etc/ssh | \
 xargs --max-lines=1 echo
moduli
ssh_config
ssh_config.org
sshd_config
ssh_host_dsa_key
ssh_host_dsa_key.pub
ssh_host_ecdsa_key
ssh_host_ecdsa_key.pub
ssh_host_key
ssh_host_key.pub
ssh_host_rsa_key
ssh_host_rsa_key.pub
slitt@mydesq2:~$

In the preceding, the --max-lines=1 option is necessary for correct mapping from stdout to command line arg, and will be explained several times within this document. Unless you have reason to do otherwise, always include --max-lines=1 in every xargs command.

Note:

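If you want to do less typing, a synonym for --max-lines=1 is -l1. Do not put a space between the -l and the number: doing so would cause the command to malfunction. Be sure you don't confuse the lower case "L" with the number "1". The GNU xargs man page marks -l as deprecated in favor of the POSIX form, -L 1 (upper case "L"), which does accept a space before the number.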

For the sake of clarity, this document continues to use --max-lines rather than the -l synonym. In real life, you might choose the shorter synonym so as to do less typing.

To demonstrate the preceding doing semi-useful work, consider the following example, which prepends the word ssh_etc before every file listed:

ls /etc/ssh | \
  xargs --max-lines=1 echo ssh_etc

Try the preceding command (don't do it as root in case you get the command wrong), and see the result.

The following diagram illustrates the adapter idea:

In the preceding image, the brown triangle pointing from text "series of lines of text" to the interaction line from ls to xargs simply means that the interaction from ls to xargs is a series of lines of text.

Please reread the past few paragraphs until you understand what's going on. xargs is hard enough to use correctly if you understand what it is and what it does, but it's almost impossible if you have any confusions about its function.

Usually, your hope is that xargs maps lines from its stdin to the command being run as one line of stdin per command process. In other words, you usually hope it ends up executing the following:

mycommand one
mycommand two
mycommand three

Including --max-lines=1 makes the preceding happen. Without it, the commands run will be more like the following:

mycommand one two three

This document shows you how to get the exact command assignment you desire, via the --max-lines=1 or --max-args=1 option, and shows you when to use each of these options.
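One way to see the difference for yourself is to have each spawned process report how many arguments it received. The following is a minimal sketch, assuming GNU xargs. The inline sh -c reporter is a hypothetical stand-in for mycommand, and isn't used elsewhere in this document.

```shell
#!/bin/sh
# Hypothetical stand-in for mycommand: report the argument count and the arguments.
report='echo "$# arg(s): $*"'

# Without --max-lines=1, all three lines land on one command line.
batched=$(printf 'one\ntwo\nthree\n' | xargs sh -c "$report" sh)

# With --max-lines=1, each line gets its own process.
per_line=$(printf 'one\ntwo\nthree\n' | xargs --max-lines=1 sh -c "$report" sh)

echo "$batched"
echo "---"
echo "$per_line"
```

The first command should report three arguments in a single process, and the second should report one argument in each of three processes.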

Xargs isn't easy. I've always had trouble with xargs. I had trouble with xargs in the 20th century. I just had trouble with xargs today. Here are three major categories of trouble I find with xargs:

1. It's a black box.
2. The mapping of the preceding process' stdout lines to the args of the next mycommand line can seem random.
3. xargs occasionally seems to add an artifact final argument that's an empty string, probably causing a mysterious error from mycommand.

All these problems are addressed in this document.

Source Code Formatting In This Document

This document has lots of shell script source code, and lots of commands and their results. Such source code and commands don't wrap, so they can get pretty long with complex commands and command sequences. That presents a problem now that people are viewing my pages on everything from television sized monitors to mini-cell phones.

In order to accommodate the widest variety of both devices and visual acuities, some of the source code in this document contains backslash line continuation. In other words, instead of representing a command like this:

ls *.txt | xargs --max-lines=1 touch

it is represented like this:

ls *.txt | \
  xargs --max-lines=1 touch

The beauty of separating with backslash continuation is it works just fine if you cut and paste it to the command line. Try it!

On some commands and source I felt it was clearer to just shrink everything, like the following:

ls *.txt | xargs --max-lines=1 touch

For those of us lucky enough to have 20/20 vision, this small print probably won't be a problem. The rest of us will need to do a couple Ctrl+Plus to magnify, and then a couple Ctrl+Minus to go back. It could also be copied and pasted to an application with a bigger font. But one way or another, regardless of visual acuity or device size, almost everybody can read the source code in this document.

If you have serious problems reading this document, especially if it requires you to scroll horizontally, please email me.

The Scope of This Document

The entire intent of this document is to reveal the capabilities and landmines of xargs, and only xargs. This document's examples include find, ls, echo, and grep, but this document is not about those commands. In this document, those commands are nothing but supporting actors to provide use cases for this document's one true subject, xargs.

So many of the cascaded command pipelines used as examples could have been written more concisely, efficiently and even simply. Several could have been written without using xargs at all. They're here for academic purposes, to showcase xargs. The find program's -exec capability can do almost anything xargs could do with the output of find.

But in real life, what's probably going to be feeding xargs is a simple shellscript you wrote, not find. Or perhaps a cascaded pipeline chain of your own simple shellscripts.

One more thing: Due to time constraints, this document describes only GNU xargs. If you're working on a different kind of Unix, and don't have GNU xargs, your commands might need to be a little different.

So strap up, and get ready to learn the power and landmines of xargs, assisted by academic use of other commands and technology. Read on...

Looking Inside the Black Box

xargs is a black box. You seldom know what it's going to do, and given the kinds of things it's used for, this can be very dangerous. Personally, I never put rm at the end of an xargs equipped command line. I'm not that brave: I'd rather pipe to a file, and manually convert that file to a bunch of deletions while making sure I'm not doing anything stupid.

But it doesn't take rm to make a command dangerous. Overwriting backup files, changing ownerships and permissions, adding text to or deleting text from certain files: These can all result in hours of troubleshooting or restore from backup if they get done to the wrong files. Before running a pipe and filter chain of commands powered by xargs, you need to be certain it's doing what you think it's doing. Using a find command as a source, and the echo command as the destination command, the following demonstrates how I test:

find . -type f \
 -regex ".*\.htm$" | \
 xargs --interactive \
 --max-lines=1 echo

If you run the preceding script from a directory tree with only a few *.htm files, you get a very good idea of what's going on.

Note:

In the previous example, and in some other examples in this document, I pipe the output of find into xargs. In most cases, the better choice would be to use the find command's -exec option to invoke a command on each line. I just piped find into xargs as an easy to understand demonstration of xargs in action.

Once the echo command seems to work right, replace echo with the real command (perhaps chmod or chown or some program that changes some text in each named file). Now, still using it interactively, see the actual commands that would be performed. Answer "n" each time it asks if you really want to do it -- you just want to know what will be done when you remove --interactive. Once the commands look good, you're not looking at a black box anymore, you're looking at the real deal.
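A non-interactive variation on the same idea is to leave echo in front of the real command, so the pipeline prints the commands it would run instead of running them. The following is a minimal sketch, assuming GNU find and GNU xargs. The temporary directory, the file names and the choice of chmod 644 are all illustrative assumptions.

```shell
#!/bin/sh
# Build a small throwaway tree to test against (names are illustrative).
workdir=$(mktemp -d)
touch "$workdir/a.htm" "$workdir/b.htm" "$workdir/notes.txt"

# echo in front of chmod turns the pipeline into a dry run:
# the commands are printed, and nothing is changed.
preview=$(find "$workdir" -type f -regex ".*\.htm$" | sort | \
  xargs --max-lines=1 echo chmod 644)
echo "$preview"

# Clean up the throwaway tree.
rm -r -- "$workdir"
```

Once the printed commands look right, removing the word echo turns the dry run into the real thing.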

DANGER!

If there's any chance at all that spaces or random punctuation will appear in the stdout of the program feeding into xargs, be sure to test with space and random punctuation. Be particularly careful of filenames with space and random punctuation.
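To make the danger concrete, here is a minimal sketch, assuming GNU xargs, of what happens to one input line containing a space. The file name and the inline sh -c argument counter are hypothetical, chosen only for illustration. The -d"\n" option it uses is covered later in this document.

```shell
#!/bin/sh
# One input line that contains a space (a made-up file name).
name='spa ces.txt'

# Count the arguments the spawned command receives for that one line.
count_default=$(printf '%s\n' "$name" | \
  xargs --max-lines=1 sh -c 'echo "$#"' sh)

# With newline declared as the only delimiter, the line stays whole.
count_newline=$(printf '%s\n' "$name" | \
  xargs --max-lines=1 -d"\n" sh -c 'echo "$#"' sh)

echo "default splitting: $count_default argument(s)"
echo "newline delimiter: $count_newline argument(s)"
```

Even with --max-lines=1, default splitting hands the command two arguments for what you meant as one file name, which is exactly the kind of surprise to test for.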

Controlling Xargs Argument Mapping

This section demonstrates how to control the mapping of lines coming into the stdin of xargs, to command line arguments of the process being run. Usually you want one process run on each line from stdin, but you could have one process run on every two lines, with the odd line being arg1 and the even line being arg2. If there are spaces in each incoming line, you can map words to arguments. All this will be explained in this section.

Create the following file, calling it zprovide.sh:

#!/bin/sh
echo one
echo two
echo three
echo four
echo five
echo six

Now perform the following command:

slitt@mydesq2:~$ ./zprovide.sh | \
  xargs echo
one two three four five six
slitt@mydesq2:~$

The six items on one line are probably not what you want. If you add the --interactive option to xargs, you'll see that the echo command runs only once.

What you probably want is for the echo command to run once for each line of ./zprovide.sh output, which would look like the following:

slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=1  echo
one
two
three
four
five
six
slitt@mydesq2:~$

But, perhaps, you want the echo command to have two arguments, each of which is provided by one line of ./zprovide.sh output. In that case you'd do the following:

slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=2  echo
one two
three four
five six
slitt@mydesq2:~$

Three and four args per echo look like the following:

slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=3  echo
one two three
four five six
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=4  echo
one two three four
five six
slitt@mydesq2:~$

But what if the output of ./zprovide.sh has more than one word? Make the following zprovide2.sh:

#!/bin/sh
echo one 1
echo two 2
echo three 3
echo four 4
echo five 5
echo six 6

Notice how the preceding plays out with different arguments for xargs' --max-lines and --max-args options:

slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-lines=1  echo
one 1
two 2
three 3
four 4
five 5
six 6
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-args=1 echo
one
1
two
2
three
3
four
4
five
5
six
6
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-args=2  echo
one 1
two 2
three 3
four 4
five 5
six 6
slitt@mydesq2:~$ ./zprovide2.sh | \
 xargs --max-args=3  echo
one 1 two
2 three 3
four 4 five
5 six 6

Handling Different Numbers of Args On Each Line

What if the input process' stdout is meant to have lines with different numbers of space delimited arguments? Take the following zprovide3.sh, for instance:

#!/bin/sh
echo one
echo two a 
echo three b c
echo four d e f
echo five g h i j
echo six k l m n o

If the intent is to operate on every word on each line, one line at a time, what you want is the --max-lines=1 option, as follows:

slitt@mydesq2:~$ ./zprovide3.sh | \
  xargs --max-lines=1 echo
one
two a
three b c
four d e f
five g h i j
six k l m n o
slitt@mydesq2:~$

If what you're trying to do is spawn a process for each word on each line (I don't know why you'd do this), your command would look like the following:

./zprovide3.sh | xargs --max-args=1 echo

In the preceding, notice I changed --max-lines to --max-args to achieve one process per word. I didn't show the output, because it would have been very long. You should experiment with --max-lines and --max-args to get a feel for exactly how they work.
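If you'd rather not scroll through the full output, counting the output lines is a quick way to compare the two options. The following is a minimal sketch, assuming GNU xargs. The provide function is a hypothetical stand-in for the first three lines of zprovide3.sh, so the example is self-contained.

```shell
#!/bin/sh
# Stand-in for the first three lines of zprovide3.sh output.
provide() {
  printf 'one\ntwo a\nthree b c\n'
}

# One echo process per line: three lines in, three lines out.
by_line=$(provide | xargs --max-lines=1 echo | wc -l)

# One echo process per word: six words in, six lines out.
by_word=$(provide | xargs --max-args=1 echo | wc -l)

echo "processes with --max-lines=1: $by_line"
echo "processes with --max-args=1:  $by_word"
```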

Fast Processing

This section demonstrates how you can run certain xargs equipped commands substantially faster by running their spawned processes in parallel.

By default, xargs works on one input process stdout line until the called for process finishes, and then goes on to the next line. If there are thousands of lines, this can be very slow. In certain circumstances, you can speed things up dramatically (perhaps an order of magnitude or two) by running the processes spawned by xargs simultaneously.

Consider the following command:

find /etc -type f | \
 xargs --max-lines=1 \
 -r --max-procs=20 \
 ./do1.sh

Note the --max-procs=20 option. This means that instead of waiting for ./do1.sh to finish its current line before starting on the next line of stdout from the incoming process, xargs will start the ./do1.sh process in the background, go on to the next line, start that line's ./do1.sh process, and so on until there are 20 such processes. Then, as one dies, xargs starts another one. So the way I wrote the preceding command, as long as there's input left to process, there are up to 20 instances of ./do1.sh running.

If the bottleneck on ./do1.sh is CPU, you gain performance only to the extent that you match your --max-procs value to the number of useable CPU cores in your computer. Beyond that, administration of extra processes slows you down.

But a lot of times, the process being run is not CPU bound. Sometimes its bottleneck is the disk. Sometimes it's waiting for input from a socket that might take a while to come. Sometime the network to which that socket attaches is slow. Sometimes the socket is waiting for user input. Under these circumstances, you'll gain tremendous speed by invoking numerous processes.

What number should you put on --max-procs when the process being repeated isn't CPU bound? For this you probably need to experiment. I've heard anecdotes of BSD running 50,000 simultaneous httpd processes, obviously on a machine meant for little else than serving web pages. On the other end, I can't imagine any situation where running 20 processes would cause any harm. So you'd need to find a number between 20 and 50,000, and do so experimentally. What I'd like to point out, though, is that even if you went with 20 on a process that was socket-bound, having 20 processes could speed you up by almost 20 times. Just that is a huge improvement.
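A rough way to experiment is to time the same input with and without --max-procs. The following is a minimal sketch, assuming GNU xargs. The sleep 1 job is a hypothetical stand-in for a process that spends its time waiting rather than computing, and the one second resolution of date +%s makes the timings approximate.

```shell
#!/bin/sh
# Hypothetical stand-in for a socket-bound job: each item just waits one second.
job='sleep 1'

# Three items, one process at a time.
start=$(date +%s)
printf '1\n2\n3\n' | xargs --max-lines=1 sh -c "$job" sh
serial=$(( $(date +%s) - start ))

# The same three items, up to three processes at once.
start=$(date +%s)
printf '1\n2\n3\n' | xargs --max-lines=1 --max-procs=3 sh -c "$job" sh
parallel=$(( $(date +%s) - start ))

echo "one at a time:   about $serial seconds"
echo "three at a time: about $parallel seconds"
```

With a real job, substitute your own command for the sleep and try several --max-procs values to see where the improvement levels off.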

--max-procs Risks and Contra-indications

--max-procs isn't for everyone, nor every situation. Sometimes it's better to not specify the --max-procs argument, letting it default to 1.

Never use --max-procs if you care what order the lines get processed in. Multiple processes mean whatever runs faster happens first. If you need the lines to be processed in the order they go into xargs' stdin, don't use --max-procs.
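The ordering problem is easy to reproduce. The following is a minimal sketch, assuming GNU xargs, in which each input line is a made-up number of seconds for the spawned process to wait before echoing that same number.

```shell
#!/bin/sh
# Each input line is a number of seconds to wait before echoing that number.
# The line "1" goes in first, but its process finishes last.
finished=$(printf '1\n0\n' | \
  xargs --max-lines=1 --max-procs=2 sh -c 'sleep "$1"; echo "$1"' sh)
echo "$finished"
```

The output order follows completion time, not input order, so the 0 comes out ahead of the 1.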

--max-procs makes what's happening hard to understand, so you should always debug your xargs equipped pipeline without it, and only after everything's debugged should you address whether to use it.

If you specify --max-procs=0, that gives xargs permission to run as many processes at a time as it can. This is almost always a bad idea. Specify a positive number. Specifying 0 is kind of like soldering a wire across a circuit breaker.

Ignoring Empty Strings

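Occasionally you might find a situation where xargs seems to handle an empty string at the end of its input process' stdout. The usual cause is that the input contained no non-blank text at all: by default, GNU xargs still runs the command once, with no arguments from stdin. When that happens, just use the -r option to xargs. This option, which is also called --no-run-if-empty, tells xargs not to run the command when its stdin contains nothing but blanks. This is usually a good idea, unless you actually want the command to run once even on empty input.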
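The clearest case to try is input that is completely empty. The following is a minimal sketch, assuming GNU xargs. The "command ran" text is just a marker chosen for this illustration.

```shell
#!/bin/sh
# Feed xargs no input at all, without and then with -r.
without_r=$(printf '' | xargs echo "command ran")
with_r=$(printf '' | xargs -r echo "command ran")

echo "without -r: [$without_r]"
echo "with -r:    [$with_r]"
```

Without -r, echo runs once even though nothing came in on stdin. With -r, it doesn't run at all.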

xargs With Blank Filenames

There's a special place in hell for those people who name a file "Jimmy Jones's report". That would be a firing offense at Troubleshooters.Com, where I control everything. But at larger organizations, admins and developers must code for that possibility, because no matter how stringently the company enforces filename policy (if they do at all), filenames like that will sneak through. New employees, documents from other companies, whatever. xargs has at least two ways of handling such filenames:

-d"\n" declares the delimiter to be newline, thereby transferring the whole of the line as one argument, even with spaces and punctuation.
-0 declares the delimiter to be the null character, '\0', ASCII 0. This can even handle filenames containing newlines, but the command feeding in must put a null character at the end of each item instead of a newline. The find command does this if you use its -print0 option.

The following is an actual application of the delimiter method:

slitt@mydesq2:~/test$ ls | \
  xargs --max-lines=1 -d"\n" file
100.txt: ASCII text
101.txt: ASCII text
102.txt: ASCII text
103.txt: ASCII text
1 11.txt: ASCII text
Jimmy Jones's report: ASCII text
quo"tes: ASCII text
semi;colons: ASCII text
sin'gles: ASCII text
spa ces: ASCII text
ta bs: ASCII text
slitt@mydesq2:~/test/$

Acknowledgment:

Thanks to Wayne Walker for the program to create most of these bizarre file names. His program's at https://gist.github.com/wwalker/0aaca2ae5eb5cadf01a6.

If you're using -0 and the line source doesn't end its lines with nulls instead of newlines, you can make the substitution yourself, as follows:

slitt@mydesq2:~/test$ ls | \
  tr "\n" "\0" | \
  xargs --max-lines=1 -0 file
100.txt: ASCII text
101.txt: ASCII text
102.txt: ASCII text
103.txt: ASCII text
1 11.txt: ASCII text
Jimmy Jones's report: ASCII text
quo"tes: ASCII text
semi;colons: ASCII text
sin'gles: ASCII text
spa ces: ASCII text
ta bs: ASCII text
slitt@mydesq2:~/test$
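The -print0 option of find, mentioned earlier in this section, avoids the tr step and also copes with newlines inside file names. The following is a minimal sketch, assuming GNU find and GNU xargs. The temporary directory and its file names are made up for the test, and the sketch uses --max-args=1 rather than --max-lines=1 because with -0 every name is exactly one argument.

```shell
#!/bin/sh
# Throwaway directory with awkward, made-up file names,
# including one that contains a newline.
workdir=$(mktemp -d)
touch "$workdir/spa ces" "$workdir/sin'gles" "$workdir/new
line"

# find ends each name with a null; xargs -0 splits only on nulls.
# --max-args=1 then gives one process per file.
count=$(find "$workdir" -type f -print0 | \
  xargs -0 --max-args=1 sh -c 'echo x' sh | wc -l)
echo "files processed: $count"

# Clean up the throwaway directory.
rm -r -- "$workdir"
```

Three files go in and three processes run, even though one of the names spans two lines.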

Fancier Commands Using the -I Option

Sometimes you need to run a fancier command on each line than a simple chmod or whatever. My favorite way of getting fancy is to write my own shellscript for xargs to run on each of its stdin lines. But there's also its -I option, which enables you to declare a replacement string that can appear one or more times anywhere in the command xargs is running. Each occurrence of that string is replaced by the stdin line when xargs runs the command.

In the following very contrived example, lines 1, 2 and 3 are fed to xargs, which uses the reverse polish notation calculator dc to evaluate the function (x+1)*x on each line.

for i in 1 2 3; \
 do echo $i; done | \
 xargs -I _val_ dc \
 -e "_val_ 1 +  _val_  * p"
2
6
12
slitt@mydesq2:~$

The preceding example uses the replacement string twice, with the two occurrences not placed next to each other. This requirement comes up sometimes, and when it does, your choice is to code up a little shellscript for xargs to run, or to use the -I option.

If you notice, the previous example has no --max-lines=1 option. This is because -I automatically sets --max-lines=1, or in man page speak, -I implies --max-lines=1.
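Because -I substitutes each whole input line, a line containing spaces stays in one piece, and the replacement string can appear more than once. The following is a minimal sketch, assuming GNU xargs. The input lines and the bracketed output format are made up for illustration.

```shell
#!/bin/sh
# _val_ is the replacement string; it may appear more than once.
# With -I, each whole input line is substituted, spaces included.
result=$(printf 'a b\nc\n' | \
  xargs -I _val_ echo "first: [_val_] again: [_val_]")
echo "$result"
```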

The Unix Philosophy

This being written in July 2015, there's been a lot of recent discussion of the Unix Philosophy. What is it? Is it a good thing, or is it an obsolete relic of a distant, geeky past?

Definitions of the Unix Philosophy abound, and you can easily find them with an Internet search. The most used one sentence definition is "Do one thing and do it well." To me, that's an excellent summary of a major facet of the Unix Philosophy, or maybe an excellent litmus test for it, but it's just a property or litmus test. To me, underlying the Unix Philosophy is a striving to give a fairly smart person the option of quickly implementing an entirely new behavior for his computer. For instance, now that Google upranks "mobile-friendly" pages and downranks pages that don't pass their "mobile-friendly" test, I need to start converting all 800 pages of Troubleshooters.Com to mobile-friendliness. What can I do? Should I make a year's project of it by adding the necessary two lines in an editor? Or should I spend a few hundred hours writing a beautifully polished GUI application enabling me to point and click my way through all this work?

Because I'm a fairly smart person and I'm aware of the Unix Philosophy, I made a third choice: I bolted together the tried and true tools that Unix (via Linux) gives me, in order to make programs to find unconverted web pages and convert them. My programs aren't pretty. They aren't polished. They aren't safe in the hands of the proverbial "dumb user". They don't need to be: They're being used by a fairly smart person prioritizing getting the job done over "pretty." The Unix Philosophy isn't for everyone, but in the hands of the right person (and there are millions of us "right people"), it's pure productivity power.

Before I list a few of the many tools Unix gives you, let me define the word filter in computing terms. A filter is a program that takes its input from its stdin, adds, changes and/or deletes lines from the stdin and sends them to stdout. You can cascade filters, connecting them output to input, with the pipe symbol (|). The following is an example:

find . | \
  grep -v "\.png$" | \
  grep -v "\.class$" | \
  vim -

In the preceding, grep is used as a filter twice: once to remove all lines ending with .png, and a second time to remove all lines ending in .class. The result gets sent to vim for observation. Once again, filters are used to cascade, output to input, processes that take their input from stdin and put their output on stdout. Note that the very first process in the cascade needn't take data from stdin (find has no stdin input), and the last process in the cascade needn't output to stdout (vim normally doesn't output to stdout).

Note:

The example just discussed could have been done with
find . | grep -E -v '\.png$|\.class$' | vim -, but the purpose of the example was to exhibit a series of several pipes.

The following are some of the tools you can bolt together to create your own custom processes:

ls lists files in a directory. It has no input via stdin, its configuration comes from its command line arguments, and its output is to stdout.
find lists files in an entire directory tree. It can make all sorts of tests for inclusion and exclusion. It has no input via stdin, its configuration comes from its command line arguments, and its output is to stdout.
less is a viewer that shows, and allows you to scroll through, everything in its stdin. Its input is stdin, and as normally used it has no output for the next program in a cascade.
cat sends one or more files to its stdout. The files sent are specified as command line arguments. It typically has no input, its configuration is on its command line args, and its output is to stdout.
grep uses regular expressions to pick and choose lines of its stdin to write to its stdout. Configuration (its selection criteria) is from command line arguments.
tr changes all instances of a character on its stdin to another character it writes to stdout, with the characters specified by its command line arguments.
sed is a one pass text modification system that can make extensive changes to a series of text lines. It receives input from stdin, writes the changed text to stdout, and receives instructions for that change via command line arguments.
awk is an even more powerful text modification system, completely capable of break logic, arithmetic, and other things programming languages do. It's not as fast as sed, so use it only when sed can't do the job. awk can be used many ways, including as a filter taking its input on stdin and putting its output on stdout. In all but the most trivial situations, it usually receives its instructions via an actual AWK program named on the command line. awk can be used to add headers and footers to a series of lines such that when sorted, the headers and footers make data processing a straightforward, one pass algorithm.
sort sorts a file. When used as a filter, it takes its input from stdin, writes the sorted version of that input on stdout, and receives its instructions via command line arguments. When used in conjunction with awk, a series of awk and sort processes can be used to perform a complex task that would otherwise have required significant design and programming. The sort program uses significant computer resources, so use it sparingly on processes likely to get done many times simultaneously, or in situations where delay is a show-stopper. According to my experiments, sort takes about 25 seconds for 650,000 lines, 1.5 seconds for 45,000 lines, and 1/20 of a second for 3000 lines. Compare these times to how much sort can make your design easier, then make your choice. If nothing else, sort can be excellent for prototyping until you implement a program with more complex in-memory logic.
Python, Perl, Lua, and Ruby are full fledged languages with which you can make your own filters and other logic. Use these when awk isn't powerful enough, or when the time or resources required by sort would be prohibitive.
cut carves a line into specific "fields", and lists a subset of those fields. Input on stdin, output on stdout, behavior on its command line arguments.
tput and dialog are tools to add a crude nCurses user interface to filter cascades and batch processes of all kinds. zenity can throw up forms, calendars, password boxes, and lists so the user can interact with data possessed by the shellscript.
And last but not least, xargs is an adapter that takes lines on its stdin, and applies them as command line arguments to the program named in its own command line arguments.
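As an illustration of bolting several of these tools together, the following is a minimal sketch, assuming GNU xargs. The names function and its made-up file names stand in for the output of ls or find, and the word convert is a placeholder for whatever real command you'd run.

```shell
#!/bin/sh
# Made-up file names standing in for the output of ls or find.
names() {
  printf 'b.htm\na.png\nC.HTM\na.htm\n'
}

# tr lowercases, grep keeps the .htm lines, sort orders them,
# and xargs hands each surviving line to echo as an argument.
plan=$(names | tr 'A-Z' 'a-z' | grep '\.htm$' | sort | \
  xargs --max-lines=1 echo convert)
echo "$plan"
```

Each program does one small job, and xargs is the piece that turns the filtered lines into command lines.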

The Unix Philosophy preserves the ability Ken Thompson and his successors gave you to mock up new computer functionalities with lightning speed when using Unix. The xargs program is a necessary part of the Unix Philosophy.

Summary

xargs is an adapter to adapt a program or command that outputs lines of text through its stdout, to a program whose input is received via its command line arguments. xargs is a program with a lot of options, which sometimes leads to confusion unless you have the knowledge contained in this document. The three most common problems with xargs are:

1. It's a black box.
2. The mapping of the preceding process' stdout lines to the args of the next mycommand line can seem random.
3. xargs occasionally seems to add an artifact final argument that's an empty string, probably causing a mysterious error from mycommand.

Problem #1 is solved mainly with the --interactive option and by, at first, having xargs run a simple echo command.

Problem #2 is usually solved with the --max-lines=1 command line option. Occasionally the problem at hand might call for you to use two lines for one process that takes two arguments, so that would require --max-lines=2. Obviously this could be extrapolated to higher numbers. Also, in some cases the situation might require you to count the number of words on a line rather than lines themselves. In those cases you'd use the --max-args=1 option, possibly changing the number to a higher one.

Problem #3 occurs occasionally, and is solved with the -r command line option.

For performance purposes, it's often advantageous for xargs to run several instances of the command specified as its argument. This is accomplished by the --max-procs=20 command line option, where you change 20 to whatever you believe to be the best maximum number of such processes. I'd highly advise against --max-procs=0, because this would create processes without limit, and no computer is limitless. This would be like jumpering a circuit breaker with a large diameter wire: it never trips, but the house can burn down, or in this case, the computer can crash. Remember, the --max-procs option is most advantageous when the process being run in multiple instances is bottlenecked by something other than the CPU. If the process being run is CPU bound, there's no advantage running more instances of that process than you have cores in the computer, and if it uses all your cores, other processes are going to be very slow. --max-procs=20 (or higher) is best used when the process is bottlenecked by a socket on the network that's waiting for responses.

xargs is an integral adapter used in the plumbing together of basic Unix commands as well as custom scripts you write, so it's an integral part of the Unix Philosophy and part of the way you write quick programs and prototypes.

[ Training | Troubleshooters.Com | Email Steve Litt ]