Troubleshooters.Com®, Linux Library Present:
The Troubleshooters.Com xargs Guide
Copyright © 2015 by Steve Litt
xargs is an adapter. Its function is very similar to that of an SVGA-to-HDMI adapter, a PS/2-to-USB adapter, or a traveler's voltage adapter.
In xargs' case, it adapts the output of a command that writes lines of text to its stdout into the input of a command that receives its input via command line arguments. An example of the former is ls, and an example of the latter is echo.
Note:
If you don't understand the meaning of stdin and stdout, read this Wikipedia article. Also read about Unix pipelines. If you still don't understand these concepts, ask your friends. You should not use xargs until you understand stdin, stdout, and pipelines.
You can't just pipe from ls to echo. The following shows what happens if you try:
slitt@mydesq2:~$ ls /etc/ssh | \
  echo

slitt@mydesq2:~$
There's no output (just an empty line) because the lines of text on ls's stdout never made it to echo's command line arguments, and echo receives all its input from its command line arguments. The output of ls is not compatible with the input of echo. What you need is an adapter to translate from stdout to command line args, and that adapter is xargs. The following is what happens when you insert the xargs adapter between ls and echo:
slitt@mydesq2:~$ ls /etc/ssh | \
  xargs --max-lines=1 echo
moduli
ssh_config
ssh_config.org
sshd_config
ssh_host_dsa_key
ssh_host_dsa_key.pub
ssh_host_ecdsa_key
ssh_host_ecdsa_key.pub
ssh_host_key
ssh_host_key.pub
ssh_host_rsa_key
ssh_host_rsa_key.pub
slitt@mydesq2:~$
In the preceding, the --max-lines=1 option is necessary for correct mapping from stdout to command line arg, and will be explained several times within this document. Unless you have reason to do otherwise, always include --max-lines=1 in every xargs command.
Note:
If you want to do less typing, a synonym for --max-lines=1 is -l1. Do not put a space between the -l and the number: Doing so would cause the command to malfunction. Be sure you don't confuse the lower case "L" with the number "1".
For the sake of clarity, this document continues to use --max-lines rather than the -l synonym. In real life, you might choose the shorter synonym so as to do less typing.
To demonstrate the preceding doing semi-useful work, consider the following example, which prefixes every listed file with the word ssh_etc:
ls /etc/ssh | \
  xargs --max-lines=1 echo ssh_etc
Try the preceding command (don't do it as root in case you get the command wrong), and see the result.
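If you'd rather not depend on whatever happens to be in your /etc/ssh, here's a minimal sketch that feeds xargs a fixed list of names with printf, so the result is predictable. The three names are just sample data standing in for the output of ls:

```shell
#!/bin/sh
# Simulate the output of "ls /etc/ssh" with a fixed list of sample names.
result=$(printf 'moduli\nssh_config\nsshd_config\n' |
  xargs --max-lines=1 echo ssh_etc)
# Each input line became the last argument of its own echo process.
printf '%s\n' "$result"
```

Each line of output comes from a separate echo process, one per input line.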
The following diagram illustrates the adapter idea:
In the preceding image, the brown triangle pointing from text "series of lines of text" to the interaction line from ls to xargs simply means that the interaction from ls to xargs is a series of lines of text.
Please reread the past few paragraphs until you understand what's going on. xargs is hard enough to use correctly even when you understand what it is and what it does, but it's almost impossible if you have any confusion about its function.
Usually, you want xargs to map the lines on its stdin to the command being run at one line of stdin per command process. In other words, you usually want it to end up executing the following:
mycommand one
mycommand two
mycommand three
Including --max-lines=1 makes the preceding happen. Without it, the commands run will be more like the following:
mycommand one two three
This document shows you how to get exactly the command assignment you want via --max-lines=1 or --max-args=1, and when to use each of these options.
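To see the difference for yourself, here's a small sketch in which "echo mycommand" stands in for the hypothetical mycommand, so each spawned process just prints the command line it was given. It runs the same three input lines once without and once with --max-lines=1:

```shell
#!/bin/sh
# "echo mycommand" stands in for a real command: each spawned process
# simply prints the command line it received.
without=$(printf 'one\ntwo\nthree\n' | xargs echo mycommand)
with=$(printf 'one\ntwo\nthree\n' | xargs --max-lines=1 echo mycommand)

echo "Without --max-lines=1:"
printf '%s\n' "$without"   # one process: mycommand one two three
echo "With --max-lines=1:"
printf '%s\n' "$with"      # three processes, one per input line
```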
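xargs isn't easy. I've always had trouble with xargs. I had trouble with xargs in the 20th century. I just had trouble with xargs today. Here are three major categories of trouble I find with xargs:
1. It's a black box: it's hard to know exactly which commands it will run.
2. Mapping: getting the right number of input lines or words onto each command it runs.
3. Empty input: it sometimes runs its command even when there's nothing to process.
All these problems are addressed in this document.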
This document has lots of shell script source code, and lots of commands and their results. Such source code and commands don't wrap, so they can get pretty long with complex commands and command sequences. That presents a problem now that people view my pages on everything from television-sized monitors to tiny cell phones.
In order to accommodate the widest variety of both devices and visual acuities, some of the source code in this document contains backslash line continuation. In other words, instead of representing a command like this:
ls *.txt | xargs --max-lines=1 touch
I represent it like this:
ls *.txt | \
  xargs --max-lines=1 touch
The beauty of splitting lines with backslash continuation is that it works just fine if you cut and paste it to the command line. Try it!
On some commands and source I felt it was clearer to just shrink everything, like the following:
ls *.txt | xargs --max-lines=1 touch
For those of us lucky enough to have 20/20 vision, this small print probably won't be a problem. The rest of us will need to press Ctrl+Plus a couple of times to magnify, and then Ctrl+Minus to go back. You could also copy and paste it into an application with a bigger font. One way or another, regardless of visual acuity or device size, almost everybody can read the source code in this document.
If you have serious problems reading this document, especially if it requires you to scroll horizontally, please email me.
The entire intent of this document is to reveal the capabilities and landmines of xargs, and only xargs. This document's examples include find, ls, echo, and grep, but this document is not about those commands. In this document, those commands are nothing but supporting actors providing use cases for this document's one true subject, xargs.
Many of the cascaded command pipelines used as examples could have been written more concisely, more efficiently, and even more simply. Several could have been written without using xargs at all. They're here for academic purposes, to showcase xargs. The find program's -exec capability can do almost anything xargs could do with the output of find.
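For comparison, here's a sketch of one small job done both ways, in a throwaway directory created with mktemp. The file names are made up for the demo, and basename is just a convenient harmless command. With -exec ... \;, find runs the command itself, once per file found:

```shell
#!/bin/sh
# Scratch directory with two sample .htm files and one .txt file.
dir=$(mktemp -d)
touch "$dir/a.htm" "$dir/b.htm" "$dir/c.txt"

# xargs version: find writes names to stdout, xargs turns them into args.
via_xargs=$(find "$dir" -type f -name '*.htm' | sort |
  xargs --max-lines=1 basename)

# find -exec version: find runs basename once per file it finds.
via_exec=$(find "$dir" -type f -name '*.htm' -exec basename {} \; | sort)

printf '%s\n' "$via_xargs"
printf '%s\n' "$via_exec"
rm -r "$dir"
```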
But in real life, what's probably going to be feeding xargs is a simple shellscript you wrote, not find. Or perhaps a cascaded pipeline chain of your own simple shellscripts.
Here are some links to subjects referred to, but beyond the scope of, this document:
One more thing: Due to time constraints, this document describes only GNU xargs. If you're working on a different kind of Unix, and don't have GNU xargs, your commands might need to be a little different.
So strap in, and get ready to learn the power and landmines of xargs, assisted by academic use of other commands and technology. Read on...
xargs is a black box. You seldom know what it's going to do, and given the kinds of things it's used for, this can be very dangerous. Personally, I never put rm at the end of an xargs equipped command line. I'm not that brave: I'd rather pipe to a file, and manually convert that file to a bunch of deletions while making sure I'm not doing anything stupid.
But it doesn't take rm to make a command dangerous. Overwriting backup files, changing ownerships and permissions, adding text to or deleting text from certain files: These can all result in hours of troubleshooting or restore from backup if they get done to the wrong files. Before running a pipe and filter chain of commands powered by xargs, you need to be certain it's doing what you think it's doing. Using a find command as a source, and the echo command as the destination command, the following demonstrates how I test:
find . -type f \
  -regex ".*\.htm$" | \
  xargs --interactive \
  --max-lines=1 echo
If you run the preceding script from a directory tree with only a few *.htm files, you get a very good idea of what's going on.
Note:
In the previous example, and in some other examples in this document, I pipe the output of find into xargs. In most cases, the better choice would be to use the find command's -exec option to invoke a command on each file found. I just piped find into xargs as an easy-to-understand demonstration of xargs in action.
Once the echo command seems to work right, replace echo with the real command (perhaps chmod or chown or some program that changes some text in each named file). Now, still using it interactively, see the actual commands that would be performed. Answer "n" each time it asks if you really want to do it -- you just want to know what will be done when you remove --interactive. Once the commands look good, you're not looking at a black box anymore, you're looking at the real deal.
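A related way to preview the real command without answering prompts is to leave echo in front of it, so xargs prints each command it would run instead of running it. In this sketch, chmod 644 is just an example of a real command, and the two file names are made-up sample data standing in for the output of find:

```shell
#!/bin/sh
# Sample input standing in for the output of a find command.
preview=$(printf './index.htm\n./about.htm\n' |
  xargs --max-lines=1 echo chmod 644)
# Nothing was changed; this only shows the commands that would run.
printf '%s\n' "$preview"
# When the preview looks right, delete "echo" and rerun the pipeline.
```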
DANGER!
If there's any chance at all that spaces or random punctuation will appear in the stdout of the program feeding into xargs, be sure to test with spaces and random punctuation. Be particularly careful with filenames containing spaces and random punctuation.
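Here's a minimal sketch of why that testing matters. It feeds xargs one made-up filename containing a space and shows that, by default, xargs splits it into two arguments. The -d"\n" option used for contrast is covered in the section on awkward filenames:

```shell
#!/bin/sh
# One input line, meant to be a single filename.
name='spa ces.txt'

# Default behavior: xargs splits on blanks, so with --max-args=1
# echo runs twice, once for "spa" and once for "ces.txt".
split=$(printf '%s\n' "$name" | xargs --max-args=1 echo)

# With newline as the only delimiter, the whole line is one argument.
whole=$(printf '%s\n' "$name" | xargs --max-args=1 -d"\n" echo)

printf 'default:\n%s\n' "$split"
printf 'with -d:\n%s\n' "$whole"
```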
This section demonstrates how to control the mapping of lines coming into the stdin of xargs, to command line arguments of the process being run. Usually you want one process run on each line from stdin, but you could have one process run on every two lines, with the odd line being arg1 and the even line being arg2. If there are spaces in each incoming line, you can map words to arguments. All this will be explained in this section.
Create the following file, calling it zprovide.sh, and make it executable with chmod +x zprovide.sh:
#!/bin/sh
echo one
echo two
echo three
echo four
echo five
echo six
Now perform the following command:
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs echo
one two three four five six
slitt@mydesq2:~$
The six items on one line are probably not what you want. If you add the --interactive option to xargs, you'll see that the echo command runs only once.
What you probably want is for the echo command to run once for each line of ./zprovide.sh's output, which would look like the following:
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=1 echo
one
two
three
four
five
six
slitt@mydesq2:~$
But, perhaps, you want the echo command to have two arguments, each of which is provided by one line of ./zprovide.sh output. In that case you'd do the following:
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=2 echo
one two
three four
five six
slitt@mydesq2:~$
Three and four args per echo look like the following:
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=3 echo
one two three
four five six
slitt@mydesq2:~$ ./zprovide.sh | \
  xargs --max-lines=4 echo
one two three four
five six
slitt@mydesq2:~$
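--max-lines=2 earns its keep when a command takes two arguments, with the odd line as arg1 and the even line as arg2. As a sketch, suppose a hypothetical list alternates source and destination names (the names here are made up); pairing lines two at a time lets xargs run one cp per pair, inside a scratch directory made with mktemp:

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

# Odd lines are sources (arg1), even lines are destinations (arg2),
# so xargs runs "cp a.txt a.bak" and then "cp b.txt b.bak".
( cd "$dir" &&
  printf 'a.txt\na.bak\nb.txt\nb.bak\n' | xargs --max-lines=2 cp )

listing=$(ls "$dir")
printf '%s\n' "$listing"
rm -r "$dir"
```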
But what if each line of ./zprovide.sh's output has more than one word? Create the following zprovide2.sh, and make it executable:
#!/bin/sh
echo one 1
echo two 2
echo three 3
echo four 4
echo five 5
echo six 6
Notice how the preceding plays out with different arguments for xargs' --max-lines and --max-args options:
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-lines=1 echo
one 1
two 2
three 3
four 4
five 5
six 6
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-args=1 echo
one
1
two
2
three
3
four
4
five
5
six
6
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-args=2 echo
one 1
two 2
three 3
four 4
five 5
six 6
slitt@mydesq2:~$ ./zprovide2.sh | \
  xargs --max-args=3 echo
one 1 two
2 three 3
four 4 five
5 six 6
What if the input process' stdout is meant to have lines with different numbers of space delimited arguments? Take the following zprovide3.sh, for instance:
#!/bin/sh
echo one
echo two a
echo three b c
echo four d e f
echo five g h i j
echo six k l m n o
If the intent is to operate on every word on each line, one line at a time, what you want is the --max-lines=1 option, as follows:
slitt@mydesq2:~$ ./zprovide3.sh | \
  xargs --max-lines=1 echo
one
two a
three b c
four d e f
five g h i j
six k l m n o
slitt@mydesq2:~$
If what you're trying to do is spawn a process for each word on each line (I don't know why you'd do this), your command would look like the following:
./zprovide3.sh | xargs --max-args=1 echo
In the preceding, notice I changed --max-lines to --max-args to achieve one process per word. I didn't show the output, because it would have been very long. You should experiment with --max-lines and --max-args to get a feel for exactly how they work.
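To do that experimenting systematically, a small loop like the following sketch can push the same input through several settings. It uses a shell variable holding the same lines zprovide3.sh prints, so it doesn't need the script file; the particular option values in the loop are arbitrary choices:

```shell
#!/bin/sh
# Same lines zprovide3.sh prints.
input='one
two a
three b c
four d e f
five g h i j
six k l m n o'

# Run the identical input through several mappings and label each run.
for opt in --max-lines=1 --max-lines=2 --max-args=1 --max-args=3; do
  echo "=== xargs $opt echo ==="
  printf '%s\n' "$input" | xargs "$opt" echo
done
```

Counting output lines is a quick sanity check: --max-args=1 should produce one line per word (21 here), while --max-lines=2 should produce one line per pair of input lines (3 here).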
This section demonstrates how you can make certain xargs-equipped commands run substantially faster by running the processes xargs spawns in parallel.
By default (that is, without --max-procs), xargs waits for each process it spawns to finish before starting the next one. If there are thousands of lines, this can be very slow. In certain circumstances, you can speed things up dramatically (perhaps by an order of magnitude or two) by running the processes spawned by xargs simultaneously.
Consider the following command:
find /etc -type f | \
  xargs --max-lines=1 \
  -r --max-procs=20 \
  ./do1.sh
Note the --max-procs=20 option. It means that instead of waiting for ./do1.sh to finish with the current line before starting on the next line of stdout from the incoming process, xargs starts ./do1.sh in the background, goes on to the next line, starts that line's ./do1.sh process, and so on until 20 such processes are running. Then, as each one dies, xargs starts another one. So as I wrote the preceding command, up to 20 instances of ./do1.sh run at any one time.
If the bottleneck in ./do1.sh is the CPU, raising --max-procs helps only until it matches the number of usable CPU cores in your computer. Beyond that, the overhead of managing extra processes slows you down.
But a lot of times, the process being run is not CPU bound. Sometimes its bottleneck is the disk. Sometimes it's waiting for input from a socket that might take a while to arrive. Sometimes the network to which that socket attaches is slow. Sometimes the socket is waiting for user input. Under these circumstances, you can gain tremendous speed by running numerous processes at once.
What number should you put on --max-procs when the process being repeated isn't CPU bound? You'll probably need to experiment. I've heard anecdotes of BSD running 50,000 simultaneous httpd processes, obviously on a machine meant for little else than serving web pages. At the other end, I can't imagine any situation where running 20 processes would cause any harm. So you'd need to find a number between 20 and 50,000, and do so experimentally. What I'd like to point out, though, is that even if you settled on 20 for a socket-bound process, running 20 at once could speed you up by nearly 20 times. That alone is a huge improvement.
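Here's a rough sketch for feeling the difference. It uses sleep as a stand-in for a process that spends its time waiting rather than computing: twenty input lines, each containing 1, become twenty "sleep 1" processes. The timing is approximate and depends on your machine:

```shell
#!/bin/sh
start=$(date +%s)
# seq produces 20 lines; sed turns each into "1", so each spawned
# process is "sleep 1". Up to 20 of them run at the same time.
seq 20 | sed 's/.*/1/' | xargs --max-lines=1 --max-procs=20 sleep
end=$(date +%s)
elapsed=$((end - start))
# One at a time this would take about 20 seconds; in parallel it
# should take roughly a second or two.
echo "elapsed: ${elapsed}s"
```

Drop --max-procs=20 from the pipeline to see the serial time for comparison.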
--max-procs isn't for everyone, nor every situation. Sometimes it's better to not specify the --max-procs argument, letting it default to 1.
Never use --max-procs if you care about the order in which the lines get processed. With multiple processes running at once, the fastest ones finish first, regardless of input order. If you need the lines processed in the order they arrive on xargs' stdin, don't use --max-procs.
--max-procs makes what's happening hard to understand, so you should always debug your xargs-equipped pipeline without it, and only after everything's debugged should you consider whether to use it.
If you specify --max-procs=0, that gives xargs permission to create new processes without limit. This is almost always a bad idea. Specify a positive number. Using 0 is kind of like soldering a wire across a circuit breaker.
Occasionally you might find xargs running its command even though there was nothing to process, for instance when the input process produced no output. When that happens, use the -r option, also called --no-run-if-empty: if xargs' stdin contains nothing but blanks, the command isn't run at all. (Without -r, GNU xargs runs the command once even when there is no input.) This is usually a good idea, unless you actually want the command to run when there's no input.
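A quick way to see what -r changes is to feed xargs no input at all. This sketch uses : (the shell's do-nothing command) as an input process that produces no output, and echo as a harmless command:

```shell
#!/bin/sh
# No input: GNU xargs still runs "echo ran" once by default.
plain=$(: | xargs echo ran)
# With -r (--no-run-if-empty) the command isn't run at all.
guarded=$(: | xargs -r echo ran)
printf 'without -r: [%s]\n' "$plain"
printf 'with -r:    [%s]\n' "$guarded"
```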
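There's a special place in hell for people who name a file "Jimmy Jones's report". That would be a firing offense at Troubleshooters.Com, where I control everything. But at larger organizations, admins and developers must code for that possibility, because no matter how stringently the company enforces filename policy (if it does at all), filenames like that will sneak through: new employees, documents from other companies, whatever. By default, xargs splits its input at blanks as well as newlines, and treats quotes and backslashes specially, so such a filename gets broken into pieces or triggers an unmatched-quote error. xargs has at least two ways of handling such filenames: the delimiter method, -d"\n", which makes the newline the only delimiter, and the -0 option, which makes the null character the only delimiter.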
The following is an actual application of the delimiter method:
slitt@mydesq2:~/test$ ls | \
  xargs --max-lines=1 -d"\n" file
100.txt: ASCII text
101.txt: ASCII text
102.txt: ASCII text
103.txt: ASCII text
1 11.txt: ASCII text
Jimmy Jones's report: ASCII text
quo"tes: ASCII text
semi;colons: ASCII text
sin'gles: ASCII text
spa ces: ASCII text
ta bs: ASCII text
slitt@mydesq2:~/test$
Acknowledgement:
Thanks to Wayne Walker for the program that creates most of these bizarre filenames. His program is at https://gist.github.com/wwalker/0aaca2ae5eb5cadf01a6.
If you're using -0 and the line source doesn't already output null characters instead of newlines, you can do the conversion yourself with tr, as follows:
slitt@mydesq2:~/test$ ls | \
  tr "\n" "\0" | \
  xargs --max-lines=1 -0 file
100.txt: ASCII text
101.txt: ASCII text
102.txt: ASCII text
103.txt: ASCII text
1 11.txt: ASCII text
Jimmy Jones's report: ASCII text
quo"tes: ASCII text
semi;colons: ASCII text
sin'gles: ASCII text
spa ces: ASCII text
ta bs: ASCII text
slitt@mydesq2:~/test$
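When find is the line source, you don't need tr: find's -print0 action ends each name with a null character instead of a newline. This sketch creates three awkward filenames, modeled on the ones above, in a scratch directory made with mktemp, and counts how many separate arguments xargs produces:

```shell
#!/bin/sh
dir=$(mktemp -d)
# Awkward names: a space, a single quote, and a double quote.
touch "$dir/spa ces" "$dir/sin'gles" "$dir/quo\"tes"

# -print0 and -0 both use the null character as the delimiter, so each
# complete filename arrives as exactly one argument.
count=$(find "$dir" -type f -print0 | xargs -0 --max-args=1 echo | wc -l)
echo "files seen: $count"
rm -r "$dir"
```

Three files in, three arguments out; without -print0 and -0, the quote characters would make xargs complain about unmatched quotes.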
Sometimes you need to run a fancier command on each line than a simple chmod or whatever. My favorite way of getting fancy is to write my own shellscript for xargs to run on each of its stdin lines. But there's also the -I option, which lets you declare a replacement string that can appear one or more times anywhere in the command xargs runs; when xargs runs the command, each occurrence is replaced by the stdin line.
In the following very contrived example, the lines 1, 2 and 3 are fed to xargs, which uses the reverse Polish notation calculator dc to compute the function (x+1)*x on each line.
slitt@mydesq2:~$ for i in 1 2 3; \
  do echo $i; done | \
  xargs -I _val_ dc \
  -e "_val_ 1 + _val_ * p"
2
6
12
slitt@mydesq2:~$
The preceding example uses the replacement string twice, in positions not next to each other. This requirement comes up sometimes, and when it does, your choices are to code up a little shellscript for xargs to run, or to use the -I option.
Notice that the previous example has no --max-lines=1 option. That's because -I automatically sets --max-lines=1, or in man page speak, -I implies --max-lines=1.
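For comparison, here's a sketch of the other choice: instead of -I, xargs runs a tiny inline shellscript via sh -c, and each input line arrives as that script's $1. The trailing sh is a placeholder that becomes the script's $0, so the real argument lands in $1. This sketch swaps dc for shell arithmetic purely for the demo:

```shell
#!/bin/sh
# Inline script computing (x+1)*x; single quotes keep $1 unexpanded
# until the inner sh runs it.
script='echo $(( ($1 + 1) * $1 ))'
results=$(printf '1\n2\n3\n' | xargs --max-lines=1 sh -c "$script" sh)
printf '%s\n' "$results"
```

The output matches the dc version: 2, 6 and 12, one per line.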
As I write this in July 2015, there's been a lot of recent discussion of the Unix Philosophy. What is it? Is it a good thing, or an obsolete relic of a distant, geeky past?
Definitions of the Unix Philosophy abound; you can easily find them with an Internet search. The most used one-sentence definition is "Do one thing and do it well." To me, that's an excellent summary of a major facet of the Unix Philosophy, or maybe an excellent litmus test for it, but it's just a property or a litmus test. To me, underlying the Unix Philosophy is a striving to give a fairly smart person the option of quickly implementing an entirely new behavior for his computer. For instance, now that Google upranks "mobile-friendly" pages and downranks pages that don't pass its "mobile-friendly" test, I need to start converting all 800 pages of Troubleshooters.Com to mobile-friendliness. What can I do? Should I make a year's project of it by adding the necessary two lines to each page by hand in an editor? Or should I spend a few hundred hours writing a beautifully polished GUI application that lets me point and click my way through all this work?
Because I'm a fairly smart person and I'm aware of the Unix Philosophy, I made a third choice: I bolted together the tried and true tools that Unix (via Linux) gives me, in order to make programs to find unconverted web pages and convert them. My programs aren't pretty. They aren't polished. They aren't safe in the hands of the proverbial "dumb user". They don't need to be: They're being used by a fairly smart person prioritizing getting the job done over "pretty." The Unix Philosophy isn't for everyone, but in the hands of the right person (and there are millions of us "right people"), it's pure productivity power.
Before I list a few of the many tools Unix gives you, let me define the word filter in computing terms. A filter is a program that takes its input from stdin, adds, changes and/or deletes lines, and sends the result to stdout. You can cascade filters, connecting them output to input, with the pipe symbol (|). The following is an example:
find . | \
  grep -v "\.png$" | \
  grep -v "\.class$" | \
  vim -
In the preceding, grep is used as a filter twice: once to remove all lines ending in .png, and again to remove all lines ending in .class. The result gets sent to vim for viewing. To restate: filters are processes that take their input from stdin and put their output on stdout, so they can be cascaded output to input. Note that the very first process in the cascade needn't take data from stdin (find has no stdin input), and the last process in the cascade needn't output to stdout (vim normally doesn't output to stdout).
Note:
The example just discussed could have been done with
find . | grep -E -v '\.png$|\.class$' | vim -, but the purpose of the example was to exhibit a series of several pipes.
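To make the definition of a filter concrete, here's a minimal sketch of a homemade filter written as a shell function. The name prefix_nonblank and its behavior are made up for illustration: it reads lines from stdin, deletes blank ones, changes the rest by adding a prefix, and writes them to stdout, so it can sit anywhere in a pipeline:

```shell
#!/bin/sh
# A filter: stdin in, stdout out, nothing else.
prefix_nonblank() {
  while IFS= read -r line; do
    # Delete blank lines; change every other line by adding a prefix.
    [ -z "$line" ] && continue
    printf 'file: %s\n' "$line"
  done
}

# Cascade it after grep, exactly like any other filter.
out=$(printf 'a.png\n\nb.txt\n' | grep -v '\.png$' | prefix_nonblank)
printf '%s\n' "$out"
```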
The following are some of the tools you can bolt together to create your own custom processes:
The Unix Philosophy preserves the ability Ken Thompson and his successors gave you to mock up new computer functionalities with lightning speed when using Unix. The xargs program is a necessary part of the Unix Philosophy.
Problem #1 is solved mainly with the --interactive option and by, at first, having xargs run a simple echo command.
Problem #2 is usually solved with the --max-lines=1 command line option. Occasionally the problem at hand might call for you to use two lines for one process that takes two arguments, so that would require --max-lines=2. Obviously this could be extrapolated to higher numbers. Also, in some cases the situation might require you to count the number of words on a line rather than lines themselves. In those cases you'd use the --max-args=1 option, possibly changing the number to a higher one.
Problem #3 occurs occasionally, and is solved with the -r command line option.
For performance purposes, it's often advantageous for xargs to run several instances of the command specified as its argument simultaneously. This is accomplished with the --max-procs=20 command line option, where you change 20 to whatever you believe is the best maximum number of such processes. I'd strongly advise against --max-procs=0, because that creates processes without limit, and no computer is limitless. That would be like jumpering a circuit breaker with a large diameter wire: it never trips, but the house can burn down, or in this case, the computer can crash. Remember, the --max-procs option is most advantageous when the process being run multiple times is bottlenecked by something other than the CPU. If the process being run is CPU bound, there's no advantage to running more instances of it than you have cores in the computer, and if it uses all your cores, other processes are going to be very slow. --max-procs=20 (or higher) is best used when the process is bottlenecked by a network socket waiting for responses.
xargs is the adapter used to plumb together basic Unix commands as well as custom scripts you write, so it's an integral part of the Unix Philosophy and of the way you write quick programs and prototypes.