Troubleshooters.Com Presents

Linux Productivity Magazine

Volume 1 Issue 1, August 2002
Backup In a Linux Environment

Copyright (C) 2002 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Linux Productivity Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.

Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist and Rapid Learning: Secret Weapon of the Successful Technologist.


The best way to predict the future is to create it. -- Attributed variously to
Peter Drucker, Alan Kay, Thomas Edison and others


Editor's Desk

By Steve Litt
Much has happened since the July 1998 Troubleshooting Professional Magazine covered data backup strategies. Linux climbed from toy status to Microsoft's generally acknowledged #1 competitor. Linux desktops transitioned from a cruel joke to a very practical work tool. The price of a knee-of-the-curve desktop dropped several percent while average RAM rose from 32MB to 128MB, average processor speed rose from 300MHz to 1700MHz, and average disk space rose from 8GB to 60GB. CD writer drives went from 2x to 32x. CDR media write speed rose from 4x to 32x, while prices dropped from a dollar to 33 cents. CDRW transitioned from a luxury to a reliable, cheap and convenient way to do short term backups.

Those four years saw changes here at Troubleshooters.Com. Backup media transitioned from 100MB Zip media to 650MB CDR and CDRW. PKZip was dropped in favor of the Open Source tar and gzip. We switched from Windows to Linux. And last but not least, Troubleshooting Professional Magazine split in two: a new, quarterly Troubleshooting Professional Magazine that's all troubleshooting all the time, and the monthly Linux Productivity Magazine that you're reading now. This is the premier issue of Linux Productivity Magazine.

All these changes notwithstanding, the backup principles espoused in the July 1998 issue remain conspicuously unchanged. Restorability, be it short term, mid term or long term, is still the top priority. Reliable media is still vital for all backups. Use of ubiquitous, standard media, format and compression is still the key to long term restorability.

The principles remain the same, but in a GNU/Linux and Open Source/free software environment, implementation of those principles has become so much nicer. This month's magazine details that implementation.

This month's material applies mainly to SOHO (Small Office/Home Office) computers and networks because that's what I'm familiar with. Big iron shops with hundreds of users have no choice but to back up to tape, and the tape machines they use cost a fortune. They typically have several system administrators, each of whom has backup as a significant part of his or her job responsibilities. Luckily, big iron shops have the budget to make this work. SOHO environments typically back up with sweat equity, and their small volume of data makes that possible.

The preceding paragraph notwithstanding, many of the underlying concepts and principles apply to shops of any size. One thing remains constant in any highly automated business -- if you lose all your data and can't restore it, you're out of business.

So kick back, put your feet up, and contemplate this important aspect of data security. And remember, if you're a Troubleshooter, this is your magazine. Enjoy!

Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist".  Steve can be reached at Steve Litt's email address .

What Happened to Troubleshooting Professional Magazine?

By Steve Litt
Troubleshooting Professional Magazine split in two. From this point on, Troubleshooting Professional is a quarterly magazine whose every article discusses Troubleshooting -- process, application, tools, tips and techniques. As mentioned, it comes out four times a year:
Winter: January
Spring: April
Summer: July
Fall: October

All the Linux, Open Source and free software content has been removed from Troubleshooting Professional Magazine and placed in a new monthly magazine, Linux Productivity Magazine. Right now you're reading the August 2002 premier issue of Linux Productivity Magazine.

This split was executed in order to give our Troubleshooting audience exactly what they want, and our Open Source audience exactly what they want. The results of our informal election indicated that readers of both persuasions overwhelmingly preferred the split. Interestingly enough, this was in spite of the fact that a large portion of Troubleshooting Professional's readership were interested both in Troubleshooting and in Open Source content. Such "dual citizens" will now be happy to read both magazines.

Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". Steve can be reached at Steve Litt's email address.

What Makes a Good Backup?

By Steve Litt

NOTE: This article is an updated version of the same article in the 7/1998 Troubleshooting Professional Magazine.

What makes a good backup? It varies depending on the purpose and nature of the backup, but the following are a great set of criteria:

Predictable and Trustworthy

"It'll probably work" cuts it in many situations, but not with backups. When a disk crashes, it's essential the backup be trusted. It must do the same thing, make the same files, restore the same way, every time. In most cases this requirement rules out inexpensive tape solutions. It also rules out software that runs out of memory, recurses into infinity, hangs, or other flaky behavior.

The best way to test predictability and trustworthiness of a backup solution (software and hardware) is to repeatedly backup and restore (to a different drive, obviously) a complex backup. Are the results the same every time?
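As a minimal sketch of such a test (all paths and the sample tree are invented for illustration), the following backs up a tree three times, restores each backup to a fresh directory, and compares against the original:

```shell
#!/bin/sh
# Repeatability test sketch: back up, restore elsewhere, compare.
# /tmp/backup_test_src stands in for your real data tree.
set -e

SRC=/tmp/backup_test_src
SCRATCH=/tmp/backup_test_scratch

# Build a small sample tree (in real use, $SRC is your data).
rm -rf "$SRC" "$SCRATCH"
mkdir -p "$SRC/docs"
echo "important data" > "$SRC/docs/file1.txt"
echo "more data"      > "$SRC/file2.txt"

# Three trial runs: back up, restore to a different directory, compare.
for trial in 1 2 3; do
    mkdir -p "$SCRATCH/$trial"
    tar czf "$SCRATCH/$trial/backup.tgz" -C "$SRC" .
    mkdir "$SCRATCH/$trial/restore"
    tar xzf "$SCRATCH/$trial/backup.tgz" -C "$SCRATCH/$trial/restore"
    diff -r "$SRC" "$SCRATCH/$trial/restore" && echo "trial $trial: OK"
done
```

If any trial's diff reports differences, the backup solution fails the predictability test.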

In addition, any good backup system provides a method to confirm backup accuracy. This comes in two flavors, comparison against original and CRC comparison. Each has its advantages, and using Linux, there's nothing to stop you from implementing both.

Comparison Against Original

Right after the backup, this is the most reliable test. It verifies that each backed up file is a byte for byte match of its source. However, once disk data changes (usually within an hour after backup), this method can't be used. Particularly, it can't be used months later to detect media deterioration induced change in the backup. If your backup procedures compare against the original, productivity must stop from the beginning of the backup until the compare completes.

Good systems provide a log of files that don't match. In such a case productivity can continue, and one can simply compare the list of non-matching files against the list of files modified while the backup and compare were running.

There are backup systems that compare each file immediately after it's backed up. For the most part that allows productivity to continue.

CRC Comparison

This compares the CRC (cyclic redundancy check, a number reliably verifying the contents of the file) calculated from each backed up file with the CRC originally read off its source file. Obviously, if the backup program reads the source wrong, this wouldn't be detected as it would be in the comparison against original method.

Other than the above, CRC comparison has a host of advantages.

To be predictable and trustworthy, any backup system must provide a method of confirmation on a file by file basis.

Accurate and Complete

You have a certain subset of your disk(s) you want backed up. It could be a hundred thousand files. Too many to inspect by eye. Is it backing up what you think, and excluding what you think? If not, data loss is just a crash away.

To be accurate, the backup system must provide the user with a method of choosing, by various criteria, which files and directories are backed up.

Set up backup criteria, then repeatedly back up and restore with your present backup setup and the one you're thinking of going to. Are the results the same? If so, you're OK. If not, figure out which one isn't doing its job. Repeat this test with various backup criteria at various times.


Restorable

Every day millions of write-only backups come into being. Write-only because the backup software doesn't work, or it doesn't work with the hardware or operating system, or because the user made a mistake.

Millions more backups were once good, but became unrestorable due to age, or new hardware/software environments. You needn't search hard to find stories of 1960's or 1970's source code and data lost because the knowledge of the data's format has been lost. In some cases the information is transcribed from paper copies, while in others it's just lost forever.

The entire purpose of backup is to be able to restore, so restorability is essential. There are three categories of restorability:

  1. Restorable soon after backup
  2. Restorable years after backup
  3. Restorable after disaster

Restorable soon after backup

Can it be restored? Can it be restored to a different drive so it can be explored before overwriting the existing data? Is the media highly reliable (moderate cost tape backup systems, for instance, are not)? Will little hardware conflicts render restore impossible? Will the restore software work with less than perfect backups? Will the tape backup restore with a different drive if necessary?

In general, magnetic oxide is less reliable than desirable. Oxide can easily become damaged, causing readability-destroying "dropouts". Certainly, floppies cannot be trusted with your data. Tape media range from junk to highly reliable. Spectacular manufacturing quality contributes to some of the reliability. Another undoubtedly vital factor in tape reliability is redundancy. The greater the redundancy, and the more intelligently it's implemented, the larger the oxide dropouts that can be tolerated by deducing what was there. From what I've heard, helical tapes (with spinning heads, like a video tape) are more reliable than tape that passes over a static head. Finally, some magnetic oxide media are intrinsically reliable. Iomega Zip disks are an example.

Optical media seem more reliable in general. The 650MB CDR format is certainly an example. If you use reasonable quality blanks, keep CDs covered, and keep them in any kind of reasonable environment, they'll be byte for byte restorable for years.

Restorable years after backup

I don't have a crystal ball, but I'll bet you five years from now you'll be using a different operating system (or at least a different distro and a radically different Linux version) with different archiving hardware and different archiving software. I'll also bet you'll still consider your 2002 tax return vital.

I think it's safe to say that five years from now proprietary hardware/software combinations WILL NOT RESTORE. You might be able to read the tape, but it's unlikely you'll find software to read that proprietary tape on your newer (and from a different manufacturer) drive. No matter what hardware is involved, you're much better off recording the backup in a standard format, like .zip or .tgz or whatever. That way, once you get that single file off the tape or other media, you can work with it on your hard disk in a hardware-free environment.

You don't want a proprietary binary file format. My early Zip drive backups are unreadable on my Linux system due to that silly .qic format. To read them I need Microsoft Backup.

Likewise, standard hardware media and format is essential. Over the years there have been floppies that spun ten times as fast, recording ten times the data. There have been backup programs that put 9 sectors per ring instead of 8, just to gain that 12%. There have been all sorts of optical disk media formats. How can you read those media now? You can't.

Here's your best bet. Have the hardware and software for the backup completely separate and independent. Both hardware and software should be a ubiquitous industry standard, and both should have achieved that status for several years. Iomega Zip disks and ISO9660 650 MB CDR's are great examples. Even the QIC-80 tapes I was using in 1994 are readable by drives you can buy on Ebay and standard Linux drivers. Stick with media that was a ubiquitous standard for a prolonged period of time.

Likewise, the software format should be a ubiquitous standard. The backup program that shipped with your tape drive likely writes the backup in a format readable only by that same software. Five years from now your OS won't support that software -- what do you do then? The solution is to have the format in a ubiquitous standard. The ZIP format (brought to us by PKWare) is an outstanding example. The tar and gzip formats are age-old UNIX favorites, and will be supported for a long time to come.

Make the backup on the media a simple file created by the backup software. Have the media look like a disk drive which can have files copied to and from it. Keep a copy of the backup software, including the part that restores. Then, even years later, all you need to do is find hardware and drivers to read the drive, copy the file off it, and use the backup software to restore.
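Here's a minimal sketch of that restore path, using the single-.tgz-on-media layout described above. A plain directory stands in for the mounted CD (on real hardware you'd first mount /mnt/cdrom), and all paths and filenames are examples:

```shell
#!/bin/sh
# Restore-years-later sketch: copy the archive file off the media,
# then untar it with standard tools.  /tmp/fake_cdrom stands in for
# the mounted CD.
set -e

MEDIA=/tmp/fake_cdrom
RESTORE=/tmp/restore_here

# --- setup of the stand-in media (not part of the restore itself) ---
rm -rf "$MEDIA" "$RESTORE" /tmp/orig_tree
mkdir -p "$MEDIA" /tmp/orig_tree
echo "tax return 2002" > /tmp/orig_tree/taxes.lyx
tar czf "$MEDIA/020801.tgz" -C /tmp orig_tree

# --- the actual restore: copy the file off the media, then untar ---
mkdir -p "$RESTORE"
cp "$MEDIA/020801.tgz" "$RESTORE/"
tar xzf "$RESTORE/020801.tgz" -C "$RESTORE"
ls "$RESTORE/orig_tree"
```

Because the media carries nothing but an ordinary file, any future system that can read the disk and run tar can do the restore.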

Last, but not least, restoring from years ago requires existence of such backups. Backup media are expensive, so there's a natural inclination to re-use them. Strike a balance between re-use and data conservation. It might be something like this:

Find a friend who's a big-iron computer guy -- he'll know how to implement such a system.

Ongoing Black Hole Prevention

It's ironic that the most obsolescence resistant format is the paper copy. Drives to read my QIC-80 tapes from the early 1990's are hard to come by. As far as I know, they're not made any more and I must buy one used if I want to save my early 1990's backups. Given the hassle of doing that, and my suspicion that any files that dropped through the cracks before 1996 were unimportant at best, I choose to let those backups vanish into a black hole.

Not so with my 1996-1998 Zip drive backups. Soon (before my 5 year old Zip drive wears out) I'll transfer those backups to CDR format, so when my Zip drive gives up the ghost my Zip drive backups don't go with it.

As old media and backup formats become obsolete, you should transfer them to the latest media and formats while you can still decode them. Fortunately this isn't difficult. Because media sizes double every 2 or 3 years and get cheaper to boot, you can transfer those old backups to more modern media/formats easily and cheaply. And because hard disks keep growing every year, you'll have enough scratch space to assemble new images for those old backups.
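A sketch of such a transfer might look like the following. The paths are invented for illustration, a plain directory stands in for the mounted Zip disk, and the final burn step is left as a comment since it depends on your burner:

```shell
#!/bin/sh
# Migration sketch: repackage an old file-by-file Zip-disk backup into
# a modern single-file .tgz before the old drive dies.
# /tmp/fake_zip stands in for the mounted Zip disk (/mnt/zip).
set -e

OLD=/tmp/fake_zip
NEW=/tmp/migrated

# --- setup of the stand-in for the old media ---
rm -rf "$OLD" "$NEW"
mkdir -p "$OLD/1997_projects"
echo "old invoice" > "$OLD/1997_projects/invoice.txt"

# --- copy the old backup onto scratch disk space, then repackage ---
mkdir -p "$NEW"
cp -p -R "$OLD" "$NEW/zip_contents"
tar czf "$NEW/zipdisk_migrated.tgz" -C "$NEW" zip_contents
md5sum "$NEW/zipdisk_migrated.tgz" > "$NEW/zipdisk_migrated.md5"
# From here, mkisofs and your burning program of choice would put
# zipdisk_migrated.tgz and its .md5 file onto a CDR.
```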

When transferring old backups, one or two per year is sufficient. And if your business has a policy of dumping aged data (many people do this to limit discovery in litigation), data older than the maximum saved data age needn't be transferred -- it should be destroyed.

Restorable after disaster

Flying televisions, microwaves, and computers woke us at 4:31am the morning of January 17, 1994. The famed Northridge Earthquake (which was really centered in Reseda, 6 blocks from our home) rendered our place uninhabitable. We gathered a checkbook, a change of clothes, diapers, formula, and medicine for our exodus from Southern California. On the way out the door I grabbed the tape backup, and prayed it would restore.

The backup was restorable, and I didn't need it anyway. My Gateway 486-66 took a licking and kept on ticking (thanks Gateway).

But what if it had been more like the 1906 San Francisco earthquake, with a firestorm driving us out with only the clothes on our back? Or a flash flood? Or a home-invasion robbery? Both the backups and the computer might have been gone. That's why offsite backup is so vital.

Not just any offsite backup. Back up out of the region -- out of reach of a regional disaster like a large earthquake, a hurricane or a flood. It's expensive to buy and ship media out of state, so you might want to do it only once or twice a year. Balance the expense against how far you're willing to be "set back" in the event of a regional disaster-caused data loss. Here's a possibility:

Once upon a time I recommended keeping a backup in the trunk of your car. I no longer recommend that because the risk of somebody stealing that backup and using your private data exceeds the benefit of an accessible backup. The exception might be if you can't find a trustworthy local friend with whom to store your backup.

Easy to Use

A backup technique that's too hard to use won't be used, or won't be used often enough. An excessively difficult, time-consuming, or costly backup technique is useless.

On the other hand, entirely too many write-only backups have been done in the name of "ease". "One button" backups often don't include the right stuff, and many times don't restore, or restore to the wrong drive, or whatever.

The ideal backup is a fully configurable one whose configuration can be remembered between backups.


A proper system of daily, weekly, monthly, quarterly and yearly backups, with proper use of offsite backup, is essential to guarantee the safety of your data. Standardized media, media organization and backup format are essential for long term backups.
Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. He can be reached at Steve Litt's email address.

Single File vs. File by File backups

By Steve Litt
With floppy, CDR and Zip disk backups, you have the choice of organizing the backup into a single backup file, or as a mirror image of the original directory tree, with each file compressed. Each has advantages and disadvantages.
Single file advantages:
  • Easier whole-archive verification
  • More portable (8.3 filename)
  • Slightly better compression

Single file disadvantages:
  • A single bad bit can make the whole archive unreadable
  • More difficult to restore single files or subtrees

File by file advantages:
  • Only damaged files will be unreadable
  • Trivial to restore single files or subtrees
  • Can use files right on the backup if disk space is scarce

File by file disadvantages:
  • Whole-archive verification is difficult
  • Filenaming quirks are in the media format rather than the compression/backup format
  • Slightly worse compression

The simplest possible backup of a directory tree called /d would be as follows:

cp -p -R /d /mnt/zip
The preceding is a file by file archive with no compression and no CRC. Verification is done as follows:
diff -r /d /mnt/zip
With a Zip drive, it may also be possible to write down a separate checksum like this:
md5sum /dev/sda4
Obviously that checksum can't be added to the disk, because doing so would change the checksum of the disk...

Device wide checksums aren't reliable on CDs.  I've never found a reliable method to read the entirety of a CD device -- no less and no more. The reason is that it errors out toward the end, whether or not all the data (and no more) is read. In other words,

md5sum /dev/cdrom
produces different results on different boxes with different CD drives. The following produces slightly more consistent results across boxes, but still has unacceptable variation:
cat /dev/cdrom | md5sum
Still more consistent, but nevertheless unacceptably variable, is the following:
mount /dev/cdrom
blocksize=`/sbin/blockdev --getss /dev/cdrom`
blockcount=`/sbin/blockdev --getsize /dev/cdrom`
dd if=/dev/cdrom bs=$blocksize count=$blockcount conv=notrunc,noerror | md5sum
Perhaps the most consistent checksum comes when the preceding blocksize and blockcount are replaced with numbers obtained from the .iso that produced the CD. With those numbers plugged into the command, it's accurate on most boxes. But not all.

The bottom line is that device based checksums of file by file archives are problematic. A much better method would be to make a series of per-file checksums, and then checksum that series. Or better yet, have per-file and per-directory checksums, making it possible to verify individual files and directories at restore time.
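A minimal sketch of that per-file checksum series follows. The paths and the sample tree are invented for illustration:

```shell
#!/bin/sh
# Per-file checksum sketch: checksum every file in the tree, save the
# list, then checksum the list itself so the whole set verifies with
# one number.  /tmp/demo_tree stands in for your data tree.
set -e

TREE=/tmp/demo_tree
rm -rf "$TREE"
mkdir -p "$TREE/sub"
echo alpha > "$TREE/a.txt"
echo beta  > "$TREE/sub/b.txt"

# One checksum per file, sorted so the list is reproducible.
( cd "$TREE" && find . -type f | sort | xargs md5sum ) > /tmp/filelist.md5

# One checksum over the whole list.
md5sum /tmp/filelist.md5 > /tmp/filelist.md5.md5

# At restore time, individual files verify against the list:
( cd "$TREE" && md5sum -c /tmp/filelist.md5 )
```

Both checksum files would ride along with the backup, so any single file, or the whole set, can be verified at restore time.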

The simple filecopy archive can be compressed by gzipping each individual file. The easiest way to do this is to copy the directory to a scratch location and then run gzip -r on the copy, after which checksum files can be made and the tree can be used to create a .iso from which a CD is burned, tree-copied to a Zip disk, or put in a format suitable for taping.

If you're interested in file by file backup but want something more sophisticated than what's described in this article, take a look at Bryan Smith's back2cd utility (URL in URL's section).

Single File Backup

I use a single file backup. One reason is habit -- I've used single file backups since 1986 (Fastback). But the main motivation is to shelter the ISO9660 format from long filenames without introducing extensions such as Joliet or Rock Ridge, which may not be supported as long as ISO9660 itself. By placing all files and trees in a single file, that file's format can handle long filenames (or filenames with oddball characters), while the CD itself sees an 8.3 filename such as 020801.tgz.

I have no fear of the major single file disadvantage -- a single byte of corruption making the entire backup unreadable. Tests on my oldest CDR backups (December 1998 and early 1999 ISO9660 CDs containing multiple .zip files) show that out of the 13 .zip files checked, all tested perfect -- not a single bit of corruption. Some 1996 .zip file backups recorded on Zip disks also were 100% perfect. So according to my tests, the risk of losing data is tiny. Combine this with the fact that CDR media is so cheap and so space efficient (if you use paper sleeves) that I have monthly backups going back years. So if one or two fail, there are plenty more.

The other benefit is easy whole-backup checksum verification. The CD contains the archive file, and in addition it contains a file containing the checksum created by md5sum. So 10 years from now I can verify that the entire backup is valid, even if I've since copied it to different disks or media.
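A sketch of that arrangement, with invented paths and the 020801.tgz naming convention used elsewhere in this issue:

```shell
#!/bin/sh
# Archive-plus-checksum sketch: the CD carries the archive and an
# md5sum file, so the whole backup verifies even after being copied
# to other media.  /tmp/cd_contents stands in for the mounted CD.
set -e

WORK=/tmp/cd_contents
rm -rf "$WORK"
mkdir -p "$WORK"

# --- making the pair (done at backup time) ---
echo "archive payload" > /tmp/payload
tar czf "$WORK/020801.tgz" -C /tmp payload
( cd "$WORK" && md5sum 020801.tgz > 020801.md5 )

# --- verifying the pair (done at restore time, possibly years later) ---
( cd "$WORK" && md5sum -c 020801.md5 )
```

Because the checksum file travels with the archive, verification needs nothing but md5sum, no matter what media the pair has since been copied to.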

NOTE: My understanding is that most or all types of corruption of a gzipped tar file would prevent unarchiving. My reason for checksumming is to make absolutely sure, beyond doubt, that I never restore a corrupted file.

Here at Troubleshooters.Com, a script to back up two trees to two separate compressed archives, enclose those archives in a .iso image file, and burn a CD from that .iso image, is a simple 45 line shell script. The script to burn that .iso is a single command. The script to verify both archives (regardless of the dates in their names) by both the data comparison and checksum methods, and issue both specific errors and general go-nogo messages for each archive, is a 72 line shell script.


If your backup media is unreliable, use file by file backups if possible. If I were backing up to tape, I'd compress the individual file and then tar the tree, rather than the other way around. If you frequently need to restore individual files or subtrees, file by file backups make that process much easier and faster. If your backup media is highly reliable (CDR, for instance), and your backups are a seldom used last resort, then single file backups offer ease of backup and the peace of mind of a media format unencumbered by extensions required for long filenames and the like.
Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". Steve can be reached at Steve Litt's email address.

Data Backups vs. System Backups

By Steve Litt
From a planning standpoint, the easiest backup is to back up the entire filesystem. Add some sort of bootable mini-system, and recovering from a disk crash is as simple as boot and restore. But system backups aren't as ideal as they sound. The problem is the sheer volume of files in a full system backup. Consider how my 15 years of data files stack up next to the operating system:
[root@mydesk root]# du -s -k /d
880704  /d
[root@mydesk root]# du -s -k /classic/a
622816  /classic/a
[root@mydesk root]# du -s -k /inst
182948  /inst
[root@mydesk root]# du -s -k /    
11123046        /
[root@mydesk root]# du -s -k /usr
2550204 /usr
[root@mydesk root]# du -s -k /var
137980  /var
[root@mydesk root]#
The preceding shows my data (/d, /classic/a and /inst) consuming about 1.7GB of an 11.1GB file system. The main OS trees, /usr and /var, take up about 2.7GB. I'd suspect the rest is contained in junk I've collected, primarily in /home. This is one reason that on my personal computer I prefer not to use /home as a data directory.

It's not that difficult to back up 1.7GB on CDR, especially when less than 650MB consists of fast moving data. Backing up 11.1GB onto CDR is clumsy and impractical, so system backups are almost always done to tape. And most inexpensive tape solutions are not nearly as error free as CDR. Which means if you want a complete system backup, you're more or less forced to use either a very expensive backup method or an iffy one.

What is Data

We all know what data is, but sometimes definitions resolve close calls. I define data as something I cannot go out and buy for a reasonable cost. For instance, a copy of Microsoft Office is not data, because even if their licensing prohibited reinstallation, I could go buy a new copy for a few hundred dollars. Contrast that to the LyX file of my book, "Troubleshooting Techniques of the Successful Technologist". It isn't available at any price. Sure, I could buy the book, but not the original LyX file. To recreate the LyX file I'd need to transcribe it from paper, or if I were lucky enough to find a .ps file of the book, I could meticulously convert it to LyX by recreating the styles in the book. Due to the reduced typing and copy editing, converting from a PostScript file might be a 2 week task instead of a 6 week task.

There are some gray areas. I consider my copy of Micrografx Windows Draw data. Why? Because Micrografx long ago sold it to an entity who no longer sells it. It's unpurchasable, and much of my true data (drawings) is readable only in Micrografx Windows Draw. So even though it's just a computer program, I have it backed up every which way from Sunday.

Another gray area is configuration. Certainly I consider all my scripts data. Nobody sells them. Likewise, my GPL projects are data for me, because I'm the source. For anyone else they wouldn't be data. Oppositely, my copy of LyX is not data, because I can download it any place.

Interestingly, the Netscape that came with my Mandrake 8.1 is data. Why? Because Troubleshooters.Com is written and maintained with Netscape Composer. Mandrake stopped packaging Netscape with their distro, and the Mozilla Composer they do package has problems making it useless for maintaining Troubleshooters.Com. As far as I know, Netscape 6x is also unsuitable for maintaining Troubleshooters.Com. So Netscape has become something I cannot purchase.


The whole purpose of backups is to prevent setbacks. In the case of a system backup, if you have a system backup and a suitable, bootable method of restoring it, a system crash will set you back maybe 3 hours. If you have a backup of your data, but not your system disk, and your hard disk completely wipes out, you'll be back up in 2 to 8 hours, but you'll continue solving various problems (DNS, DHCP, Samba, various symlinks, application configurations, fonts) for several days. But if you lose all your data, you're out of business. By the time you recreate it your competitors will be so far ahead of you that you might as well just go get a job at the lumber store. Even a single key directory could put you out of business.

Backups are insurance. Insurance is meant to protect against unbearable losses. If it helps pay for small misfortunes, so much the better, but its essence is to protect against unbearable losses. So if you have a choice between an all-inclusive system backup that's not all that reliable, and a data-only backup that's utterly reliable (or better yet, several such backups), choose the data backup.

Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". Steve can be reached at Steve Litt's email address.

Backup Strategy

By Steve Litt

Data Directory Strategy

A data-only backup strategy requires that certain directory trees include data, and nothing but data. A data backup then becomes as simple as backing up certain trees. In the simplest possible situation, all data is in a single tree and that tree will fit on one piece of your backup media. But eventually it will outgrow a single media. You then have three choices:
  1. Get bigger media
  2. Span media
  3. Back up specific trees to specific media
#1's a great idea if there's a long-time, ubiquitously supported medium that's priced right and fits in with your organization's needs. Unfortunately, such perfect fits are rare. Resist the temptation to go with proprietary media designs, as five years from now they'll be difficult to read and support.
#2 has been around forever. With certain programs it's certainly easy -- the program prompts you to change media. But with other programs it's tricky, and a problem on any CD invalidates the whole set.
#3 doesn't make efficient use of all the media, and it requires tailoring your computer's directories to work with such backup. But I use it because once your directories are so tailored, it's easy; CDR media is cheap anyway; and besides, you can make it so that some of the backups are fast changing data (back up often), while others are slow moving (back up infrequently).

One great way to divide your data is to have a partition for fast changing data mounted at /d, a directory for installation programs that have achieved data status (like Netscape, as discussed in the Data Backups vs. System Backups article earlier in this magazine) mounted at /inst, and static data (work from years ago that's no longer maintained) at /classic/a, /classic/b, /classic/c, etc. Each partition is a size such that when compressed it fits on one piece of media. Thus, if you're using 650MB CDR's for backup, /d might be 800MB. When it threatens to overflow, you move some of it (the stuff that you anticipate never changing again) to /classic/a. If that overflows you make a /classic/b. Thus you need to back up the directories under /classic maybe twice a year, or whenever you move new data to them -- whichever comes first. Meanwhile, /d is backed up weekly, with daily backups of the day's changed data.

Backup Timing

The rule of thumb is you back up frequently enough that you can take the punch if there's a crash. This typically means daily or hourly, but you certainly don't do a formal backup hourly. It would be more like this:
Every half hour: Copy your current project to a backup directory on a scratch partition. No compression. Consecutively name such copies so you have a poor man's revision history.
Every day: Incremental backup of data changed that day. Send it to a rewritable CD or to another machine (via NFS) -- anything that won't be destroyed by a disk crash on the backed up machine. If you use CDRW's you need 7 -- one for each day.
Every week: Full backup of all fast changing data. Back up to CDRW. You need four CDRW's -- one for each week in the month.
Every month: Permanent full backup of all fast changing data. Back up to a CDR, keep for many years. They're cheap.
When needed: Permanent full backup of slower changing data.
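The half-hourly copies might be sketched like this. The paths, project name, and numbering scheme are examples, not a prescription:

```shell
#!/bin/sh
# Half-hourly "poor man's revision history" sketch: copy the current
# project to consecutively named snapshots on a scratch partition.
set -e

PROJECT=/tmp/myproject
SCRATCH=/tmp/scratch_snapshots
rm -rf "$PROJECT" "$SCRATCH"
mkdir -p "$PROJECT" "$SCRATCH"
echo "chapter one" > "$PROJECT/book.lyx"

snapshot() {
    # Next free number: 001, 002, ...
    n=$(ls "$SCRATCH" | wc -l)
    n=$(printf '%03d' $((n + 1)))
    cp -p -R "$PROJECT" "$SCRATCH/myproject.$n"
}

snapshot                      # first half hour
echo "chapter two" >> "$PROJECT/book.lyx"
snapshot                      # next half hour
ls "$SCRATCH"
```

No compression, no verification ceremony -- it only has to protect half an hour of work, and each numbered copy doubles as a crude revision.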

The preceding is just one possible example (it resembles what I do). The point to remember is that short term backups are quick and easy, while long term backups are built to survive many years. Many people do the revision backups (what I call half hourly) with a true revision tool like CVS or RCS.  Some forego the daily backups and rely on the half hourly backups instead. CDR's are so cheap that some people back up the slow moving data every month just to have complete backup sets.

Half hourly backups should take maybe 10 seconds because they protect only 1/2 hour of work. Daily backups should take maybe 5 minutes. Weekly backups should take maybe 1/2 hour. Monthly and beyond take whatever time is necessary -- they're what keep you from being bombed back to the stone age.

Speaking of time, you can maximize the concurrency of backing up and working by doing the archive/data comparison early. For instance, if you're burning CDR's, you can tar -dzvf the .tgz files immediately after making them and record their md5sum values in a file. Once the tar -dzvf runs are done, you can change data to your heart's content while creating the .iso files and burning the CD's. CD confirmation is done via md5sum checking, knowing that the original md5sum files were created from archives that compared byte for byte with the original data (via tar -dzvf).
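That early-verification sequence can be sketched as follows. This is a sketch under assumptions (GNU tar, archive names of your choosing); the key property is that tar d exits silently with status 0 when the archive matches the disk:

```shell
#!/bin/sh
# Archive a directory, record its md5sum, then immediately data-compare
# the archive against the disk. Once this succeeds, the on-disk data may
# change freely; the eventual CD is verified against the md5sum file.
backup_and_verify() {   # usage: backup_and_verify <dir> <archive.tgz>
   tar czf "$2" -C / "${1#/}" &&
   md5sum "$2" > "$2.md5" &&
   tar dzf "$2" -C /     # silent exit 0 means archive matches disk
}

# Example: backup_and_verify /d /tmp/tgz/d.tgz
```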

Media Format

For reasons discussed previously, the media format must be reliable, ubiquitously common, and well supported. It must also be fast enough, and sized and priced appropriately for your situation. ISO9660 CDR suits a wide variety of needs. It's big enough to hold the data for most small offices -- often a single CDR will hold all the data. Quality 32X CDR media is now priced at 33 to 50 cents per blank.

In my opinion, the existence of cheap CDR's makes Zip drives redundant for backup. Based on sources researched on the Internet, you can buy 20 blank CDR's for the price of one Zip disk. And you can buy 5 or so CDRW's for the price of one Zip disk, if rewriteability is an issue. The coolest thing about CDR's is that, because they're not rewriteable, you're never tempted to cannibalize your media.

You might consider recording DVD's. They hold up to 7 times as much data. Indeed, if your data backups are several GB this is the only practical way. But DVD writers are in the neighborhood of $500.00, as opposed to $79.00 CDR drives. And DVD media are in the neighborhood of $6.00 -- over twice the price per GB of CDR media. Consider also that DVD isn't yet as standard or ubiquitous as ISO9660. It's likely the day will come that DVD media is the media of choice, but for most people and small businesses today, CDR is still the better way to go.

Tape is great because it holds huge amounts of data, the cost per GB is very low, and it's re-recordable. But my past experience tells me that many tape formats fall far short of the reliability exhibited by CDR and Zip disk media. I have no data on how they degrade over time, but given their thinness, flexibility, and reliance on oxides, I'd be cautious about depending on tapes for long term backup. Some tapes are highly reliable, but from what I've heard those are the very expensive ones. One can also use a compression format with huge amounts of redundancy, so data errors can be corrected. Personally, I would use tape for system backups, but unless I had no other choice I'd steer clear of tape for backing up valuable data.

Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". Steve can be reached at Steve Litt's email address.

Simple and Useful Backup Scripts

By Steve Litt
My experience is that CDR drives often fail to accurately read CD's -- even the CD's that are made in the CDR. So I record on a CDR drive, then verify on a CD drive. This implies using more than one script (one for backing up, and one for verifying). In fact, I use three -- one for creating the ISO image, one for burning, and one for CD verification. The separate burning script makes it easier to work with different write speeds, as well as CDR/CDRW.

As mentioned previously in this magazine, the two types of verification are data comparison and CRC. A successful data comparison is proof positive that the .tgz file accurately mirrors the disk data it contains, at the moment of the data comparison. A successful CRC comparison is proof positive that the .tgz file is identical to the file originally created, and is excellent for proving accuracy of older backups. I use both.

The data comparison is problematic because you cannot alter any data between the beginning of the tar command and the conclusion of the data comparison. This disadvantage is minimized by data-comparing the .tgz immediately after a CRC is created for it, but before it is rolled into a .iso and before it's burned. Once the  original .tgz is proved good with data comparison, work can proceed. Once the CD is burned, a CRC comparison will prove that the .tgz on the CD is identical to the .tgz first created.

To minimize the time that work must halt, the fast-moving data should be done first, and a clear indication should be given that it's safe to proceed. So the procedure is: archive and data-compare the fast-moving data, announce that it's safe to resume work, and only then create the ISO images, burn the CD's, and verify them.

The scripts in this article archive the fast moving /d and slower moving /inst on one CD, and the very slow moving /classic/a on a second CD. There are five scripts:
  1. ISO Creation Script
  2. CD Burning Script, CDR
  3. CD Burning Script CDRW
  4. Checksum Verification Script, /d and /inst
  5. Checksum Verification Script, /classic/a
1. ISO Creation Script
#!/bin/sh
# isomountpoint=/mnt/iso

# Filename definitions. These were missing from the original listing,
# so the names below are assumptions, chosen to be consistent with the
# burning and verification scripts.
tgz=/tmp/tgz
dtgz=$tgz/d.tgz
dmd5=$tgz/d.md5
itgz=$tgz/inst.tgz
imd5=$tgz/inst.md5
catgz=$tgz/ca.tgz
camd5=$tgz/ca.md5
mainiso=$tgz/mainbup.iso
classiso=$tgz/classbup.iso

rm -f $tgz/*.tgz
rm -f $tgz/*.iso
rm -f $tgz/*.md5

#BACK UP /d
cd /d
tar czvf $dtgz /d
md5sum $dtgz > $dmd5

#DIFF /d
cd /
echo -n "Diffing $dtgz, please wait... "
dtgzdiff=`tar dzf $dtgz`
echo "$dtgzdiff diffed."
dstatus=OK
if (! test "$dtgzdiff" = ""); then
   dstatus=ERROR
   echo -n "Press enter to continue==>"
   read tempp
fi

sleep 10

#BACK UP /inst
cd /inst
tar czvf $itgz /inst
md5sum   $itgz > $imd5

#DIFF /inst
cd /
echo -n "Diffing $itgz, please wait... "
itgzdiff=`tar dzf $itgz`
echo "$itgzdiff diffed."
istatus=OK
if (! test "$itgzdiff" = ""); then
   istatus=ERROR
   echo -n "Press enter to continue==>"
   read tempp
fi
echo "d  status=$dstatus"
echo "i  status=$istatus"

cd $tgz
mkisofs -pad -o $mainiso $dtgz $itgz $dmd5 $imd5

#BACK UP /classic/a
cd /classic/a
tar czvf "$catgz" /classic/a
md5sum   $catgz > $camd5

#DIFF /classic/a
cd /
echo -n "Diffing $catgz, please wait... "
catgzdiff=`tar dzf $catgz`
echo "$catgzdiff diffed."
castatus=OK
if (! test "$catgzdiff" = ""); then
   castatus=ERROR
   echo -n "Press enter to continue==>"
   read tempp
fi

cd $tgz
mkisofs -pad -o $classiso   $catgz $camd5

echo "d  status=$dstatus"
echo "i  status=$istatus"
echo "ca status=$castatus"

echo If all is well, burn the CDs

2. CD Burning Script, CDR
cdrecord dev=0,0,0 speed=16 -v -eject /tmp/tgz/mainbup.iso
3. CD Burning Script, CDRW
cdrecord dev=0,0,0 blank=fast speed=12 -v -eject /tmp/tgz/mainbup.iso
4. Checksum Verification Script, /d and /inst
#!/bin/sh
# CD device and mount point. These definitions were missing from the
# original listing; the values below are assumptions -- adjust for
# your system.
DEVICE=/dev/cdrom
MOUNTPOINT=/mnt/cdrom

mount $DEVICE

dtgz=`ls $MOUNTPOINT/d*.tgz`
dmd5=`ls $MOUNTPOINT/d*.md5`
itgz=`ls $MOUNTPOINT/i*.tgz`
imd5=`ls $MOUNTPOINT/i*.md5`

echo -n "Comparing Checksums for $dtgz, please wait... "
dmd5val=`cut -d " " -f 1 $dmd5`
dtgzval=`md5sum $dtgz | cut -d " " -f 1`
echo " finished."

echo "$dmd5 value==>$dmd5val<=="
echo "$dtgz value==>$dtgzval<=="
dstatus=OK
if (! test "$dmd5val" = "$dtgzval"); then
   dstatus=ERROR
   echo ERROR: MD5SUM MISMATCH: $dtgz!
fi

echo -n "Comparing Checksums for $itgz, please wait... "
imd5val=`cut -d " " -f 1 $imd5`
itgzval=`md5sum $itgz | cut -d " " -f 1`
echo " finished."

echo "$imd5 value==>$imd5val<=="
echo "$itgz value==>$itgzval<=="
istatus=OK
if (! test "$imd5val" = "$itgzval"); then
   istatus=ERROR
   echo ERROR: MD5SUM MISMATCH: $itgz!
fi

echo "d status=$dstatus"
echo "i status=$istatus"

umount $MOUNTPOINT
5. Checksum Verification Script, /classic/a
#!/bin/sh
# CD device and mount point. These definitions were missing from the
# original listing; the values below are assumptions -- adjust for
# your system.
DEVICE=/dev/cdrom
MOUNTPOINT=/mnt/cdrom

mount $DEVICE

ctgz=`ls $MOUNTPOINT/ca*.tgz`
cmd5=`ls $MOUNTPOINT/ca*.md5`

echo -n "Comparing Checksums for $ctgz, please wait... "
cmd5val=`cut -d " " -f 1 $cmd5`
ctgzval=`md5sum $ctgz | cut -d " " -f 1`
echo " finished."

echo "$cmd5 value==>$cmd5val<=="
echo "$ctgz value==>$ctgzval<=="
cstatus=OK
if (! test "$cmd5val" = "$ctgzval"); then
   cstatus=ERROR
   echo ERROR: MD5SUM MISMATCH: $ctgz!
fi

echo "ca status=$cstatus"

umount $MOUNTPOINT
Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. He can be reached at Steve Litt's email address.

CBDTPA and Backup

By Steve Litt
The proposed CBDTPA law mandates copy protection be built into all electronics and software. In other words, tape drives, hard disks, CDR and CDRW drives, as well as the software that drives them. The instant the law is passed, all further manufacturing must include the copy protection.

If the law passes, depending on how it's implemented both legally and technologically, and if it's judged constitutional, it's likely that backups as we know them will be impossible. You might not be able to copy data from the hard disk to the CD unless that data includes special copy protection codes. It's very unlikely that you would be able to restore data from a backup CD to a new hard disk in the event of a hard disk crash. It's possible that even the UNIX/Linux/BSD cp and dd commands will become illegal.

Of course, vendors will be more than happy to provide copy protection enabled backup media and software. But without competitive forces from Open Source software, what's their incentive to provide reliable products at a good price? The days of using free software to back up to a 35 cent CD on a $79.00 drive will be gone.

Write your senator and congress people, and let them know you won't take kindly to legislators voting in favor of CBDTPA.

Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". Steve can be reached at Steve Litt's email address.

Life After Windows: Switching Back to Windows

Life After Windows is a regular Linux Productivity Magazine column, by Steve Litt, bringing you observations and tips subsequent to Troubleshooters.Com's Windows to Linux conversion.
By Steve Litt
In early or mid July, 2002, Tony Collins published an article documenting his transition from Linux to Windows after 3.5 years of Linux use. He cited Linux's complexity and fine grained control, problems with the various distros he migrated to, kernel compiles to accommodate new hardware, the command line, X, terrible fonts and the difficulty of installing fonts, problems obtaining and installing hardware drivers, the lack of an installer that's standard across distros, and the arrogance of some within the Linux community. He noted that he'd still use Linux for servers, and that he would avoid Microsoft applications. IMHO that's a very good stance from a "who owns your data" perspective.

My first supposition was that this was simply one more case of Microsoft paying writers to pretend to be grassroots Windows fans (two such instances are documented in the URLs section of this magazine). However, when Tony responded to my email, it was clear he was for real. Even so, a careful reading of his response made me suspect that the problems mentioned in the preceding paragraph were not the root cause of his switching, and that the bottom line reason was simply time.

Tony tweaked. He tweaked a lot. He tried different distros. He bought hardware not especially supported by the current Linux. He joined a LUG and even became a board member. He started an IRC channel for a games group, and created SuSE packages for several free software projects. And somewhere in all of this, time became scarce. If I read between the lines, he gave up Linux for the same reason I don't load fun games -- I'd spend too much time using them.

Most Linux people have the exact same difficulty. Linux is just so darn tweakable. I spend an hour a day or more revamping scripts, enhancing my menu system, and doing other stuff to make my work go faster. But is it worth it, or am I just hacking for fun and letting family and business stagnate?

I look at it like this: In 1990 my DOS computer was tweaked to the max, with a menu system and scripts to do everything I did on a regular basis. It was a lean, mean production machine. Then came Windows, and I spent the next 10 years pointing and clicking and spending lots of time on the most mundane, redundant tasks. I got very productive at such pointing and clicking, to the point I almost forgot how nice it was to run a script rather than point and click ten times through a procedure. By the time I switched to Linux, my work was very inefficient. And I've spent the last year catching up.

But the bottom line is this: If Linux makes you spend too much time, perhaps Windows is an alternative. Without the possibility of major tweaking, maybe you'll get down to business, however cumbersome that business may be. If so, here are my recommendations for the migration:

Linux to Windows Migration Strategy

Like every other activity, migrating from Linux to Windows requires a process. Here's a pretty good process to use:
  1. Understand your motivation.
  2. Understand what you're giving up.
  3. Plan to prevent vendor lock-in.
  4. Make a list of all applications you'll need on the Windows side.
  5. Test your new apps on the Windows side.
  6. Place all your data in one or a very few trees.
  7. Make an iron clad backup.
  8. Make your data Windows compatible.
  9. Move the data.
  10. Get all your apps running.
Now let's discuss this process.

Understand Your Motivation

Why are you doing this? Is there some functionality Linux isn't giving you? If so, might it be better to have a separate Windows machine to run that one function, while using your Linux box to manipulate most of your data? Is the problem the increased difficulty of getting some hardware to work on Linux? If so, might the better answer be to look at the Linux hardware compatibility lists, and simply pick a piece of hardware that fits Linux like a glove? Maybe the hardware costs more, but if money's the issue, you're probably saving much more money on software than you're spending on Linux-compatible hardware.

Is the problem Linux's ugly fonts? There's no doubt that a standard Windows machine has nicer fonts than a standard Linux machine. However, consider that the appearance of Linux fonts can easily be improved to the point where they're quite readable and attractive on the screen, and downright beautiful on paper. Perhaps you long for the hundreds of fonts available in Windows. If so, keep in mind that all authorities on style agree that a document should use very few different typefaces, or else it will look like a kindergartner's scrapbook.

Is your motivation the need to spend too much time tweaking Linux? If so, ask yourself whether your problem might be solved by picking a distro -- any distro, and sticking with it, learning the GUI environment the same way you learned Windows in the old days.

Whatever your motivation, examine it carefully before taking a step that could take away your freedom to access your data and take away the fact that your computer is your castle.

Understand What You're Giving Up

It's natural to reminisce about nice fonts, fully developed apps, and plug and play hardware, but as you evaluate going back to Windows understand what you're giving up. You know that nice GNU C/C++ compiler you take for granted? Gone. Sure, you could conceivably load a GNU compiler for Windows, but it's much more likely you'll end up with an inferior product like Microsoft C++ or Visual Basic.

You've become accustomed to a high level of security. Of using a normal login and not needing root unless you're really adjusting the system. Of seeing email viruses bounce harmlessly off your system. Once you switch to Windows, those days are gone.

Today you think nothing of grabbing an installer disk and installing your operating system. Do you really want to give that up? If you switch to Windows XP you have forced registration. And if your hardware changes, you must beg Bill Gates to let you reinstall your bought and paid for software on the new hardware.

Have you ever stopped to think how much software you get on that $10.00 distribution? Without going to the store, without downloading on the net, without any effort at all, you get many programming languages (C, C++, Java, Perl, Python, Ruby). You get several word processing programs (KWord, Abiword, LyX and very possibly OpenOffice). Spreadsheets include the versatile, powerful and robust Gnumeric, Kspread, and often the OpenOffice spreadsheet program. You get the hugely powerful Gimp drawing program -- a program compared by many to Photoshop. You get presentation programs, and many other different types of office utilities. Your choice of the Mozilla, Galeon or Konqueror web browsers. Many different email clients, including the user friendly and virus impervious Kmail. Countless text editors, including Vim, an editor as powerful as those selling for $300.00 in the Windows world. And Emacs, which surpasses the power of all those editors.

Speaking of money, you've probably become accustomed to spending nothing for software. Prepare for that to end. My tax records indicate my 2001 software purchase expense to be 0. Nada. Zip. I think I'll write a check to the Mandrake folks just to let them know how much I appreciate everything they've given me. I think back to the mid 90's, when my software expenses were in the four figure category most years. Going back to Windows will cost serious money.

Say good bye to data portability. You've probably come to take for granted the text format of LyX, or the XML format of OpenOffice, dia, and many other programs. No matter what kind of data conversion you want to do, if worse comes to worst you can always write scripts to parse and build these files. Not pleasant, and if you're not a programmer you'll need to hire someone to do it, but it's doable. When you move to proprietary apps, the data is in unfathomable binary format, often legally "protected" by anti-reverse-engineering license language or even software patents.

And that brings up the most basic privilege you'll be giving up: ownership of your data. To the extent that you use proprietary software, you rent your own data in the form of software license fees and software maintenance fees. And this landlord writes nastier legal language into his lease than the worst slumlord -- yet you never sign the lease, and you can't even read it before buying. Before making the switch, ask yourself "Who owns my data?".

Plan to Prevent Vendor Lock-in

Windows the operating system isn't all that bad. Nice fonts and a high level of hardware compatibility (thanks to what Judge Jackson and the Appeals Court term an illegal monopoly). And I hear WinXP doesn't crash much (of course we've heard that about every new version since W95). Yes, Windows itself is tolerable. The problem is the baggage it brings.

The minute you put your data in Microsoft Word, Bill Gates owns you. He can make it difficult or impossible to migrate your Word document to another format (LyX, let's say). If you upgrade to a new version, it's that much tougher. I've heard stories of people whose Word docs were unreadable by MS Word on a Mac, or even on a PC if it's an older version of Word. Bill Gates leads you around on a leash.

This problem repeats itself with every piece of software you use. Proprietary software is usually unfathomable binary for the simple reason that they want to make it as tough as possible to migrate away from their product. Once it's hard enough, they can ram any price down your throat, and you're helpless.

Application lock in is nothing compared to what Bill Gates plans for you on the Internet. With Passport, .net, applications, Internet Explorer and Windows all thoroughly entangled, you'll soon be sucked into the ultimate tar baby.

The solution, of course, is to use Open Source software on Windows. When you write books, use LyX, not MS Word. For an office suite, use Open Office, not MS Office. For finances, use GnuCash, not Quicken. Use Mozilla or Netscape, not Internet Explorer, to browse the net. Use a browser for email instead of the Outlook virus machine. Use ActiveState's implementation of free software Python or Perl instead of VB, and if you need a C/C++ compiler, take the trouble to install the GNU compiler and various .dll and header files to make it write native Windows code.

Understand that if, in the future, you want to migrate away from Windows, vendor lock-in will make it extremely difficult to do so. Unless you make plans to prevent Vendor lock-in, you are writing yourself a one way ticket to Windows land.

Make a List of All Applications You'll Need on the Windows Side

To resist the temptation of using Word or Wordpad "just this time", thus beginning your vendor lock-in, you'll need your Open Source apps already on the Windows machine. Here are some you might need: LyX, OpenOffice, GnuCash, Mozilla, ActiveState Python or Perl, and the GNU compiler, as discussed in the preceding section.

Test Your New Apps on the Windows Side

Just like you'd need to evaluate Linux and its apps before migrating to Linux, you need to evaluate the apps you'll be using on the Windows side.

Place All Your Data in One or a Very Few Trees

Presumably you've already done this, because it's good data processing practice to separate data from programs. But if you haven't yet done this, do it now, because your Windows machine has no use for your Linux programs, and you don't want any of your data to get lost and miss the transition.

Make an Iron Clad Backup

Naturally you hope your return to the Windows world will be a happy one, but leave yourself the option of easily going back if things don't work out. Make an iron clad backup of your data before the transition. By iron clad I mean two verified backups on robust media like CDR.

You see, the minute you switch to Windows, Windows will begin playing silly tricks with the case of your filenames (Myfile.JPG, etc), and the permissions and attributes of your files. Back converting will be a huge hassle. Also, you want to back up before changing text data to the Windows form of line termination.

Make Your Data Windows Compatible

Next, change all text files to DOS format, with line termination crlf instead of just lf. The April 2001 Troubleshooting Professional Magazine has scripts to go the other way (Windows to Linux), so simply change the scripts accordingly. It's important to do ALL this work on the Linux side, because all those handy little techniques piping find into grep into conversion scripts just don't work as well under Windows. Be sure this work does not change the files' modification dates; using the touch command in the script can preserve them for you. Once again, see the April 2001 Troubleshooting Professional Magazine.
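A hedged sketch of such a conversion script follows. It assumes GNU sed, and that your text files are identifiable by name (here *.txt); note the reference copy and touch -r, which keep the modification date unchanged:

```shell
#!/bin/sh
# Convert every *.txt under a directory from Linux (lf) to DOS (crlf)
# line termination, in place, preserving each file's modification time.
to_crlf() {   # usage: to_crlf <directory>
   find "$1" -type f -name '*.txt' | while read f; do
      cp -p "$f" "$f.stamp"             # keep a timestamp reference
      sed 's/$/\r/' "$f.stamp" > "$f"   # append a cr to every line
      touch -r "$f.stamp" "$f"          # restore the original mtime
      rm "$f.stamp"
   done
}

# Example: to_crlf /d
```

Extend the find pattern to cover all your text file types, and be careful not to run it over binary files.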

Move the Data

Move your data tree(s) via Samba, or by making a .zip archive (preserving file attributes and mod times) and then ftp'ing it to the Windows computer.
If you use zip, on the Windows side you'll probably need to unzip it with a proprietary shareware program such as PKZip or WinZip. In my opinion, the data should get its own drive letter, so the data can be on its own partition, or better yet its own drive. I recommend this in Windows for exactly the same reasons I recommend it in Linux.

Get All Your Apps Running

Whatever apps you're using, get them running. Make sure they can access your new data. Make sure you can surf the web and send and receive email.

If You Want to Go Back to Linux...

It's always possible that after you migrate to Windows, you might not like it. In such a case, you need a quick and easy escape route. Certainly the first step is to have an iron clad backup of the data from your Linux box before you converted linefeed line termination to crlf. That backup is something you can lay right back down on a Linux box.

Next, on the Windows box use PKZip or WinZip to archive only those data files that have changed since your transition. This gives you a second tree -- an incremental tree, you could say. FTP that .zip file over to the new Linux box, restore it in an alternative directory, and then devise a script to restore proper UNIX filename case and proper UNIX line termination, and then move the files to their original home.
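The line-termination half of that restore script might be sketched like this (assuming GNU sed's -i option; extend the find pattern to cover all your text file types):

```shell
#!/bin/sh
# Strip the cr that DOS appends to each line, returning files in the
# incremental tree to UNIX (lf-only) line termination.
from_crlf() {   # usage: from_crlf <directory>
   find "$1" -type f -name '*.txt' | while read f; do
      sed -i 's/\r$//' "$f"   # remove trailing cr from every line
   done
}

# Example: from_crlf /d/incoming
```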


Freedom is wonderful, but sometimes the ability to pursue our own choices can consume large amounts of time. It's conceivable that in order to save time, a person could give up his software freedom in order to use a Windows computer that restricts choices. Such a person should first really understand what he or she is giving up, and if it still sounds like a good solution, he or she should switch to Windows.

From there, the person needs to follow a procedure to make sure Windows can serve him well, and make sure there's no vendor lockin. Finally, don't burn your bridges.  Get an iron clad backup before beginning your transition, and have a plan to go back if things don't work out as well in Windows as you thought they would.

Steve Litt is the author of the course on the Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to Linux or Open Source. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

Any article submitted to Linux Productivity Magazine must be licensed with the Open Publication License. At your option you may elect the option to prohibit substantive modifications. However, in order to publish your article in Linux Productivity Magazine, you must decline the option to prohibit commercial use, because Linux Productivity Magazine is a commercial publication.

Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editor's notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

Copyright (c) 2001 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, Draft v1.0, 8 June 1999.

Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.

After that paragraph, write the title, text of the article, and a two sentence description of the author.

Why not Draft v1.0, 8 June 1999 OR LATER

The Open Publication License recommends using the words "or later" to describe the version of the license. That is unacceptable for Linux Productivity Magazine, because we do not know the provisions of a newer version, so it makes no sense to commit to it. We all hope later versions will be better, but there's always a chance that leadership will change. We cannot take the chance that the disclaimer of warranty will be dropped in a later version.


All trademarks are the property of their respective owners. Troubleshooters.Com (R) is a registered trademark of Steve Litt.

URLs Mentioned in this Issue