Copyright (C) 2004 by Steve Litt. All rights reserved.
Materials from guest authors copyrighted by them and licensed for perpetual
use to Troubleshooting Professional Magazine. All rights reserved to the
copyright holder, except for items specifically marked otherwise (certain
free software source code, GNU/GPL, etc.). All material herein provided "As-Is".
User assumes all risk and responsibility for any outcome.
Volume 8 Issue 1, Winter,
A stitch in time saves nine. -- Proverb
By Steve Litt
My first day as a technician at Pacific Stereo, my new boss read me the rules
of the road:
"Here's the most important thing Steve. Make sure you never, and I mean NEVER,
make a unit worse than when it came in. NEVER plug it in without first running
it up on a Variac. When measuring voltages, NEVER let your probe slip in a
way that can short out the unit in a serious way. NEVER make things worse!"
That was my introduction to Pacific Stereo. But more importantly, it was my
introduction to Step 2 of the Universal Troubleshooting Process: Make
a Damage Control Plan.
In my 3 year stint at Pacific Stereo, I saw high power receivers burst into
flame. On other technicians' benches. I saw turntables flung across the room
in utter frustration. But I was never the flinger. I heeded my first boss's
word, and made sure never to make things worse. Years later as a software
developer, I made sure never to jeopardize data unless it was backed up. Finally,
some 15 years after my first day at Pacific Stereo, I incorporated my first
boss's words into the Universal Troubleshooting Process as Step 2.
Better yet, about 8 years after that, a very astute customer made me aware
that damage control was more than not making things worse. It was also limiting
the damage caused by the defect to be troubleshot.
Today we'll discuss damage control as it relates to Troubleshooting. So kick
back, relax, and read this issue. And remember, if you're a Troubleshooter,
this is your magazine. Enjoy!
Don't Make Things Worse
By Steve Litt
Imagine the guy who decided to disable Chernobyl's automatic shutdown mechanisms
during a 1986 test. Might that have been a career limiting event? Who in their
right mind would disable a nuclear reactor's automatic shutdown mechanism
for any reason? There are always other ways to test things.
You and I are probably lucky enough not to work on equipment so dangerous
that a failure would be spoken of 17 years later, but screwups are still career
landmines. Everyone knows Troubleshooters can't instantly find the problem,
but they expect at least that you won't make it worse.
If you make it worse you have three unattractive choices:
- Admit it, and pay for the damage
- Admit it, and refuse to pay for the damage
- Fix the troubleshooter-caused damage on your own time, with your own
labor and money
None is attractive. That's why every troubleshooting adventure must begin
(after getting the right attitude) with formulation of a damage control plan.
At Chernobyl, your damage control plan would have included "never defeat safety
interlocks". Working on a car, it would include not placing your skin near
the carburetor, always having the oil pipe capped to prevent screws dropping
into the crankcase, and pinning up long hair before looking in an engine.
Working on a computerized system, make sure to back up any data even remotely
at risk.
A damage control plan is just that -- a plan. It's not an action. Taking specific
actions in every troubleshooting episode would be too time consuming for
most troubleshooting in non-safety-critical situations (consumer electronic
repair, data processing computer repair and the like). Instead, stop for a
moment and determine what risks your troubleshooting actions could trigger,
and how to limit those risks. It's as simple as determining a line over which
you won't step: "I won't edit the Windows Registry without backing up the
Here are some of the major risk categories to consider:
- Injury to people
- Damage to the machine or system
- Loss of valuable data
- Loss of product in the production line
- Legal liability
Making a damage control plan minimizes such risks.
Limit Damage from the Defect Itself
By Steve Litt
Summer of 1989. You could still get "45" records, and if you bought one, it
just might be "Good Thing" by Fine Young Cannibals. You could
get 45 records, but one thing you couldn't easily get back then was an antivirus
program. Viruses were relatively new, and few IT people had seen them. Certainly
none of the IT people at my large law firm client had ever seen a virus.
Until that nice summer day in 1989.
One programmer said he thought we had a virus on several computers. I proved
his hypothesis correct by demonstrating that executable programs were changing
themselves.
I told the MIS manager, and the effect was like a row of dominos. She panicked.
That was really scary, because she was one of the calmest, coolest, most rational
people I've met. So I panicked (and I knew better -- I was in the middle
of writing Troubleshooting: Tools, Tips and Techniques). Every programmer
panicked. One by one, we all fell down like a row of dominos.
Just then the CFO walked out of her office, saw our consternation, and asked
what was wrong. I told her, and her reply was swift:
"OK. First, unplug all the network junctions to keep this thing from spreading.
Then power down every machine in the place to keep it from spreading on the
machines. Then take an infected machine, study it, and find a way to fix this
thing. One of you go tell the central office about this problem to minimize
inconvenience to everyone else."
The effect was galvanic. Imagine a video tape of a falling row of dominos,
played in reverse. We each took a deep breath, said "yeah", got together,
made a plan, informed the central office, and got to work. Another programmer
and I analyzed several infected executables and managed to find a string occurring
in every one. We could now parse for infected files. I went ahead and created
a program that would scan an entire hard disk for files containing that string,
and report on them. My program would be run on a write protected bootable
floppy. A particularly skillful assembly language programmer analyzed the
virus, reverse engineered it, found it replicated itself by an undocumented
interrupt call (if memory serves me it was interrupt 35), and wrote a program
to act as a "birth control device" for the virus, so that no re-infection
would occur later on.
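The scanning idea described above can be sketched in a few lines of modern Python. This is only an illustration of the technique, not the program we wrote in 1989: the function name find_infected, the chunk size, and the placeholder signature are all assumptions made up for this example, and the real string we found is not reproduced here. The sketch opens files read-only, in the same spirit as running from a write protected floppy.

```python
import os


def find_infected(root, signature, chunk_size=64 * 1024):
    """Return a sorted list of files under root that contain signature.

    root, signature and chunk_size are hypothetical names for this sketch.
    Files are opened read-only, so the scan never modifies anything.
    """
    if not signature:
        raise ValueError("signature must be a non-empty bytes object")
    overlap = len(signature) - 1  # bytes to carry over between chunks
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    tail = b""
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        # Prepend the tail of the previous chunk so a
                        # signature split across two reads is still found.
                        window = tail + chunk
                        if signature in window:
                            hits.append(path)
                            break
                        tail = window[-overlap:] if overlap else b""
            except OSError:
                # Unreadable file: skip it rather than abort the scan.
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder signature, NOT the string from the actual virus.
    for infected in find_infected(".", b"EXAMPLE-SIGNATURE"):
        print(infected)
```

A scan like this only reports suspects. Deciding what to do with each reported file stays a separate, deliberate step, which is itself a piece of damage control.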
Our programs written, we hired about 30 people from a body shop, and disinfected
the hundreds of computers in the office. Three days after discovery of the
virus, we were up and running, and virus free.
We later found out the virus's name was Jerusalem B, a particularly evil virus
from the early days of DOS viruses. It had originated in Jerusalem, Israel,
and when the University of Jerusalem had been infected, it took them longer
than 3 days to fix it. Our crew of about 5 programmers had worked miracles.
But imagine for a second what would have happened without the quick and sure
advice of the CFO. Had our panic and indecision continued much longer, upper
management would have had meetings. Blame might have been placed. Alternative
actions contemplated. Time wasted, possibly while the network was left up
to continue the spread. The network might have been down for a couple weeks.
Fast Action Prevents Further Damage
One monkey don't stop no show. Neither does one computer, or one network,
or one machine. Or several machines. Nothing stops the show. The show must
go on.
After dispensing with personal safety issues, the next priority is business
continuation. How can we gracefully shut down? How can we find alternate ways
of working while this problem is being solved? How can we inform everyone,
and give everyone ways to work around the problem until it's fixed? These
questions must be answered quickly.
The State of Troubleshooting
By Steve Litt
The old Chinese curse states "may you live in interesting times". We are.
Outsourcing, Offshoring and Recession (OOR) have decimated the ranks of technologists.
Many engineers and computer programmers, especially older ones, are teaching
school, driving cabs and even flipping burgers. Americans take the layoffs,
and foreign workers get many of the newly created jobs. Where does this leave
the Troubleshooter?
That depends on how well the expert Troubleshooter markets that skill. I
predict that all too soon this love affair with cheap made-in-India software
systems will sour. Outsourcing, whether across the street or across the Pacific,
is difficult. The software systems are cheaper, but it remains to be seen
whether they'll be cheap enough to compensate for the hassle.
Meanwhile, small business employs over half the U.S. workforce, and is responsible
for most of the new job creation. Small business isn't big enough to offshore
a huge app. Small business will be serviced by generalist computer expert
freelancers. These freelancers must deliver more for less -- not just with
respect to software creation and network configuration, but also with respect
to troubleshooting.
Troubleshooting is on the short list of skills vital to keep you employed.
Outside of information technology, things might be even better. There are
more cars than ever, and cars break. Excellent car repair places, both large
and small, continue to thrive. Today's complexity is countered by specialization,
or by training.
If the recession continues, purchase of new items will falter, meaning repair
of old items will become more attractive. The Troubleshooter stands to gain.
These are tough times for all but the richest, with all too many middle class
people drowning in the flood of outsourcing, offshoring and recession. Especially
in this economy, Troubleshooting Process knowledge serves as a floatation
device.
And remember -- no recession lasts forever. When good times return, great
Troubleshooters will be scooped up fast.
Letters to the Editor
All letters become the property of the publisher (Steve Litt), and may
be edited for clarity or brevity. We especially welcome additions, clarifications,
corrections or flames from vendors whose products have been reviewed in this
magazine. We reserve the right to not publish letters we deem in bad taste
(bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure
the subject reads "Letter to the Editor". We regret that we cannot return
your letter, so please make a copy of it for future reference.
How to Submit an Article
We anticipate two to five articles per issue, with issues coming out monthly.
We look for articles that pertain to the Troubleshooting Process, or articles
on tools, equipment or systems with a Troubleshooting slant. This can be
done as an essay, with humor, with a case study, or some other literary device.
A Troubleshooting poem would be nice. Submissions may mention a specific product,
but must be useful without the purchase of that product. Content must greatly
overpower advertising. Submissions should be between 250 and 2000 words long.
Any article submitted to Troubleshooting Professional Magazine must be
licensed with the Open Publication License, which you can view at http://opencontent.org/openpub/.
At your option you may elect to prohibit substantive modifications.
However, in order to publish your article in Troubleshooting Professional
Magazine, you must decline the option to prohibit commercial use, because
Troubleshooting Professional Magazine is a commercial publication.
Obviously, you must be the copyright holder and must be legally able to
so license the article. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity
or brevity, within the scope of the Open Publication License. If you elect
to prohibit substantive modifications, we may elect to place editor's notes
outside of your material, or reject the submission, or send it back for modification.
Any published article will include a two sentence description of the author,
a hypertext link to his or her email, and a phone number if desired. Upon
request, we will include a hypertext link, at the end of the magazine issue,
to the author's website, providing that website meets the Troubleshooters.Com
criteria for links and that the author's
website first links to Troubleshooters.Com. Authors: please understand we
can't place hyperlinks inside articles. If we did, only the first article
would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject
line Article Submission. The first paragraph of your message should read
as follows (unless other arrangements are previously made in writing):
Copyright (c) 2001 by <your name>. This material
may be distributed only subject to the terms and conditions set forth in
the Open Publication License, version Draft v1.0, 8 June 1999 (Available
at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability
at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version
is presently available at http://www.opencontent.org/openpub/).
Open Publication License Option A [ is | is not] elected,
so this document [may | may not] be modified. Option B is not elected, so
this material may be published for commercial purposes.
After that paragraph, write the title, text of the article, and a two
sentence description of the author.
Why not Draft v1.0, 8 June 1999 OR LATER
The Open Publication License recommends using the phrase "or later" to describe
the version of the license. That is unacceptable for Troubleshooting Professional
Magazine because we do not know the provisions of that newer version, so
it makes no sense to commit to it. We all hope later versions will be better,
but there's always a chance that leadership will change. We cannot take the
chance that the disclaimer of warranty will be dropped in a later version.
All trademarks are the property of their respective owners. Troubleshooters.Com
(R) is a registered trademark of Steve Litt.