Troubleshooters.Com Presents
Troubleshooting Professional Magazine
Volume 7 Issue 3, Summer 2003
Speculation, Guesswork and Prayer
Copyright (C) 2001-2005 by Steve Litt. All rights
reserved. This material was started in 2001, and completed in 2005.
Materials from guest authors copyrighted by them and licensed for
perpetual
use to Troubleshooting Professional Magazine. All rights reserved to
the
copyright holder, except for items specifically marked otherwise
(certain
free software source code, GNU/GPL, etc.). All material herein provided
"As-Is".
User assumes all risk and responsibility for any outcome.
Arrogance rides
triumphantly through the gates, barely glancing at the old woman
about to cut the rope and spring shut the
trap. -- Mason Cooley
Editor's Desk
By Steve Litt
Riddle: The answer is
"speculation, guesswork and prayer". What's the
question?
The question is: What is troubleshooting without process?
Absent valid troubleshooting process, troubleshooting becomes
speculation
in the hands of experts, guesswork for those of average abilities, and
prayer
in the hands of neophytes.
Speculation, guesswork and prayer. Loss of profit, morale, and
reputation
(for the company, department, and employee).
This issue of Troubleshooting Professional discusses the importance
of
Troubleshooting Process, and the speculation, guesswork and prayer
resulting
from its absence. We'll discuss how lack of process creates
speculation,
guesswork and prayer, how to recognize the problem, the resulting
hardships
to your department or organization, and how you can eliminate the
problem
with training in troubleshooting process.
So kick back, relax, and read this magazine. And remember, if you're
a
Troubleshooter or Technologist, this is your magazine. Enjoy!
Yes, as a Matter of Fact
This Magazine is Late
By Steve Litt
The Summer 2003 issue of Troubleshooting Professional Magazine was
completed on October 6, 2005. It had been on the web before that,
presumably since the summer of 2003, but it had been incomplete.
Stranger still, this magazine was written primarily in 2001, but then
put aside for other content. When I rediscovered this content on
10/6/2005, I was very impressed with it, finished it, and put the
revisions on the web. I hope you enjoy it, and just remember this:
Speculation
By Steve Litt
Give a symptom description to 12 experts in a room. Ideally you'd like
to
see immediate, unanimous agreement on the root cause of the symptom.
But of
course that's impossible.
A very realistic expectation is to hear each expert say "I don't
know
but I'll find out". But that's all too rare.
What you usually find when giving a symptom description to 12
experts
is 12 different snap judgements. Speculation. This poses a problem
because at most one of those 12 snap judgements is correct. The
other 11 embark on an
expensive journey to a dead end. The result for your department or
organization
is slow solutions, failure to repair, or other problems.
Somehow society, and experts themselves, have developed an
expectation
that a true expert can hear a symptom description and instantly fathom
the
cause. To see how truly silly this is, imagine if court proceedings were
conducted
with this expectation.
Effective Troubleshooting is Like a Courtroom
Who done it? What caused it? Same thing.
In court you need to prove the cause beyond a reasonable doubt.
Until
that proof is concluded, any remediation is likely to do more harm than
good.
No matter how expert the detective, the lawyer, the expert witness, it
remains
the duty of the jury to weigh all evidence.
In the courtroom, failure to prove before remediating allows the
guilty
to go free, and the innocent to be imprisoned or executed. It results
in
loss of confidence by the public. To see examples, Google search the
combination
"death row" "Anthony Porter".
In the repair of machines, computers, networks and other systems,
failure
to prove before remediating facilitates repeated failure to repair,
further
damage to the system, and loss of confidence, in both the department
and
organization, by customers, management and co-workers.
Both Require a Process
To minimize likelihood of harmful actions, both courtroom activities
and
troubleshooting require a process.
In the case of the courtroom, the chain of evidence must be
preserved.
Both sides present witnesses, examine and cross examine the witnesses,
present
exhibits, have witnesses testify about exhibits, make and counter
objections.
The judge determines at each point whether valid courtroom process is
being
followed. The jury is informed they must decide based ONLY on evidence
that
has been properly and correctly presented in court. The jury reaches
its
best decision, based ONLY on the evidence.
In the case of troubleshooting, the technologist acquires the symptom
description,
reproduces the symptom, checks likely suspects, and performs tests.
Each test
is designed to rule out sections of the system. Eventually the area of
inquiry
narrows to the point where the root cause is obvious, at which time the
technologist
has reached his best decision based only on the evidence, and repairs
or
replaces the defective component.
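The narrowing described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the article: the system is modeled as a linear chain of stages, exactly one stage is faulty, and a hypothetical test good_through(i) reports whether the signal is still good at the output of stage i. The stage names are made up for the example.

```python
from typing import Callable, Sequence, Tuple

def isolate_fault(stages: Sequence[str],
                  good_through: Callable[[int], bool]) -> Tuple[str, int]:
    """Return the faulty stage and the number of tests performed.

    Assumes a single fault in a linear chain: the signal is good after
    every stage before the faulty one, and bad from the faulty stage on.
    good_through(i) is a hypothetical test at the output of stages[i].
    """
    lo, hi = 0, len(stages) - 1      # the fault lies somewhere in stages[lo..hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if good_through(mid):
            lo = mid + 1             # everything up to mid is ruled out
        else:
            hi = mid                 # the fault is at mid or earlier
    return stages[lo], tests

# Hypothetical chain of stages, with the fault planted in "database".
STAGES = ["power supply", "motherboard", "disk controller", "disk",
          "filesystem", "database", "application", "display"]
FAULTY_INDEX = STAGES.index("database")

culprit, tests_used = isolate_fault(STAGES, lambda i: i < FAULTY_INDEX)
print(culprit, tests_used)           # database 3
```

Each test rules out a section of the system on evidence alone, so eight suspects are narrowed to one in three tests. Real systems are rarely a clean linear chain, so treat this as a picture of the idea rather than a procedure.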
But What About Intuition?
We all know troubleshooters use intuition, and we all believe this is a
good
thing. The question is, at which point is intuition brought to
bear?
Intuition is a wonderful tool in deciding what tests to perform. It's a
major
component in guiding the course of the diagnosis. But when the
technologist
assumes a cause based on intuition, and acts on that assumption, he's
crossed
the line.
The process leading to the solution should be more deductive than
intuitive.
Unless the technologist performs verification tests before performing
an intuition-based
fix, he risks wasting an extraordinary amount of time and money
implementing
a wrong fix. An untested fix is often the wrong fix,
resulting
in serious problems.
Not only is intuition harmful at the root cause level, but excessive
use
can also be harmful at the diagnostic level. If the technologist
repeatedly
speculates about specific components, he loses the benefit of ruling out
large
chunks of the system.
The decision of which troubleshooting tests to perform next is based
on
a quadruple tradeoff:
- Ease of performing the test
- Likelihood that the test will isolate the root cause into a small
area
- Even divisions in ruling out the remainder of the system
- Safety concerns
Intuition and speculation are proper to the extent that they are used
to
estimate the preceding four factors and draw the best educated guess.
Beyond
that, intuition and speculation have no part in troubleshooting.
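One way to picture the quadruple tradeoff is as a rough score for each candidate test. The sketch below is an assumption-laden illustration, not a method from this article: the 0-to-1 ratings, the equal weights, and the candidate tests are all hypothetical, and the ratings themselves are exactly the kind of educated guess that intuition is good for.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class CandidateTest:
    name: str
    ease: float        # 1.0 = trivial to perform, 0.0 = very hard
    likelihood: float  # estimated chance of isolating the cause into a small area
    evenness: float    # 1.0 = splits the remaining system into equal halves
    safety: float      # 1.0 = no risk to people or equipment

def score(test: CandidateTest,
          weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the four factors (weights are illustrative assumptions)."""
    w_ease, w_like, w_even, w_safe = weights
    return (w_ease * test.ease + w_like * test.likelihood
            + w_even * test.evenness + w_safe * test.safety)

def pick_next_test(candidates: Sequence[CandidateTest]) -> CandidateTest:
    """Choose the candidate with the highest estimated score."""
    return max(candidates, key=score)

# Hypothetical candidates with estimated (guessed) ratings.
CANDIDATES = [
    CandidateTest("reseat the cable", ease=0.9, likelihood=0.3, evenness=0.2, safety=1.0),
    CandidateTest("swap the power supply", ease=0.3, likelihood=0.4, evenness=0.3, safety=0.6),
    CandidateTest("ping the gateway", ease=0.9, likelihood=0.5, evenness=0.8, safety=1.0),
]

print(pick_next_test(CANDIDATES).name)   # ping the gateway
```

The guesswork is confined to the ratings. The choice of test follows from them, and the test result, not the guess, decides what gets ruled out.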
Talent is No Excuse for Speculation
The argument might be made that the speculation of an expert is more
valid
than that of a non-expert. Such an argument might state that whereas
the juror
is just a "peer" of the defendant, the network engineer is an expert
having
spent a career accumulating network knowledge. Certainly the network
engineer's
speculation can be respected.
But in fact, the network engineer's speculation is no more
respectable
than a juror's would be. The most talented and knowledgeable network
engineer
lacks X-ray vision and divining rod fingers. He cannot listen to a
symptom
description, gaze upon the system, and divine the single defective
component
out of a cast of hundreds or thousands.
Yet many talented people try for the speculative quick fix. They use
a
diagnostic machine or software program, speculate that the diagnostic's
recommendations
are true, forsaking all other areas of the machine. Or they hear a
symptom
description identical to one they've heard before, and speculate that
the
root cause is the same. Or they analyse the system, and speculate on
which
part would produce an identical symptom.
Once again, there's nothing wrong with using speculation as a tool
to
decide among several likely diagnostic tests, but when one assumes a
root
cause, the result is replacement of the wrong component, circular
troubleshooting,
argumentative troubleshooting, finger pointing, profit loss, and harm
to the
reputation of all involved.
Speculation is a Product of Ego
Certainly a technologist considering himself a beginner or "average"
would
have no problem "admitting" the need of diagnostic tests before drawing
a
conclusion. It seems like those considering themselves "experts" are
the ones
expecting themselves to provide an instant answer. Yet a little
reflection tells us that no sane person
would expect any human to conjure a root cause out of a system with
thousands
of components. The sane person knows that only the process of
elimination
leads to a quick and correct identification of the root cause.
So why do those considering themselves experts fall into the
speculation
trap? They fail to recognize that troubleshooting consists of two
distinct
strengths:
- Knowledge of the system under scrutiny
- Knowledge of troubleshooting process
The reason they don't know is that they've never been trained in the second.
Very
few people have. As long as we expect our experts to get along without
training
in valid troubleshooting process, we'll continue to experience
speculation
and the problems it brings.
Some Experts Use Valid Troubleshooting Process
Nothing in this article is meant to imply that all experts are devoid
of
troubleshooting process knowledge. Many experts have learned a valid
troubleshooting
process either through courses or through the school of hard knocks.
Such expert troubleshooters usually do not speculate nor exhibit
arrogance.
Problems Caused by Speculative Troubleshooting
The problems caused by speculative troubleshooting can be grouped into
three
categories:
- Individual problems
- Team problems
- Corporate problems
Individual Problems
The individual technologist troubleshooting speculatively experiences
productivity
loss caused by the circular troubleshooting forced upon him by his
speculation.
This results in delayed solutions at best, and incorrect solutions if
speculations
are not verified with conclusive tests. Such productivity loss cannot
help
but torpedo the technologist's morale, which could result in burnout,
stress,
or blame shifting. Burnout, stress and blame shifting invariably steal
precious
energy from the technologist's diagnostic efforts, creating a vicious
circle.
The final result could be disability leave, dismissal, or gravely
impaired
performance.
Team Problems
No troubleshooter works alone. There is usually a customer, internal or
external,
who expects the problem fixed quickly and cleanly. In the case of a
programmer
debugging his own code, there will be users relying on that code, and
management
who have gone out on a limb promising clean code delivered on time.
Often
the troubleshooter is part of a team consisting of members from one
or more
departments.
Speculative troubleshooting ruins teamwork. It facilitates argument
and
finger pointing between employees, or with customers or vendors. The
dispute
can take the form of "who owns the problem", as in the classic "it's
hardware/no
it's software" gambit, or it can take a more personal form as in "Joe
gave
me bad information (his erroneous speculation), and cost me many hours
of
time scheduled for another project."
Combined with a lack of team awareness (good Troubleshooting Process
courses
teach team troubleshooting), speculation results in alternating
technologist
visits. The hardware guy comes out, pronounces the problem software,
and leaves.
The software guy comes out, declares it to be hardware, and leaves.
This
goes on three or four times, consuming a week or more, until
heavy-hitter
managers demand that both the hardware and software people show up at
the
same time and stay there until the problem is solved.
All of this creates morale problems for the team. Team morale
problems
tend to degenerate very fast and very hard, with grave results.
Corporate Problems
Morale problems also crop up in the corporation. Speculative
troubleshooters
leave users and customers high and dry. Name calling contests often
ensue.
All of this reduces the organization's productivity.
The decreased individual and team performance results in increased
technologist
salary expense, as more technologists are required to do the work.
The finger pointing between technologists, departments, and
companies,
caused by speculative troubleshooting can make the organization look
like
a Keystone Kops film.
Opportunities are lost as lingering unsolved problems erode client
confidence.
Those lingering problems also create risk of customer loss, litigation,
morale
problems, and harm to the organization's reputation.
All these "screwups" result in remediation costs. Drawn out meetings
and
phone calls are required to soothe nerves (usually followed by yet
another
mistake). Profit-sapping customer discounts are often used to say "I'm
sorry".
In the case of catastrophic mistakes, remedial advertising might be
required.
Recognizing Troubleshooting by Speculation
Sometimes the hardest part of dealing with a problem is recognizing it
in
the first place. How do you recognize speculative troubleshooting?
To
best recognize speculative troubleshooting, be on the lookout for the
following
symptoms:
- Arrogance based diagnosis
- Agenda based diagnosis
- Rote repetitive diagnosis
- Non-cooperation
- Slow or unsatisfactory solutions
- Dissatisfaction with the support department
Arrogance Based Diagnosis
Arrogance based diagnosis leaves clues. When you see these clues,
investigate.
"I'm the expert, I know how to fix it!"
Statements like the preceding should raise a red flag. Although it's
true
that every troubleshooter must occasionally trot out this phrase to
calm
an overly agitated user, if someone uses this phrase regularly it's
probably
more out of arrogance than an attempt to calm. All too often statements
like
the preceding are followed by misdiagnosis.
If the person making the statement has anything but a stellar record
of
fast, accurate solutions, intervention may be called for.
"The diagnostic software says..."
Diagnostic tools and software are essential for diagnostic
productivity.
Unfortunately, the more complex and featureful they are, the more they
can
be misused. Many diagnostic tools attempt to troubleshoot down to the
defective
component on the basis of a symptom description, possibly plus an
inadequate
array of pre-defined diagnostic tests. Many so-called experts are more
expert
in the operation of their diagnostic tools than on the system under
repair
or the process of troubleshooting. All too often, experts place
complete
belief in the pronouncements of such diagnostic tools, leading to
replacement
of incorrect components and failure to repair.
When a technologist says "The diagnostic software says...", if the
next
sentence is "So let's start looking at that subsystem", everything's OK.
But
if the next sentence is "So let's replace the...", expect trouble. It's
the
height of arrogance to believe oneself so expert at diagnostic
equipment
that further investigation is unnecessary.
Again, if the person making the statement has anything but a stellar
record
of fast, accurate solutions, intervention may be called for.
Foot dragging
Maddeningly, some technologists consider themselves so important that
they
simply fail to respond. Sometimes it's the result of overzealous
scheduling
of their resources by management. But in many cases, it's arrogance,
plain
and simple.
Although foot dragging is not caused by speculation, often foot
dragging
and speculation have the same cause -- arrogance.
Agenda Based Diagnosis
Sometimes speculation takes the form of conflict of interest. Maybe the
technologist
doesn't want to be bothered right now, so he manufactures a believable
explanation
why the root cause is in a subsystem whose responsible party is a
different
person or organization. The old "It's a hardware problem" gambit.
Often the diagnosis is made in order to shift blame. Perhaps the
technologist
wired the building with inferior network cabling, and now is trying to
blame
the poor performance on "bad routers" and "packet collisions" rather
than
admit he made a bad mistake.
Sometimes there's a more sinister intent. Two departments get in a
spitting
contest, and technologists from each department fabricate root cause
speculations
designed to place the cause in the enemy department.
All agenda based diagnosis shares one common trait: Root causes
don't
move according to the whims of people. Eventually the scoundrels are
found
out, and people are angered. If the duped party is a customer, he might
take
his business elsewhere. If the duped party is inside the organization,
the
entire organization has been hurt by the slow diagnosis -- a fact not
unnoticed
by upper management.
Good troubleshooting process training educates technologists to the
fact
that sooner or later they will lose, and lose big, if they diagnose
according
to agenda.
Rote Repetitive Diagnosis
Some experts are more memory experts than system experts or
troubleshooting
process experts. An outstanding memory is an asset when used wisely and
a
liability when used as a crutch. Once again, if memory serves to
identify
the diagnostic test most likely to isolate the root cause, that's a
good
thing. But if memory is used to speculate the root cause, and that
speculation
is used as the basis of a repair, many times it's a costly mistake.
We see it all the time. Last time the Windows crash was caused by a
problem
with the database server, and you just got a blue screen, so
we'll
reload the database server now. Unfortunately, this time it's a rogue .dll
file placed on the application server by a restore, so
reloading the
database server just dumps all work past the last backup and costs many
hours.
There are some who believe expertise is achieved when one has seen
many
problems and remembers the solutions. Such people don't understand the
role
of troubleshooting process, and doom themselves to a high percentage of
bogus
repairs. Such people need training in the troubleshooting process.
Non-cooperation
Non-cooperation is really an effect of diagnosis by arrogance or
diagnosis
by agenda. If you see much non-cooperation during troubleshooting,
investigate
further.
Slow or Unsatisfactory Solutions or Dissatisfaction with the Support Department
The ultimate result of speculative troubleshooting is slow or
unsatisfactory
solutions. If you see these symptoms, investigate further.
Eliminating Troubleshooting by Speculation
Obviously, to eliminate speculative troubleshooting, you need to fix
the
cause. As mentioned previously, possible causes include:
- Arrogance
- Rote repetitive diagnosis
- Agenda
Arrogance
One could define arrogance as unjustified certainty. The arrogant
technologist
is certain his speculation is correct and there's no need for further
research
or testing. In the absence of some sort of extrasensory powers, how
could
he draw that conclusion?
It could be abject stupidity, but if he's smart enough to be
considered
an expert, that's unlikely. What's more likely is that he is not
familiar
with the role of process in troubleshooting. This is likely
because only a tiny percentage of technologists have been trained
in troubleshooting process, so the troubleshooting productivity of this
"expert"
might be on a par with his contemporaries. Quite often, lack of
troubleshooting
process training is the root cause of arrogance, which is a cause of
speculative
troubleshooting. Therefore, troubleshooting process training can
correct the
situation.
Rote repetitive diagnosis
What causes rote repetitive diagnosis? Certainly its practitioners
know
it fails frequently, so why do they use it overwhelmingly? Could it be
they
know of no alternative? Once again, given the lack of training in the
troubleshooting
process, it's quite likely.
So in cases of speculation caused by arrogance or rote repetitive
diagnosis,
training in a valid and correctly optimized troubleshooting process
will
likely cure the speculation, and the host of problems it creates.
Agenda
Diagnosis by agenda is not so simple. Here there's pressure on the
technologist
to toss the problem to the other department or company like a hot
potato.
Here the speculation is merely a symptom, with the root cause being a
defective
business model pitting departments against each other, or occasionally
the
company against its customers. Does that mean training in
troubleshooting
process cannot help?
Yes and no. The process training won't eliminate the root cause of
agenda
based diagnosis, but it can expose the root cause. Once one department
understands
the process of diagnosis, they are prepared to find all root causes
within
their area of responsibility, and they're prepared to document cases
where
the other department hot-potatoes them. Then, armed with such evidence,
that department can report their findings to management. It's possible
that management will understand and fix the underlying business
problem. If nothing else, the troubleshooting process training makes
the day to day
work environment more livable.
Summary
If speculative troubleshooting creates problems in your organization,
there's a high likelihood that problem can be cured or at least
ameliorated
by training your technologists in diagnostic process.
Guesswork
By Steve Litt
What's the distinction between speculation and guesswork? Speculation
involves
certainty, while guesswork does not.
Those defined as "experts" are often self-defined. They're certain
of their
expertise. Those without such certainty might be relegated to the
"average"
designation.
Wouldn't a self-perceived "average" troubleshooter forgo speculation
(or
its less certain cousin called guesswork) in favor of a valid
troubleshooting
process? He would if only he knew a valid troubleshooting process.
Once again, only a tiny segment of the technology population
receives
training in the process of diagnosis. The rest have to learn it in "the
school
of hard knocks". Some learn it better than others. It's not an
either/or
proposition, but instead a spectrum from the process clueless to those
totally
at home with and aware of process, cause and effect, the process of
elimination,
and prioritization of diagnostic testing.
We all guess from time to time when deciding what diagnostic test to
take
next. But all too often guessing transcends the role of tactical guide,
and
becomes a strategy in and of itself. That's the time to give the
employee
an alternative -- training in a valid troubleshooting process.
Those with less than total comfort and awareness of troubleshooting
process
often resort to guesswork. Sometimes the guesswork takes the form of
analyzing
the machine and guessing which root causes would cause the symptom.
Sometimes
the guesswork is along the lines of probability. Guesswork can assume
the
role of total belief of "expert systems", or of diagnosis by serial
replacement,
or excessive reliance on escalation. The process-deficient technologist
often
guesses that the root cause is outside his job description, or that he
has
insufficient skills to fix the problem (that's almost always a false
assumption).
And sometimes guesswork is just random guessing. Whatever the
appearance of
the guesswork, it's always a productivity thief.
If these symptoms appear among your technologists, it's necessary to
recognize
that diagnosis by guesswork is occurring, and it's time for your
technologists
to receive training in a valid troubleshooting process.
Prayer
By Steve Litt
While the experts are busy with speculation, and average technologists
slowly
guess their way to a solution, the beginning technologist must pray for
a miracle.
Like the others, he's received no troubleshooting process training, but
in
addition, he hasn't had sufficient time in "the school of hard knocks"
to
learn even a little process. Making things worse, the beginning
technologist
is often undertrained in the system under repair.
Barring a miracle or massive intervention by a more experienced
technologist,
the beginning technologist stands little chance of solving a complex
technology
problem. And given today's "lean and mean" business environment,
there's
little hope of much help from senior technologists. So the beginning
technologist
muddles through his first year or two trying not to alienate coworkers,
bosses
and customers. He tries not to cause damage to systems he
troubleshoots.
He fights job disappointment.
In the kinder and gentler days of decades gone by, the organization
had
resources to help the new technologist through these difficult times.
Now
many organizations expect the new technologist to sink or swim. The
associated
turnover is a problem, but they just don't have enough spare capacity
when
it comes to more experienced technologists.
It doesn't have to be this way
There's an alternative to the sink or swim philosophy. New
technologists
may be several months deficient in their systems training, but a two
day
course can make them fully competent in troubleshooting process.
As any master technician can tell you, an excellent grasp of
troubleshooting
process compensates quite nicely for a partial lack of systems
expertise.
In fact, the cause and effect deduced during the troubleshooting
process
is one way we learn technological systems.
Ultimately, the organization that brings new technologists up to
speed
fast experiences productivity gains, reduced staffing, retention and
employee
acquisition costs, and a better team environment. Two days of
troubleshooting
process training is a small price to pay for these advantages.
Salvation
By Steve Litt
The bad news is that speculation, guesswork and prayer carry huge costs
for
the organization, the department, and the individual employee. The good
news
is that because they're caused by inadequate troubleshooting process
knowledge,
they're cured by a simple training course. Of course, you need to
choose the
right troubleshooting process, the right course, and the right
trainer(s).
The following are the properties of a well chosen troubleshooting
process
and training:
- Valid Troubleshooting Process
- Properly Optimized Troubleshooting Process
- Good for expert, average and beginning
- Yields significant productivity gains
This article also discusses the finer points of evaluating a
troubleshooting
process and implementing a troubleshooting process training program.
Valid Troubleshooting Process
A valid troubleshooting process must, at the very least, accomplish the
following:
- Recognize and exploit cause and effect
- Recognize and exploit the process of elimination
- Minimize jargon and fluff
- Accommodate the Work Habits of Human Beings
Recognize and Exploit Cause and Effect
The opposite of cause and effect is superstition.
Nobody
wants diagnosis by tarot card.
But it's not enough not to be superstitious. Cause and effect must
be
recognized and exploited. Many so-called "troubleshooting courses" are
nothing
but yet another journey into the technical details of the system to be
serviced,
with a few diagnostic tools thrown in. No cause and effect there.
Recognize and Exploit the Process of Elimination
If you count up all the electronic parts, jumper settings, BIOS
settings,
operating system configuration parameters, and application
configuration parameters
in a modern computer system, you'll find it has 50,000 or so components
that
can act as root causes. Combine 10, 100 or 1000 such computers into a
network,
and the complexity is mind boggling. Nothing short of efficient
use
of the process of elimination will foster quick solutions of such
systems.
And yet there are all sorts of "Windows Troubleshooting" and
"Computer
Troubleshooting" courses that act as little more than symptom/solution
listings,
with a few tests and diagnostic products thrown in.
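To put rough numbers on the process of elimination discussed above, here is a back-of-the-envelope calculation. It rests on idealized assumptions that are not claims from this article: a single fault that is equally likely to be anywhere, and tests that each rule out exactly half of the remaining suspects. Real tests seldom divide that evenly, so read the figures as orders of magnitude only.

```python
def halving_tests(components: int) -> int:
    """Tests needed if every test rules out half of the remaining suspects.

    Equals ceil(log2(components)), computed with integers to avoid
    floating point rounding.
    """
    if components < 1:
        raise ValueError("need at least one component")
    return (components - 1).bit_length()

def serial_checks_average(components: int) -> float:
    """Average checks when examining components one at a time,
    assuming one fault that is equally likely to be anywhere."""
    return (components + 1) / 2

ONE_COMPUTER = 50_000                     # the article's estimate
NETWORK_OF_1000 = ONE_COMPUTER * 1000

print(halving_tests(ONE_COMPUTER))          # 16
print(halving_tests(NETWORK_OF_1000))       # 26
print(serial_checks_average(ONE_COMPUTER))  # 25000.5
```

Under these assumptions, multiplying the number of potential root causes by a thousand adds only about ten tests when each test halves the search area, while checking components one at a time scales with the full size of the system.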
Accommodate the Work Habits of Human Beings
Human beings have certain attributes. To the extent that a
troubleshooting
process works with those attributes, it will be successful. To the
extent
that a troubleshooting process fights those attributes, it will fail.
First and foremost, humans can concentrate on about 7 facts at a
time.
Some more, some less, but 7 is a good number. Right off the bat, that
means
that the process better not ask the person to try to simulate the
machine
in an effort to "figure out" what's wrong. Nobody can contemplate
hundreds
or thousands of components at once. That's why mental simulation
troubleshooting
doesn't work. That's why processes based on binary search through
diagnostic
tests work marvelously.
Humans must trust before they can invest. A simple process whose
value
is obvious will excite a person to learn, and to use after learning. On
the
other hand, nobody will go to the effort to learn a process requiring
all sorts of
detailed actions on his part. Initially the trainee doesn't believe it
will
work, and as such doesn't invest much learning energy.
That brings up the subject of the program of the month.
Anyone
in the workforce more than a couple years has encountered at least one.
When an
employee perceives something as a program of the month, he goes through
the
motions but his mind is elsewhere. His attitude: Been there, done that!
The troubleshooting process MUST NOT be perceived as a program of
the
month. What are some attributes of programs of the month?
- Upper management never bought in
- Insults the intelligence of employees
- Treasure hunts
- Reciting "we're entrepreneurial" while on salary during a salary
freeze
- Warm fuzzy cartoons with animals, carrots and sticks, extolling
the
virtues of teamwork
- Implying that the employees' best interests always parallel the
employer's
best interests
- Encrusted with jargon
So your troubleshooting process of choice should state its case in
plain
English, it should take care not to insult the intelligence of those
being trained,
and like any other company endeavor, it must have the support of
management
at the highest level of its application and influence.
Properly Optimized Troubleshooting Process
Problem solving process isn't a "one size fits all" proposition. There
are
generic problem solving methodologies optimized to solve problems in
fuzzily
defined systems (business, political and relationship), and there are
troubleshooting
processes optimized to solve problems in well defined systems
(computers,
networks, software and machines of all sorts). There are methodologies
specialized
for safety critical situations, and for events and extremely sparse
intermittents.
There are methodologies optimized to find bottlenecks (where the
symptom description
contains "too" or "insufficient"). Selecting a process not optimized
for
your situation will cut productivity by an order of magnitude.
The Universal Troubleshooting Process is an example of a
troubleshooting
process optimized for well defined systems, and works quite well under
a wide
variety of such systems. Its ideal use is on reproducible or frequently
recurring
intermittent problems in machines and computer systems (including
software)
in low, moderate and sometimes highly safety critical situations. In
highly safety
critical situations (nuclear power plants and the like) it's best
supplemented
with a more safety optimized process such as Root Cause Analysis.
Because it incorporates Bottleneck Analysis, the Universal
Troubleshooting
Process quickly solves problems of degree in well defined systems.
Inappropriate Uses of the Universal Troubleshooting Process
The Universal Troubleshooting Process is inappropriate for solving
problems
in fuzzily defined systems such as businesses, relationships, and
factory
floors (the work and parts flow). For such problems you'd use a generic
problem
solving method like Kepner Tregoe, or as appropriate a bottleneck
analysis
optimized process such as the Theory of Constraints. The Theory of
Constraints
has become quite famous on the factory floor.
The Universal Troubleshooting Process is insufficient in
environments
where a large number of events or sparse intermittents occur. Such
environments
require use of an event-optimized process such as Root Cause Analysis.
As
mentioned previously, in extreme safety critical environments such as
nuclear
power plants the Universal Troubleshooting Process should be
supplemented
with something like Root Cause Analysis.
And likewise...
There will come a time (for many of you it's already come many times)
when
someone will try to sell you on the idea of generic problem solving
training
for your technologists, technicians and maintenance people in order to
improve
their productivity fixing computerized systems and machines. Such use
of generic
problem solving process is inappropriate, time consuming, error prone, and costly because generic
problem
solving processes are not optimized for well defined problems, and also
because
they generally waste time with questions irrelevant to fixing technical problems, such as how to transition from the current state to the desired state: In technical troubleshooting you just repair or replace the bad component. Using a generic problem solving method to solve technological problems can double, triple, or perhaps increase tenfold the time, effort and cost of repair. And your employees will rebel against such use of generic problem solving methods.
Good for expert, average and beginning
Many experts speculate, many average technologists guess, and many
beginners
pray. The solution must be a troubleshooting process useful to expert,
average
and beginner alike. It must work, it must be comprehensible, and it
must be believable. It must not wrap itself in buzzword fluff or fuzzy
concepts.
Yields significant productivity gains
Speculation, guesswork and prayer waste time and money. The whole
purpose
of their elimination is increased productivity. The chosen
troubleshooting
process must truly increase productivity. Fortunately, this is easy,
because
usually the missing ingredient to productivity is troubleshooting
process.
Evaluating a Troubleshooting Process
How do you evaluate a troubleshooting process? How do you estimate its
productivity
potential? Here's a list of ideas:
- Optimized for speed and accuracy
- No extraneous time consuming frills
- Heavy coverage of the process
- Minimal jargon
- Respect for Troubleshooter's time and intelligence
- Immediately usable on the job
Optimized for speed and accuracy
In a high school electronics class years ago, I was taught to scope the
input,
then the first stage, then the second, stage by stage until the output.
The
first place I found a problem indicated the stage containing the root
cause.
That's called serial search, a genuine troubleshooting process. It
might have even been sufficient for the five tube radios of the era.
Today's systems often have six
figures
worth of components. Serial search would take weeks or months per
system.
Serial search isn't optimized for complex systems.
An optimal process for today's systems would use binary search. The
drawing
to the right shows how binary search works. Each diagnostic test rules
out
half of the remaining root cause scope. Obviously, real world
diagnostic tests
can't exactly split the remaining root cause scope, but it's something
we
must shoot for.
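As a rough illustration of the difference between the two searches, here is a minimal Python sketch. It assumes an idealized system: a linear chain of stages with a testpoint at every stage, where the signal is good before the faulty stage and bad at and after it. The names serial_search, binary_search, probe and FAULT are hypothetical, invented for this example.

```python
def serial_search(n_stages, is_good_at):
    """Test each stage's output in order; return (faulty_stage, tests_used)."""
    tests = 0
    for stage in range(n_stages):
        tests += 1
        if not is_good_at(stage):
            return stage, tests
    return None, tests


def binary_search(n_stages, is_good_at):
    """Halve the remaining suspect range with each test.

    Assumes exactly one fault: outputs are good before it, bad from it onward.
    Returns (faulty_stage, tests_used).
    """
    lo, hi = 0, n_stages - 1      # the faulty stage lies within lo..hi inclusive
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_good_at(mid):
            lo = mid + 1          # fault is downstream of mid
        else:
            hi = mid              # fault is at mid or upstream of it
    return lo, tests


# Hypothetical system: 1,000 stages, fault hidden at stage 700.
FAULT = 700


def probe(stage):
    """Simulated measurement: True while the signal is still good."""
    return stage < FAULT


serial_result = serial_search(1000, probe)
binary_result = binary_search(1000, probe)
print("serial:", serial_result)
print("binary:", binary_result)
```

With the fault at stage 700 of 1,000, the serial search uses 701 tests and the binary search uses 10. Real systems rarely divide this cleanly, so treat the numbers as the ideal to shoot for.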
Imagine a system with 16 million components. If each diagnostic test
took
only one minute, serial search requires 16 million minutes -- that's 30
years.
Now let's say you use pure binary search. You would need only 24
diagnostic
tests to find the root cause. Even if each diagnostic took an hour
instead
of a minute, that's three 8 hour workdays.
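The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the variable names are made up for the example, and it assumes the serial search has to test every component.

```python
import math

components = 16_000_000

# Serial search: one test per component, one minute per test.
serial_minutes = components
serial_years = serial_minutes / (60 * 24 * 365)

# Pure binary search: each test halves the remaining root cause scope.
binary_tests = math.ceil(math.log2(components))

# Even at one hour per test, expressed in 8 hour workdays.
binary_workdays = binary_tests * 1 / 8

print(f"serial: {serial_years:.1f} years")
print(f"binary: {binary_tests} tests, {binary_workdays:.0f} workdays")
```

The 24 comes from the fact that 2 to the 24th power is 16,777,216, the first power of two at or above 16 million.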
[Figure: Binary Search]
One factor that hugely speeds diagnosis by even division is the
availability
of numerous cheap and quick testpoints. With such testpoints, a
division
is only a measurement away. Without them, much mental simulation is
required.
The Universal Troubleshooting Process is optimized to take
advantage of the numerous cheap and easy testpoints on most technology.
On
the other hand, the methods described in Kepner and Tregoe's "New
Rational
Manager" do not optimize to this advantage, resulting in a slower
methodology
on most technology.
No extraneous time consuming frills
For technological troubleshooting, you want a troubleshooting process
optimized
to technology, with no time consuming frills. Specifically, methods to
solve
fuzzily defined problems (business and interpersonal) are unneeded
extras
in technology. If a technologist needs to solve business and
interpersonal
problems as well as technology problems, he or she should be trained in
the
best of breed for each, not a methodology that happens to include both.
Minimal jargon
Jargon is distracting and aggravating. Employees are smart, so they
know
when a course author has used jargon words to obfuscate a beautifully
simple
concept. They typically attribute the motivation for such obfuscation
to
an attempt to make the material more complex than it need be, in order
to inflate
the material's price. Technological troubleshooting is simple enough to
be described
in plain English. When creating the Universal Troubleshooting Process
courseware,
I used plain English for every concept except the term Mental Model, which is
explainable in 2 minutes.
Respect for Troubleshooter's time and intelligence
All too many employees are subjected to various programs
supposed to "get them on board", or "improve their potential".
Unfortunately,
most are perceived as "bullfeathers". To be credible, troubleshooting
process
training must stick to the facts, and avoid any hint of "program of the
month"
type material.
Immediately usable on the job
Timing is everything. Unless the employee can immediately use the
training
on the job, the material is forgotten within a few days. Employees
should
be encouraged to use the information immediately upon return to work,
and
that encouragement should be based on the fact that the material is
useful
to their everyday jobs.
Implementing a TP training program
Decide on the goals
What is the basic problem? Is it failure of groups of employees to
solve
problems in a timely and accurate way? If so, are the problems they're
solving
business problems, factory floor problems, or technological problems
(machine
and system repair)?
If your employees need help solving business problems, consider a
course
employing the methods described in "The New Rational Manager" by Kepner
and
Tregoe. That methodology is great for analyzing and deciding upon
solutions
for fuzzily defined systems such as businesses, departments, and the
like.
However, because it doesn't take advantage of numerous cheap and safe
testpoints
afforded by machines and systems, it's slow and cumbersome for solving
technological
problems.
If your employees need help solving technological problems, be it
fixing
machines on the factory floor, local or wide area networks, computer
hardware,
computer software, or electronics, consider the Universal
Troubleshooting
Process course. Its optimization for abundant testpoints makes it a
lightning
quick way to diagnose and fix tech problems.
But resist the temptation to use the Universal Troubleshooting Process
for
business and interpersonal problems -- lack of testpoints in those types
of systems makes the Universal Troubleshooting Process insufficient for
business and interpersonal problems.
When dealing with safety critical problems, you need to solve the
business
problem behind the technological problem. For instance, in a nuclear
reactor,
if lack of a written maintenance policy for air conditioners causes an
air
conditioner to run out of freon, which causes 130 degree temperatures
in
the parts storage room, which causes weak solder joints on stored
circuit
boards, which causes those boards to fail early, which causes a rod
mechanism
failure, which causes the reactor to trip, root cause must be traced
all
the way back to the lack of written maintenance policy. In such a case,
Root
Cause Analysis as described by Max Ammerman's book would be what is
needed.
In many cases your people need instruction on more than one of these
methodologies.
In that case, I suggest you give them courses in all necessary
methodologies.
Resist the temptation to make one of the methodologies fit all the types
of problems.
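The selection advice in this section can be summarized as a small lookup. The Python sketch below is a deliberate simplification of the text, not a complete decision procedure; the function name suggest_methodologies and its parameters are hypothetical, invented for this example.

```python
def suggest_methodologies(well_defined, bottleneck=False,
                          highly_safety_critical=False, sparse_events=False):
    """Return the methodologies this article suggests for a kind of problem.

    well_defined: True for machines, networks, software and the like;
                  False for fuzzily defined systems such as businesses,
                  relationships, and factory floor work flow.
    """
    if not well_defined:
        # Fuzzily defined systems: generic problem solving, or a
        # bottleneck-optimized process when the symptom is one of degree.
        if bottleneck:
            return ["Theory of Constraints"]
        return ["Kepner-Tregoe"]

    # Well defined systems: the Universal Troubleshooting Process, which
    # already incorporates Bottleneck Analysis.
    picks = ["Universal Troubleshooting Process"]
    if highly_safety_critical or sparse_events:
        # Supplement with an event-optimized, safety-optimized process.
        picks.append("Root Cause Analysis")
    return picks


print(suggest_methodologies(well_defined=True))
print(suggest_methodologies(well_defined=True, highly_safety_critical=True))
print(suggest_methodologies(well_defined=False))
print(suggest_methodologies(well_defined=False, bottleneck=True))
```

If your people face more than one kind of problem, call it once per kind and train them in every methodology that comes back.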
Commit the resources
Resources are always an open question. Here are some ideas.
Instruction costs
Instruction costs vary widely. One of the least expensive is the
Universal
Troubleshooting Process, which, if implemented by in-house instructors,
costs
$45.00 per attendee plus what you pay the in-house instructors. If you
choose
to have a Troubleshooters.Com instructor teach the course, it will
typically
cost between $2800.00 and $7000.00 for a 2 day course, depending on
location.
This can be very cost effective if you have more than 10 attendees at
the
course.
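To see why class size matters, here is a rough per-attendee comparison in Python. It uses the prices quoted above and assumes the vendor-taught course is a flat fee regardless of class size. The in-house instructor pay is not given in this article, so the instructor_pay argument is a placeholder you would fill in yourself; the function names are hypothetical.

```python
IN_HOUSE_LICENSE_PER_ATTENDEE = 45.00   # license fee quoted in the article
VENDOR_COURSE_LOW = 2800.00             # low end of the quoted 2 day course fee
VENDOR_COURSE_HIGH = 7000.00            # high end of the quoted 2 day course fee


def vendor_cost_per_attendee(course_fee, attendees):
    """Flat course fee (an assumption) spread across the class."""
    return course_fee / attendees


def in_house_cost_per_attendee(attendees, instructor_pay):
    """License fee plus the instructor's pay (your own figure) spread across the class."""
    return IN_HOUSE_LICENSE_PER_ATTENDEE + instructor_pay / attendees


for attendees in (5, 10, 20):
    low = vendor_cost_per_attendee(VENDOR_COURSE_LOW, attendees)
    high = vendor_cost_per_attendee(VENDOR_COURSE_HIGH, attendees)
    print(f"{attendees:2d} attendees: ${low:.2f} to ${high:.2f} each (vendor instructor)")
```

At 10 attendees the vendor-taught course works out to between $280 and $700 per attendee, and the figure keeps dropping as the class grows.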
Many other courses cost considerably more. When deciding on a course,
take
into account the cost, the benefit, and how applicable the training is
to
the problems you're trying to solve. Reserve the money from the budget.
Employee downtime
The employees can't work while they're receiving instruction, and they
can't
receive instruction while they're working. As with any course, you must
commit
enough resources so employees can take the course. This might involve
having
employees take the course in shifts. However you work it, have it
worked out
in advance.
Post-course break-in period
Commit the resources for a post-course break-in period. In other words,
for
a period of a week or so, expect employee productivity to go down, not
up.
Why's that?
Because your employees have just changed their habits. What they once
did
by rote, they now need to think about. That takes time. I've heard it
said
that it takes 21 days to change a habit. That sounds reasonable. If
true,
it means that productivity gain will occur somewhere between 7 and 14
days.
Before 7 days, the new habits are so new that the employee must take
extra
time to remember to do them. After 14 days, the habits are almost in
place,
and the efficiencies of the new methods overcome the employee's extra
efforts
to remember. From then on, it's all gravy.
Sports analogies are ubiquitous. One I like
is the
speedskating analogy. In 1980 every competitive speedskater was on the
old
skates with the four wheels forming a rectangle. The few people on
inline
skates were recreational skaters and were ignored. By 1983 there were a
few
fast inline skaters, but none rose to the level of regional
competition.
By 1986 inline skaters participated in races and a few actually won.
This
caused others to try them, and those trying them invariably skated
slower,
not faster. Some gave up and went back to rectangular skates. But
others
continued practicing on inlines, eventually beating their best rectangular skate times. By 1992, every competitive speedskater used inlines -- nobody
on
rectangular skates could win or even come close.
But there was a break-in period of several years.
Today it's absolutely clear that inline skates are much faster than the
old-style skates. And yet, every old-style skater went slower the first
few times they tried inlines.
What I'm trying to say is this. You must support your employees in
their quest
to use the better technique. This means that for 1 to 3 weeks after
training,
do not demand improved productivity, and do not make a big deal out of
lessened
productivity. After 3 weeks, you have the rest of the employee's tenure
to
profit from their training.
Decide on the process
Choose the troubleshooting process best suited toward your goal, and
best
optimized for your purposes. Different goals and purposes were
discussed in
the Decide on the goals section of
this
article.
Decide on the course and instructor
Some troubleshooting processes are taught by a variety of vendors,
while others
are taught by just one. If the course you want is taught by a variety
of
vendors, pick the best one based on reputation, price and "fit".
Some vendors give you the choice of teaching by the vendor's
instructor(s),
or licensing the course for teaching by your own in-house instructors.
There
are pros and cons of each. Some vendors also give the option of a
"train the
trainer" course so that in-house trainers fully understand what they're
teaching
and how to teach it.
In house instructors
In-house instructors are great because they have better knowledge of
the attendees'
work lives. That means they can use examples and exercises more suited
to
the audience's day to day work, thus gaining credibility. Some vendors,
such
as Troubleshooters.Com, provide the in-house trainers with self-explanatory
instructor materials. Beyond that, in a large training project, it's
often
valuable to have the vendor provide "train the trainer" training to the
trainers.
Depending on the vendor, use of in house instructors can be very
economical.
For instance, Troubleshooters.Com charges only $45.00 per attendee to
license
the Universal Troubleshooting Process course given by in house
instructors.
Contract instructors
With all the advantages of in house instructors, why would anyone
choose
the vendor's instructors?
There's
no substitute for the vendor's instructors if you want the utmost in
troubleshooting process knowledge. Also, the vendor's instructors
truly believe in the power of their troubleshooting process, so there's
no need to get "buy in" from the trainers themselves. Last but not
least, vendor supplied instructors are knowledgeable enough to conduct
"train the trainer" sessions for the in-house instructors.
In a few cases, the organization's politics create a situation where employees don't find the organization's trainers credible, and will much more easily believe an "expert from afar". Or, perhaps, they associate in-house trainers with propaganda. Such cases might justify bringing in the vendor's trainer.
Just remember that, in spite of all their advantages, vendor supplied
instructors have less knowledge of your employees and their work, so
the supplied examples and exercises won't be as specific to your
workplace.
Decide on the attendees
Who gets trained? Obviously, for technical troubleshooting your
technical people would receive the training. The courseware vendor can
tell you the optimal class size, so you can decide how many to send,
and if you should conduct more than one class.
Some employees are more accommodating to new ideas than others. When
all other things are equal, send those most likely to buy in to the new
process. When their productivity is improved, use that fact to petition
both upper management and other employees to obtain troubleshooting
process training.
Letters to the Editor
All letters become the property of the publisher (Steve Litt), and
may
be edited for clarity or brevity. We especially welcome additions,
clarifications,
corrections or flames from vendors whose products have been reviewed in
this
magazine. We reserve the right to not publish letters we deem in
bad taste
(bad language, obscenity, hate, lewdness, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be
sure
the subject reads "Letter to the Editor". We regret that we cannot
return
your letter, so please make a copy of it for future reference.
How to Submit an Article
We anticipate two to five articles per issue, with issues coming out
monthly.
We look for articles that pertain to the Troubleshooting Process, or
articles
on tools, equipment or systems with a Troubleshooting slant. This can
be
done as an essay, with humor, with a case study, or some other literary
device.
A Troubleshooting poem would be nice. Submissions may mention a
specific product,
but must be useful without the purchase of that product. Content must
greatly
overpower advertising. Submissions should be between 250 and 2000 words
long.
By submitting content, you give Troubleshooters.Com the
non-exclusive,
perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain
the copyright and sole right to sell or give it away elsewhere.
Troubleshooters.Com
will acknowledge you as the author and, if you request, will display
your
copyright notice and/or a "reprinted by permission of author" notice.
Obviously,
you must be the copyright holder and must be legally able to grant us
this
perpetual right. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for
clarity
or brevity. Any published article will include a two sentence
description
of the author, a hypertext link to his or her email, and a phone number
if
desired. Upon request, we will include a hypertext link, at the end of
the
magazine issue, to the author's website, providing that website meets
the
Troubleshooters.Com criteria for links
and
that the author's website first links to Troubleshooters.Com. Authors:
please
understand we can't place hyperlinks inside articles. If we did, only
the
first article would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with
subject
line Article Submission. The first paragraph of your message should
read
as follows (unless other arrangements are previously made in writing):
I (your name), am submitting this article for possible publication
in
Troubleshooters.Com. I understand that by submitting this article I am
giving
the publisher, Steve Litt, perpetual license to publish this article on
Troubleshooters.Com
or any other A3B3 website. Other than the preceding sentence, I
understand
that I retain the copyright and full, complete and exclusive right to
sell
or give away this article. I acknowledge that Steve Litt reserves the
right
to edit my submission for clarity or brevity. I certify that I wrote
this
submission and no part of it is owned by, written by or copyrighted by
others.
After that paragraph, write the title, text of the article, and a two
sentence
description of the author.
URLs Mentioned in this Issue
- Microsoft Licensing
- http://gartner11.gartnerweb.com/public/static/home/today/il0731003.html:
You cannot reimage a computer with an OEM Windows installation. You
need
to pay Microsoft for the privilege.
- http://www.computerworld.com/cwi/story/0,1199,NAV47_STO60416,00.html:
"Microsoft retools corporate software licensing program".
- http://www.computerworld.com/cwi/story/0,1199,NAV47_STO60163,00.html:
"Microsoft asks PC builders to help stem 'naked' system order". This
article
documents Microsoft's rewards for system builders to turn in
corporations
ordering PC's without operating systems. This article also says that
Windows
XP and Office XP will have a "forced registration system".
- http://www.computerworld.com/cwi/story/0,1199,NAV47_STO60695,00.html:
"Microsoft licensing shift creates uncertainty for user". Microsoft
users
must do expensive audits.
- http://www.computerworld.com/cwi/story/0,1199,NAV63_STO59167,00.html:
"Microsoft Pitches XP to Corporate Users": More on Windows XP.
- http://www.computerworld.com/cwi/story/0,1199,NAV47_STO60173,00.html:
"User queries prompt new Microsoft attack on open source": Describes
how
Microsoft's Craig Mundie says that the open-source movement could
result
in "product instability" and "inherent security risks" for software
users.
- Windows to Linux Conversion Stories
- Miscellaneous URL's