Troubleshooters.Com Presents

Troubleshooting Professional Magazine

Volume 7 Issue 3, Summer 2003
Speculation, Guesswork and Prayer
Copyright (C) 2001-2005 by Steve Litt. All rights reserved. This material was started in 2001, and completed in 2005.

Materials from guest authors copyrighted by them and licensed for perpetual use to Troubleshooting Professional Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.


Arrogance rides triumphantly through the gates, barely glancing at the old woman about to cut the rope and spring shut the trap. -- Mason Cooley

Editor's Desk

By Steve Litt
Riddle: The answer is "speculation, guesswork and prayer". What's the question?

The question is: What is troubleshooting without process?

Absent a valid troubleshooting process, troubleshooting becomes speculation in the hands of experts, guesswork for those of average abilities, and prayer in the hands of neophytes.

Speculation, guesswork and prayer. Loss of profit, morale, and reputation (for the company, department, and employee).

This issue of Troubleshooting Professional discusses the importance of Troubleshooting Process, and the speculation, guesswork and prayer resulting from its absence. We'll discuss how lack of process creates speculation, guesswork and prayer, how to recognize the problem, the resulting hardships to your department or organization, and how you can eliminate the problem with training in troubleshooting process.

So kick back, relax, and read this magazine. And remember, if you're a Troubleshooter or Technologist, this is your magazine. Enjoy!

Steve Litt is the creator of the Universal Troubleshooting Process Course. He can be reached at Steve Litt's email address.

Yes, as a Matter of Fact This Magazine is Late

By Steve Litt
The Summer 2003 issue of Troubleshooting Professional Magazine was completed on October 6, 2005. It had been on the web before that, presumably since the summer of 2003, but it had been incomplete.

Stranger still, this magazine was written primarily in 2001, but then put aside for other content. When I rediscovered this content on 10/6/2005, I was very impressed with it, finished it, and put the revisions on the web. I hope you enjoy it, and just remember this:
Better late than never!

Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. He can be reached at Steve Litt's email address.

Speculation

By Steve Litt
Give a symptom description to 12 experts in a room. Ideally you'd like to see immediate, unanimous agreement on the root cause of the symptom. But of course that's impossible.

A very realistic expectation is to hear each expert say "I don't know but I'll find out". But that's all too rare.

What you usually find when giving a symptom description to 12 experts is 12 different snap judgements. Speculation. This poses a problem because at most one of those 12 snap decisions is correct. The other 11 embark on an expensive journey to a dead end. The result for your department or organization is slow solutions, failure to repair, or other problems.

Somehow society, and experts themselves, have developed an expectation that a true expert can hear a symptom description and instantly fathom the cause. To see how truly silly this is, imagine if court proceedings were conducted with this expectation.

Effective Troubleshooting is Like a Courtroom

Who done it? What caused it? Same thing.

In court you need to prove the cause beyond a reasonable doubt. Until that proof is concluded, any remediation is likely to do more harm than good. No matter how expert the detective, the lawyer, the expert witness, it remains the duty of the jury to weigh all evidence.

In the courtroom, failure to prove before remediating allows the guilty to go free, and the innocent to be imprisoned or executed. It results in loss of confidence by the public. To see examples, Google search the combination "death row" "Anthony Porter".

In the repair of machines, computers, networks and other systems, failure to prove before remediating facilitates repeated failure to repair, further damage to the system, and loss of confidence, in both the department and organization, by customers, management and co-workers.

Both Require a Process

To minimize likelihood of harmful actions, both courtroom activities and troubleshooting require a process.

In the case of the courtroom, the chain of evidence must be preserved. Both sides present witnesses, examine and cross examine the witnesses, present exhibits, have witnesses testify about exhibits, make and counter objections. The judge determines at each point whether valid courtroom process is being followed. The jury is informed they must decide based ONLY on evidence that has been properly and correctly presented in court. The jury reaches its best decision, based ONLY on the evidence.

In the case of troubleshooting, the technologist acquires the symptom description, reproduces the symptom, checks likely suspects, and performs tests. Each test is designed to rule out sections of the system. Eventually the area of inquiry narrows to the point where the root cause is obvious, at which time the technologist has reached his best decision based only on the evidence, and repairs or replaces the defective component.

But What About Intuition

We all know troubleshooters use intuition, and we all believe this is a good thing. The question is, at which point is intuition brought to bear? Intuition is a wonderful tool in deciding what tests to perform. It's a major component in guiding the course of the diagnosis. But when the technologist assumes a cause based on intuition, and acts on that assumption, he's crossed the line.

The process leading to the solution should be more deductive than intuitive. Unless the technologist performs verification tests before performing an intuition-based fix, he risks wasting an extraordinary amount of time and money implementing a wrong fix. If the technologist fails to test, the fix will often be wrong, resulting in serious problems.

Not only is intuition harmful at the root cause level, but excessive use can also be harmful at the diagnostic level. If the technologist repeatedly speculates on specific components, he loses the benefit of ruling out large chunks of the system.

The decision of which troubleshooting tests to perform next is based on a quadruple tradeoff:

  1. Ease of performing the test
  2. Likelihood that the test will isolate the root cause into a small area
  3. Even divisions in ruling out the remainder of the system
  4. Safety concerns
Intuition and speculation are proper to the extent that they are used to estimate the preceding four factors and draw the best educated guess. Beyond that, intuition and speculation have no part in troubleshooting.
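One way to picture the four-way tradeoff above is as a simple scoring exercise. The following is only a sketch: the 0-to-1 estimates, the equal weights, and the three candidate tests are all made-up assumptions for illustration, not part of any published process. The estimates themselves are exactly where intuition earns its keep.

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    """A diagnostic test with estimates (0..1) of the four tradeoff factors."""
    name: str
    ease: float        # higher = easier to perform
    likelihood: float  # chance the test narrows the cause to a small area
    evenness: float    # 1.0 = splits the remaining system roughly in half
    safety: float      # higher = safer to perform


def score(test, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four factors; equal weights are an assumption."""
    w_ease, w_like, w_even, w_safe = weights
    return (w_ease * test.ease + w_like * test.likelihood
            + w_even * test.evenness + w_safe * test.safety)


def pick_next_test(candidates):
    """Return the candidate with the best estimated tradeoff."""
    return max(candidates, key=score)


# Hypothetical candidates with hypothetical estimates.
candidates = [
    CandidateTest("swap the motherboard", 0.1, 0.3, 0.2, 0.6),
    CandidateTest("ping the default gateway", 0.9, 0.5, 0.8, 1.0),
    CandidateTest("reinstall the application", 0.3, 0.4, 0.3, 0.5),
]

best = pick_next_test(candidates)
print(best.name)
```

Note that the score only picks the next test. It says nothing about the root cause, which still has to be proven by the test results.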

Talent is No Excuse for Speculation

The argument might be made that the speculation of an expert is more valid than that of a non-expert. Such an argument might state that whereas the juror is just a "peer" of the defendant, the network engineer is an expert having spent a career accumulating network knowledge. Certainly the network engineer's speculation can be respected.

But in fact, the network engineer's speculation is no more respectable than a juror's would be. The most talented and knowledgeable network engineer lacks X-ray vision and divining rod fingers. He cannot listen to a symptom description, gaze upon the system, and divine the single defective component out of a cast of hundreds or thousands.

Yet many talented people try for the speculative quick fix. They use a diagnostic machine or software program, speculate that the diagnostic's recommendations are true, forsaking all other areas of the machine. Or they hear a symptom description identical to one they've heard before, and speculate that the root cause is the same. Or they analyse the system, and speculate on which part would produce an identical symptom.

Once again, there's nothing wrong with using speculation as a tool to decide among several likely diagnostic tests, but when one assumes a root cause, the result is replacement of the wrong component, circular troubleshooting, argumentative troubleshooting, finger pointing, profit loss, and harm to the reputation of all involved.

Speculation is a Product of Ego

Certainly a technologist considering himself a beginner or "average" would have no problem "admitting" the need of diagnostic tests before drawing a conclusion. It seems like those considering themselves "experts" are the ones expecting themselves to provide an instant answer. Yet a little reflection tells us that no sane person would expect any human to conjure a root cause out of a system with thousands of components. The sane person knows that only the process of elimination leads to a quick and correct identification of the root cause.

So why do those considering themselves experts fall into the speculation trap? They fail to recognize that troubleshooting consists of two distinct strengths:

  1. Knowledge of the system under scrutiny
  2. Knowledge of troubleshooting process

They fail to recognize this because they've never been trained in #2. Very few people have. As long as we expect our experts to get along without training in valid troubleshooting process, we'll continue to experience speculation and the problems it brings.

Some Experts Use Valid Troubleshooting Process

Nothing in this article is meant to imply that all experts are devoid of troubleshooting process knowledge. Many experts have learned a valid troubleshooting process either through courses or through the school of hard knocks. Such expert troubleshooters usually do not speculate nor exhibit arrogance.

Problems Caused by Speculative Troubleshooting

The problems caused by speculative troubleshooting can be grouped into three categories:
  1. Individual problems
  2. Team problems
  3. Corporate problems.

Individual Problems

The individual technologist troubleshooting speculatively experiences productivity loss caused by the circular troubleshooting forced upon him by his speculation. This results in delayed solutions at best, and incorrect solutions if speculations are not verified with conclusive tests. Such productivity loss cannot help but torpedo the technologist's morale, which could result in burnout, stress, or blame shifting. Burnout, stress and blame shifting invariably steal precious energy from the technologist's diagnostic efforts, creating a vicious circle. The final result could be disability leave, dismissal, or gravely impaired performance.

Team Problems

No troubleshooter works alone. There is usually a customer, internal or external, who expects the problem fixed quickly and cleanly. In the case of a programmer debugging his own code, there will be users relying on that code, and management who have gone out on a limb promising clean code delivered on time. Often the troubleshooter is part of a team consisting of members from one or more departments.

Speculative troubleshooting ruins teamwork. It facilitates argument and finger pointing between employees, or with customers or vendors. The dispute can take the form of "who owns the problem", as in the classic "it's hardware/no it's software" gambit, or it can take a more personal form, as in "Joe gave me bad information (his erroneous speculation), and cost me many hours of time scheduled for another project."

Combined with a lack of team awareness (good Troubleshooting Process courses teach team troubleshooting), speculation results in alternating technologist visits. The hardware guy comes out, pronounces the problem software, and leaves. The software guy comes out, declares it to be hardware, and leaves. This goes on three or four times, consuming a week or more, until heavy-hitter managers demand that both the hardware and software people show up at the same time and stay there until the problem is solved.

All of this creates morale problems for the team. Team morale problems tend to degenerate very fast and very hard, with grave results.

Corporate Problems

Morale problems also crop up in the corporation. Speculative troubleshooters leave users and customers high and dry. Name calling contests often ensue. All of this reduces the organization's productivity.

The decreased individual and team performance results in increased technologist salary expense, as more technologists are required to do the work.

The finger pointing between technologists, departments, and companies, caused by speculative troubleshooting can make the organization look like a Keystone Kops film.

Opportunities are lost as lingering unsolved problems erode client confidence. Those lingering problems also create risk of customer loss, litigation, morale problems, and harm to the organization's reputation.

All these "screwups" result in remediation costs. Drawn out meetings and phone calls are required to soothe nerves (usually followed by yet another mistake). Profit-sapping customer discounts are often used to say "I'm sorry". In the case of catastrophic mistakes, remedial advertising might be required.

Recognizing Troubleshooting by Speculation

Sometimes the hardest part of dealing with a problem is recognizing it in the first place. How do you recognize speculative troubleshooting? Be on the lookout for the following symptoms:

Arrogance Based Diagnosis

Arrogance based diagnosis leaves clues. When you see these clues, investigate.

"I'm the expert, I know how to fix it!"

Statements like the preceding should raise a red flag. Although it's true that every troubleshooter must occasionally trot out this phrase to calm an overly agitated user, if someone uses this phrase regularly it's probably more out of arrogance than an attempt to calm. All too often statements like the preceding are followed by misdiagnosis.

If the person making the statement has anything but a stellar record of fast, accurate solutions, intervention may be called for.

"The diagnostic software says..."

Diagnostic tools and software are essential for diagnostic productivity. Unfortunately, the more complex and featureful they are, the more they can be misused. Many diagnostic tools attempt to troubleshoot down to the defective component on the basis of a symptom description, possibly plus an inadequate array of pre-defined diagnostic tests. Many so-called experts are more expert in the operation of their diagnostic tools than on the system under repair or the process of troubleshooting. All too often, experts place complete belief in the pronouncements of such diagnostic tools, leading to replacement of incorrect components and failure to repair.

When a technologist says "The diagnostic software says...", if the next sentence is "So let's start looking at that subsystem", everything's OK. But if the next sentence is "So let's replace the...", expect trouble. It's the height of arrogance to believe oneself so expert at diagnostic equipment that further investigation is unnecessary.

Again, if the person making the statement has anything but a stellar record of fast, accurate solutions, intervention may be called for.

Foot dragging

Maddeningly, some technologists consider themselves so important that they simply fail to respond. Sometimes it's the result of overzealous scheduling of their resources by management. But in many cases, it's arrogance, plain and simple.

Although foot dragging is not caused by speculation, often foot dragging and speculation have the same cause -- arrogance.

Agenda Based Diagnosis

Sometimes speculation takes the form of conflict of interest. Maybe the technologist doesn't want to be bothered right now, so he manufactures a believable explanation why the root cause is in a subsystem whose responsible party is a different person or organization. The old "It's a hardware problem" gambit.

Often the diagnosis is made in order to shift blame. Perhaps the technologist wired the building with inferior network cabling, and now is trying to blame the poor performance on "bad routers" and "packet collisions" rather than admit he made a bad mistake.

Sometimes there's a more sinister intent. Two departments get in a spitting contest, and technologists from each department fabricate root cause speculations designed to place the cause in the enemy department.

All agenda based diagnosis shares one common trait: Root causes don't move according to the whims of people. Eventually the scoundrels are found out, and people are angered. If the duped party is a customer, he might take his business elsewhere. If the duped party is inside the organization, the entire organization has been hurt by the slow diagnosis -- a fact not unnoticed by upper management.

Good troubleshooting process training educates technologists to the fact that sooner or later they will lose, and lose big, if they diagnose according to agenda.

Rote Repetitive Diagnosis

Some experts are more memory experts than system experts or troubleshooting process experts. An outstanding memory is an asset when used wisely and a liability when used as a crutch. Once again, if memory serves to identify the diagnostic test most likely to isolate the root cause, that's a good thing. But if memory is used to speculate the root cause, and that speculation is used as the basis of a repair, many times it's a costly mistake.

We see it all the time. Last time the Windows crash was caused by a problem with the database server, and you just got a blue screen, so we'll reload the database server now. Unfortunately, this time it's a rogue .dll file placed on the application server by a restore, so reloading the database server just dumps all work past the last backup and costs many hours.

There are some who believe expertise is achieved when one has seen many problems and remembers the solutions. Such people don't understand the role of troubleshooting process, and doom themselves to a high percentage of bogus repairs. Such people need training in the troubleshooting process.

Non-cooperation

Non-cooperation is really an effect of diagnosis by arrogance or diagnosis by agenda. If you see much non-cooperation during troubleshooting, investigate further.

Slow or Unsatisfactory Solutions or Dissatisfaction with the Support Department

The ultimate result of speculative troubleshooting is slow or unsatisfactory solutions. If you see these symptoms, investigate further.

Eliminating Troubleshooting by Speculation

Obviously, to eliminate speculative troubleshooting, you need to fix the cause. As mentioned previously, possible causes include:

Arrogance

One could define arrogance as unjustified certainty. The arrogant technologist is certain his speculation is correct and there's no need for further research or testing. In the absence of some sort of extrasensory powers, how could he draw that conclusion?

It could be abject stupidity, but if he's smart enough to be considered an expert, that's unlikely. What's more likely is that he is not familiar with the role of process in troubleshooting. This is likely because only a tiny percentage of technologists have been trained in troubleshooting process, so the troubleshooting productivity of this "expert" might be on a par with his contemporaries. Quite often, lack of troubleshooting process training is the root cause of arrogance, which is a cause of speculative troubleshooting. Therefore, troubleshooting process training can correct the situation.

Rote repetitive diagnosis

What causes rote repetitive diagnosis? Certainly its practitioners know it fails frequently, so why do they use it overwhelmingly? Could it be they know of no alternative? Once again, given the lack of training in the troubleshooting process, it's quite likely.

So in cases of speculation caused by arrogance or rote repetitive diagnosis, training in a valid and correctly optimized troubleshooting process will likely cure the speculation, and the host of problems it creates.

Agenda

Diagnosis by agenda is not so simple. Here there's pressure on the technologist to toss the problem to the other department or company like a hot potato. Here the speculation is merely a symptom, with the root cause being a defective business model pitting departments against each other, or occasionally the company against its customers. Does that mean training in troubleshooting process cannot help?

Yes and no. The process training won't eliminate the root cause of agenda based diagnosis, but it can expose the root cause. Once one department understands the process of diagnosis, they are prepared to find all root causes within their area of responsibility, and they're prepared to document cases where the other department hot-potatoes them. Then, armed with such evidence, that department can report their findings to management. It's possible that management will understand and fix the underlying business problem. If nothing else, the troubleshooting process training makes the day to day work environment more livable.

Summary

If speculative troubleshooting creates problems in your organization, there's a high likelihood that problem can be cured or at least ameliorated by training your technologists in diagnostic process.
Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist. He can be reached at Steve Litt's email address.

Guesswork

By Steve Litt
What's the distinction between speculation and guesswork? Speculation involves certainty, while guesswork does not.

Those defined as "experts" are often self-defined. They're certain of their expertise. Those without such certainty might be relegated to the "average" designation.

Wouldn't a self-perceived "average" troubleshooter forgo speculation (or its less certain cousin called guesswork) in favor of a valid troubleshooting process? He would if only he knew a valid troubleshooting process.

Once again, only a tiny segment of the technology population receives training in the process of diagnosis. The rest have to learn it in "the school of hard knocks". Some learn it better than others. It's not an either/or proposition, but instead a spectrum from the process clueless to those totally at home with and aware of process, cause and effect, the process of elimination, and prioritization of diagnostic testing.

We all guess from time to time when deciding what diagnostic test to take next. But all too often guessing transcends the role of tactical guide, and becomes a strategy in and of itself. That's the time to give the employee an alternative -- training in a valid troubleshooting process.

Those with less than total comfort and awareness of troubleshooting process often resort to guesswork. Sometimes the guesswork takes the form of analyzing the machine and guessing which root causes would cause the symptom. Sometimes the guesswork is along the lines of probability. Guesswork can assume the role of total belief of "expert systems", or of diagnosis by serial replacement, or excessive reliance on escalation. The process-deficient technologist often guesses that the root cause is outside his job description, or that he has insufficient skills to fix the problem (that's almost always a false assumption). And sometimes guesswork is just random guessing. Whatever the appearance of the guesswork, it's always a productivity thief.

If these symptoms appear among your technologists, it's necessary to recognize that diagnosis by guesswork is occurring, and it's time for your technologists to receive training in a valid troubleshooting process.

Steve Litt is the author of Troubleshooting Techniques of the Successful Technologist. He can be reached at Steve Litt's email address.

Prayer

By Steve Litt
While the experts are busy with speculation, and average technologists slowly guess their way to a solution, the beginning technologist must pray for a miracle. Like the others, he's received no troubleshooting process training, but in addition, he hasn't had sufficient time in "the school of hard knocks" to learn even a little process. Making things worse, the beginning technologist is often undertrained in the system under repair.

Barring a miracle or massive intervention by a more experienced technologist, the beginning technologist stands little chance of solving a complex technology problem. And given today's "lean and mean" business environment, there's little hope of much help from senior technologists. So the beginning technologist muddles through his first year or two trying not to alienate coworkers, bosses and customers. He tries not to cause damage to systems he troubleshoots. He fights job disappointment.

In the kinder and gentler days of decades gone by, the organization had resources to help the new technologist through these difficult times. Now many organizations expect the new technologist to sink or swim. The associated turnover is a problem, but they just don't have enough spare capacity when it comes to more experienced technologists.
 

It doesn't have to be this way

There's an alternative to the sink or swim philosophy. New technologists may be several months deficient in their systems training, but a two day course can make them fully competent in troubleshooting process.

As any master technician can tell you, an excellent grasp of troubleshooting process compensates quite nicely for a partial lack of systems expertise. In fact, the cause and effect deduced during the troubleshooting process is one way we learn technological systems.

Ultimately, the organization that brings new technologists up to speed fast experiences productivity gains, reduced staffing, retention and employee acquisition costs, and a better team environment. Two days of troubleshooting process training is a small price to pay for these advantages.

Steve Litt is the creator of the Universal Troubleshooting Process Course. He can be reached at Steve Litt's email address.

Salvation

By Steve Litt
The bad news is that speculation, guesswork and prayer carry huge costs for the organization, the department, and the individual employee. The good news is that because they're caused by inadequate troubleshooting process knowledge, they're cured by a simple training course. Of course, you need to choose the right troubleshooting process, the right course, and the right trainer(s).

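The following are the properties of a well chosen troubleshooting process and training:

  1. Valid troubleshooting process
  2. Properly optimized troubleshooting process
  3. Good for expert, average and beginning
  4. Yields significant productivity gains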

This article also discusses the finer points of evaluating a troubleshooting process and implementing a troubleshooting process training program.

Valid Troubleshooting Process

A valid troubleshooting process must, at the very least, accomplish the following:

Recognize and Exploit Cause and Effect

The opposite of cause and effect is superstition. Nobody wants diagnosis by tarot card.

But it's not enough not to be superstitious. Cause and effect must be recognized and exploited. Many so-called "troubleshooting courses" are nothing but yet another journey into the technical details of the system to be serviced, with a few diagnostic tools thrown in. No cause and effect there.

Recognize and Exploit the Process of Elimination

If you count up all the electronic parts, jumper settings, BIOS settings, operating system configuration parameters, and application configuration parameters in a modern computer system, you'll find it has 50,000 or so components that can act as root causes. Combine 10, 100 or 1000 such computers into a network, and the complexity is mind boggling. Nothing short of efficient use of the process of elimination will foster quick solutions of such systems.
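To get a rough feel for why the process of elimination copes with such numbers, here is a small sketch. It assumes an idealized situation in which every test rules out half of whatever remains; real tests seldom divide that evenly, so treat the results as a lower bound rather than a prediction. The function name is made up for this illustration.

```python
def halving_tests_needed(component_count):
    """Worst-case number of tests when each test rules out half of what remains.

    Computes ceil(log2(component_count)) with integer arithmetic.
    """
    if component_count < 1:
        raise ValueError("need at least one component")
    return (component_count - 1).bit_length()


# One computer with roughly 50,000 possible root causes, then
# networks of 10 and 1000 such computers.
for machines in (1, 10, 1000):
    total = 50_000 * machines
    print(machines, "machine(s):", total, "components ->",
          halving_tests_needed(total), "tests")
```

Under that assumption, going from one machine to a thousand multiplies the component count by 1000 but adds only about ten tests.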

And yet there are all sorts of "Windows Troubleshooting" and "Computer Troubleshooting" courses that act as little more than symptom/solution listings, with a few tests and diagnostic products thrown in.

Accommodate the Work Habits of Human Beings

Human beings have certain attributes. To the extent that a troubleshooting process works with those attributes, it will be successful. To the extent that a troubleshooting process fights those attributes, it will fail.

First and foremost, humans can concentrate on about 7 facts at a time. Some more, some less, but 7 is a good number. Right off the bat, that means that the process better not ask the person to try to simulate the machine in an effort to "figure out" what's wrong. Nobody can contemplate hundreds or thousands of components at once. That's why mental simulation troubleshooting doesn't work. That's why processes based on binary search through diagnostic tests work marvelously.

Humans must trust before they can invest. A simple process whose value is obvious will excite a person to learn, and to use after learning. On the other hand, nobody will go to the effort to learn a process requiring all sorts of detailed actions on his part. Initially the trainee doesn't believe it will work, and as such doesn't invest much learning energy.

That brings up the subject of the program of the month. Anyone in the workforce more than a couple years has encountered at least one. When an employee perceives something as a program of the month, he goes through the motions but his mind is elsewhere. His attitude: Been there, done that!

The troubleshooting process MUST NOT be perceived as a program of the month. What are some attributes of programs of the month?

So your troubleshooting process of choice should state its case in plain English, it should take care not to insult the intelligence of those being trained, and like any other company endeavor, it must have the support of management at the highest level of its application and influence.

Properly Optimized Troubleshooting Process

Problem solving process isn't a "one size fits all" proposition. There are generic problem solving methodologies optimized to solve problems in fuzzily defined systems (business, political and relationship), and there are troubleshooting processes optimized to solve problems in well defined systems (computers, networks, software and machines of all sorts). There are methodologies specialized for safety critical situations, and for events and extremely sparse intermittents. There are methodologies optimized to find bottlenecks (where the symptom description contains "too" or "insufficient"). Selecting a process not optimized for your situation will cut productivity by an order of magnitude.

The Universal Troubleshooting Process is an example of a troubleshooting process optimized for well defined systems, and works quite well under a wide variety of such systems. Its ideal use is on reproducible or frequently recurring intermittent problems in machines and computer systems (including software) in low, moderate and sometimes highly safety critical situations. In highly safety critical situations (nuclear power plants and the like) it's best supplemented with a more safety optimized process such as Root Cause Analysis.

Because it incorporates Bottleneck Analysis, the Universal Troubleshooting Process quickly solves problems of degree in well defined systems.

Inappropriate Uses of the Universal Troubleshooting Process

The Universal Troubleshooting Process is inappropriate for solving problems in fuzzily defined systems such as businesses, relationships, and factory floors (the work and parts flow). For such problems you'd use a generic problem solving method like Kepner Tregoe, or as appropriate a bottleneck analysis optimized process such as the Theory of Constraints. The Theory of Constraints has become quite famous on the factory floor.

The Universal Troubleshooting Process is insufficient in environments where a large number of events or sparse intermittents occur. Such environments require use of an event-optimized process such as Root Cause Analysis. As mentioned previously, in extreme safety critical environments such as nuclear power plants the Universal Troubleshooting Process should be supplemented with something like Root Cause Analysis.

And likewise...

There will come a time (for many of you it's already come many times) when someone will try to sell you on the idea of generic problem solving training for your technologists, technicians and maintenance people in order to improve their productivity fixing computerized systems and machines. Such use of generic problem solving process is inappropriate, time consuming, error prone, and costly. Generic problem solving processes are not optimized for well defined problems, and they generally waste time with questions irrelevant to fixing technical problems, such as how to transition from the current state to the desired state. In technical troubleshooting you just repair or replace the bad component. Using a generic problem solving method to solve technological problems can double, triple, or perhaps increase tenfold the time, effort and cost of repair. And your employees will rebel against such use of generic problem solving methods.

Good for expert, average and beginning

Many experts speculate, many average technologists guess, and many beginners pray. The solution must be a troubleshooting process useful to expert, average and beginner alike. It must work, it must be comprehensible, and it must be believable. It must not wrap itself in buzzword fluff or fuzzy concepts.

Yields significant productivity gains

Speculation, guesswork and prayer waste time and money. The whole purpose of their elimination is increased productivity. The chosen troubleshooting process must truly increase productivity. Fortunately, this is easy, because usually the missing ingredient to productivity is troubleshooting process.

Evaluating a Troubleshooting Process

How do you evaluate a troubleshooting process? How do you estimate its productivity potential? Here's a list of ideas:

Optimized for speed and accuracy

In a high school electronics class years ago, I was taught to scope the input, then the first stage, then the second, stage by stage until the output. The first place I found a problem indicated the stage containing the root cause. That's called serial search -- a genuine troubleshooting process. It might even have been sufficient for the five tube radios of the era.

Today's systems often have six figures worth of components. Serial search would take weeks or months per system. Serial search isn't optimized for complex systems.

An optimal process for today's systems would use binary search. The drawing to the right shows how binary search works. Each diagnostic test rules out half of the remaining root cause scope. Obviously, real world diagnostic tests can't exactly split the remaining root cause scope, but it's something we must shoot for.

Imagine a system with 16 million components. If each diagnostic test took only one minute, serial search could require up to 16 million minutes -- that's about 30 years of round the clock testing. Now let's say you use pure binary search. You would need only 24 diagnostic tests to find the root cause. Even if each diagnostic took an hour instead of a minute, that's three 8 hour workdays.
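The arithmetic above can be checked with a short sketch. This Python is illustrative only: the function names and the model of a system as a chain of numbered stages with a single bad stage are assumptions made for this example, not part of the Universal Troubleshooting Process courseware. Real diagnostic tests rarely split the remaining scope exactly in half.

```python
import math

def serial_tests(n_components):
    """Worst-case number of tests when checking components one at a time."""
    return n_components

def binary_tests(n_components):
    """Number of tests when every test rules out half the remaining scope."""
    return math.ceil(math.log2(n_components))

def find_first_bad_stage(signal_ok_after, n_stages):
    """Binary search for the single bad stage in a chain of stages.

    signal_ok_after(i) stands in for a testpoint measurement (hypothetical):
    it returns True if the signal is still good at the output of stage i.
    Assumes exactly one bad stage, with every later testpoint reading bad.
    Returns (bad_stage, number_of_measurements).
    """
    lo, hi = 0, n_stages - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if signal_ok_after(mid):
            lo = mid + 1   # signal good here, so the fault is downstream
        else:
            hi = mid       # signal bad here, so the fault is at mid or upstream
    return lo, tests

n = 16_000_000
print(serial_tests(n) / (60 * 24 * 365))  # years at one minute per test: about 30
print(binary_tests(n))                    # tests needed with even splits: 24
```

To try the search itself, simulate a fault at an arbitrary stage, for example `find_first_bad_stage(lambda i: i < 12_345_678, n)`. It locates that stage in no more than 24 measurements.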
           
[Figure: Binary Search]


One factor that hugely speeds diagnosis by even division is the availability of numerous cheap and quick testpoints. With such testpoints, a division is only a measurement away. Without them, much mental simulation is required. The Universal Troubleshooting Process is optimized to take advantage of the numerous cheap and easy testpoints on most technology. On the other hand, the methods described in Kepner and Tregoe's "New Rational Manager" do not optimize to this advantage, resulting in a slower methodology on most technology.

No extraneous time consuming frills

For technological troubleshooting, you want a troubleshooting process optimized to technology, with no time consuming frills. Specifically, methods to solve fuzzily defined problems (business and interpersonal) are unneeded extras in technology. If a technologist needs to solve business and interpersonal problems as well as technology problems, he or she should be trained in the best of breed for each, not a methodology that happens to include both.

Minimal jargon

Jargon is distracting and aggravating. Employees are smart, so they know when a course author has used jargon words to obfuscate a beautifully simple concept. They typically attribute the motivation for such obfuscation to an attempt to make the material more complex than it need be, in order to inflate the material's price. Technological troubleshooting is simple enough to be described in plain English. When creating the Universal Troubleshooting Process courseware, I used plain English for every concept except the term Mental Model, which is explainable in 2 minutes.

Respect for Troubleshooter's time and intelligence

All too many employees are subjected to various programs supposed to "get them on board", or "improve their potential". Unfortunately, most are perceived as "bullfeathers". To be credible, troubleshooting process training must stick to the facts, and avoid any hint of "program of the month" type material.

Immediately usable on the job

Timing is everything. Unless the employee can immediately use the training on the job, the material is forgotten within a few days. Employees should be encouraged to use the information immediately upon return to work, and that encouragement should be based on the fact that the material is useful to their everyday jobs.

Implementing a TP training program

Decide on the goals

What is the basic problem? Is it failure of groups of employees to solve problems in a timely and accurate way? If so, are the problems they're solving business problems, factory floor problems, or technological problems (machine and system repair)?

If your employees need help solving business problems, consider a course employing the methods described in "The New Rational Manager" by Kepner and Tregoe. That methodology is great for analyzing and deciding upon solutions for fuzzily defined systems such as businesses, departments, and the like. However, because it doesn't take advantage of numerous cheap and safe testpoints afforded by machines and systems, it's slow and cumbersome for solving technological problems.

If your employees need help solving technological problems, be it fixing machines on the factory floor, local or wide area networks, computer hardware, computer software, or electronics, consider the Universal Troubleshooting Process course. Its optimization for abundant testpoints makes it a lightning quick way to diagnose and fix tech problems.

But resist the temptation to use the Universal Troubleshooting Process for business and interpersonal problems -- the lack of testpoints in those types of systems makes the Universal Troubleshooting Process insufficient for them.

When dealing with safety critical problems, you need to solve the business problem behind the technological problem. For instance, in a nuclear reactor, if lack of a written maintenance policy for air conditioners causes an air conditioner to run out of freon, which causes 130 degree temperatures in the parts storage room, which causes weak solder joints on stored circuit boards, which causes those boards to fail early, which causes a rod mechanism failure, which causes the reactor to trip, root cause must be traced all the way back to the lack of written maintenance policy. In such a case, Root Cause Analysis as described by Max Ammerman's book would be what is needed.
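The chain of causes in the reactor example can be written down as data and walked back to its origin. This is only a sketch of the idea of tracing past the technical fault to the business cause. It is not Root Cause Analysis as Ammerman's book describes it, and the cause labels and the `trace_to_root` function are hypothetical names paraphrasing the example.

```python
# Each entry maps an observed effect to its immediate cause.
# The labels are hypothetical paraphrases of the reactor example.
CAUSED_BY = {
    "reactor trip": "rod mechanism failure",
    "rod mechanism failure": "early circuit board failure",
    "early circuit board failure": "weak solder joints on stored boards",
    "weak solder joints on stored boards": "130 degree parts storage room",
    "130 degree parts storage room": "air conditioner out of freon",
    "air conditioner out of freon": "no written maintenance policy",
}

def trace_to_root(symptom, caused_by):
    """Follow cause links from a symptom until no deeper cause is recorded."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in caused_by:
        cause = caused_by[chain[-1]]
        if cause in seen:  # guard against a circular cause map
            raise ValueError(f"circular causation at {cause!r}")
        chain.append(cause)
        seen.add(cause)
    return chain

chain = trace_to_root("reactor trip", CAUSED_BY)
print(" <- ".join(chain))
```

Stopping at "rod mechanism failure" would fix the machine. Following the chain to its last entry fixes the policy gap that would otherwise produce the next failure.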

In many cases your people need instruction on more than one of these methodologies. In that case, I suggest you give them courses in all necessary methodologies. Resist the temptation to make one of the methodologies fit all the types of problems.

Commit the resources

Resources are always an open question. Here are some ideas.

Instruction costs

Instruction costs vary widely. One of the least expensive is the Universal Troubleshooting Process, which, if implemented by in-house instructors, costs $45.00 per attendee plus what you pay the in-house instructors. If you choose to have a Troubleshooters.Com instructor teach the course, it will typically cost between $2800.00 and $7000.00 for a 2 day course, depending on location. This can be very cost effective if you have more than 10 attendees at the course.
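The per-attendee arithmetic behind those figures can be sketched as follows. The prices are the ones quoted above. The function names and the `instructor_pay` parameter are assumptions for illustration, since in-house instructor pay varies by organization and no figure is given here.

```python
# Prices quoted in the paragraph above.
LICENSE_FEE_PER_ATTENDEE = 45.00
VENDOR_COURSE_PRICE_RANGE = (2800.00, 7000.00)

def vendor_cost_per_attendee(course_price, attendees):
    """Flat course price spread evenly across the attendees."""
    if attendees <= 0:
        raise ValueError("need at least one attendee")
    return course_price / attendees

def in_house_cost_per_attendee(instructor_pay, attendees):
    """License fee plus the instructor's pay spread across the attendees.

    instructor_pay is a hypothetical figure you supply yourself.
    """
    if attendees <= 0:
        raise ValueError("need at least one attendee")
    return LICENSE_FEE_PER_ATTENDEE + instructor_pay / attendees

for attendees in (5, 10, 20):
    low, high = (vendor_cost_per_attendee(price, attendees)
                 for price in VENDOR_COURSE_PRICE_RANGE)
    print(f"{attendees:2d} attendees: ${low:,.2f} to ${high:,.2f} each")
```

At 10 attendees the vendor-taught course works out to between $280.00 and $700.00 per person, and the figure keeps falling as the class grows.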

Many other courses cost considerably more. When deciding on a course, take into account the cost, the benefit, and how applicable the training is to the problems you're trying to solve. Reserve the money from the budget.

Employee downtime

The employees can't work while they're receiving instruction, and they can't receive instruction while they're working. As with any course, you must commit enough resources so employees can take the course. This might involve having employees take the course in shifts. However you work it, have it worked out in advance.

Post-course break-in period

Commit the resources for a post-course break-in period. In other words, for a period of a week or so, expect employee productivity to go down, not up.

Why's that?

Because your employees have just changed their habits. What they once did by rote, they now need to think about. That takes time. I've heard it said that it takes 21 days to change a habit. That sounds reasonable. If true, it means that productivity gain will occur somewhere between 7 and 14 days. Before 7 days, the new habits are so new that the employee must take extra time to remember to do them. After 14 days, the habits are almost in place, and the efficiencies of the new methods overcome the employee's extra efforts to remember. From then on, it's all gravy.

Sports analogies are ubiquitous. One I like is the speedskating analogy. In 1980 every competitive speedskater was on the old skates with the four wheels forming a rectangle. The few people on inline skates were recreational skaters and were ignored. By 1983 there were a few fast inline skaters, but none rose to the level of regional competition.

By 1986 inline skaters participated in races and a few actually won. This caused others to try them, and those trying them invariably skated slower, not faster. Some gave up and went back to rectangular skates. But others continued practicing on inlines, eventually beating their best rectangular skate times. By 1992, every competitive speedskater used inlines -- nobody on rectangular skates could win or even come close.

But there was a break-in period of several years.

Today it's absolutely clear that inline skates are much faster than the old-style skates. And yet, every old-style skater went slower the first few times they tried inlines.

What I'm trying to say is this. You must support your employees in their quest to use the better technique. This means that for 1 to 3 weeks after training, do not demand improved productivity, and do not make a big deal out of lessened productivity. After 3 weeks, you have the rest of the employee's tenure to profit from their training.

Decide on the process

Choose the troubleshooting process best suited to your goal, and best optimized for your purposes. Different goals and purposes were discussed in the Decide on the goals section of this article.

Decide on the course and instructor

Some troubleshooting processes are taught by a variety of vendors, while others are taught by just one. If the course you want is taught by a variety of vendors, pick the best one based on reputation, price and "fit".

Some vendors give you the choice of teaching by the vendor's instructor(s), or licensing the course for teaching by your own in-house instructors. There are pros and cons of each. Some vendors also give the option of a "train the trainer" course so that in-house trainers fully understand what they're teaching and how to teach it.

In-house instructors

In-house instructors are great because they have better knowledge of the attendees' work lives. That means they can use examples and exercises more suited to the audience's day to day work, thus gaining credibility. Some vendors, such as Troubleshooters.Com, provide the in-house trainers with self explanatory instructor materials. Beyond that, in a large training project, it's often valuable to have the vendor provide "train the trainer" training to the trainers.

Depending on the vendor, use of in-house instructors can be very economical. For instance, Troubleshooters.Com charges only $45.00 per attendee to license the Universal Troubleshooting Process course given by in-house instructors.

Contract instructors

With all the advantages of in-house instructors, why would anyone choose the vendor's instructors?

There's no substitute for the vendor's instructors if you want the utmost in troubleshooting process knowledge. Also, the vendor's instructors truly believe in the power of their troubleshooting process, so there's no need to get "buy in" from the trainers themselves. Last but not least, vendor supplied instructors are knowledgeable enough to conduct "train the trainer" sessions for the in-house instructors.

In a few cases, the organization's politics create a situation where employees don't find the organization's trainers credible, and will much more easily believe an "expert from afar". Or, perhaps, they associate in-house trainers with propaganda. Such cases might justify bringing in the vendor's trainer.

Just remember that, in spite of all their advantages, vendor supplied instructors have less knowledge of your employees and their work, so the supplied examples and exercises won't be as specific to your workplace.

Decide on the attendees

Who gets trained? Obviously, for technical troubleshooting your technical people would receive the training. The courseware vendor can tell you the optimal class size, so you can decide how many to send, and if you should conduct more than one class.

Some employees are more accommodating to new ideas than others. When all other things are equal, send those most likely to buy in to the new process. When their productivity is improved, use that fact to petition both upper management and other employees to obtain troubleshooting process training.
Steve Litt is the creator of the Universal Troubleshooting Process Course FAQ. He can be reached at Steve Litt's email address.

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewdness, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to the Troubleshooting Process, or articles on tools, equipment or systems with a Troubleshooting slant. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

By submitting content, you give Troubleshooters.Com the non-exclusive, perpetual right to publish it on Troubleshooters.Com or any A3B3 website. Other than that, you retain the copyright and sole right to sell or give it away elsewhere. Troubleshooters.Com will acknowledge you as the author and, if you request, will display your copyright notice and/or a "reprinted by permission of author" notice. Obviously, you must be the copyright holder and must be legally able to grant us this perpetual right. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, provided that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

I (your name), am submitting this article for possible publication in Troubleshooters.Com. I understand that by submitting this article I am giving the publisher, Steve Litt, perpetual license to publish this article on Troubleshooters.Com or any other A3B3 website. Other than the preceding sentence, I understand that I retain the copyright and full, complete and exclusive right to sell or give away this article. I acknowledge that Steve Litt reserves the right to edit my submission for clarity or brevity. I certify that I wrote this submission and no part of it is owned by, written by or copyrighted by others.
After that paragraph, write the title, text of the article, and a two sentence description of the author.
 

URLs Mentioned in this Issue