Troubleshooters.Com Presents

July 1999 Troubleshooting Professional Magazine: Troubleshooting CGI

Copyright (C) 1999 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Troubleshooting Professional Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.


Troubleshooting Review

By Steve Litt
If you're a regular reader of Troubleshooting Professional and Troubleshooters.Com, this article is old hat. If you've discovered Troubleshooters.Com recently, this material is essential to understand the rest of the July magazine.

Troubleshooting Defined

Troubleshooters.Com defines Troubleshooting a little differently from the dictionary. Here's Troubleshooters.Com's definition:
 
Troubleshooting is the act of restoring a sub-performing system back to its as-designed state.

Note the implications. We are not talking about fixing a marriage or business relationship, nor solving hunger or war. The system must have a known as-designed state. Such a system is called a well defined system. This isn't totally limited to man-made systems. For instance, to the extent that the as-designed state of a human is known, we can troubleshoot people (i.e., give medical care).

The primary tool to accomplish this is the process of elimination. Bottleneck analysis is also sometimes used, as are various intermittent busting tactics.

A second definition endorsed by Troubleshooters.Com is:
 
Troubleshooting is the act of improving the performance of a system beyond its as-designed state.

This is commonly called "souping up" the system or "improving performance" or whatever. Tools used in this type of Troubleshooting are bottleneck analysis, profiling, etc. This type of troubleshooting is important in the design of improved products. Note that it's still necessary to start with a well defined system, and that the final, improved system is likewise a well defined system.

Troubleshooting Ability Defined

It's likely there's no such thing as troubleshooting ability. It's more constructive to speak of, and measure, troubleshooting effectiveness or troubleshooting results. My 20 years of studying Troubleshooters of varying effectiveness testify that any innate property of the person doing the troubleshooting plays a minor role at best. There's probably a correlation between intelligence and troubleshooting effectiveness, but the intelligence factor is almost completely masked by the simple factor of how well the person follows a valid Troubleshooting Process.

Nor is system expertise the key. A certain expertise in the system under repair is necessary, but not nearly sufficient. Moreover, a troubleshooter following a valid Troubleshooting Process can usually gain sufficient system expertise in hours or days, or can team up with someone who has the necessary system expertise. System expertise is a necessary, but small and easily acquired, piece of the puzzle.

It's the process! Most technologists with less than 3 years of work experience have no troubleshooting process, making their performance abysmal. Most technologists with more than 5 years of work experience have developed their own troubleshooting process, which makes them effective to the extent that they're really aware of what they're doing and to the extent that their troubleshooting process is complete.

The 10 step Universal Troubleshooting Process is a valid Troubleshooting Process optimized for a wide variety of troubleshooting situations. It is thoroughly documented on Troubleshooters.Com. I teach courses and license course material for those needing further clarification. The bottom line is that the Universal Troubleshooting Process can be learned in a week, and depending on the process initially used by the Troubleshooter, it can improve troubleshooting effectiveness up to tenfold.

The remaining challenge is ego. Take 100 Troubleshooters and ask them whether their troubleshooting effectiveness is poor, fair, OK, good or great, and the answers will skew toward good and great. Ask them the same questions about the typical Troubleshooter and the results will skew toward poor or fair. Obviously many Troubleshooters delude themselves.

In fact, my observation has been that, especially among males, "Troubleshooting Ability" is a major component of self image, and has been since the first time they popped the hood of their parents' car and removed the air cleaner. For such people, seeking or accepting Troubleshooting training can be equivalent to admitting a basic personal flaw. Sadly, such reluctance to seek means of improvement only makes things worse. Luckily, there are Troubleshooting-oriented websites where one can anonymously seek better Troubleshooting strategies and tactics.

In summary, troubleshooting effectiveness correlates primarily with adherence to a valid Troubleshooting Process, with possibly a minor correlation to personal traits such as intelligence. So for all practical purposes, there's no such thing as "troubleshooting ability".

The Four Troubleshooting Tools

Every good Troubleshooter uses four mental tools:
  1. Mental Model
  2. Divide and Conquer
  3. The Attitude
  4. Fix the Right Problem
The Mental Model is knowledge of the system under repair, usually visualized as a block diagram to reveal interactions between components, and thus potential test points. It is the only one of these four tools dependent on the system under repair. It's noteworthy that most "Troubleshooting" courses spend 60-90% of their time on Mental Model, which is why they're so ineffective.

Divide and Conquer is the act of narrowing the scope of the root cause. The closer this narrowing approaches binary search (ruling out half the remaining scope with each test), the faster Troubleshooting will proceed. However, the desire for perfect binary search must be moderated by factors such as ease, likelihood, and safety.
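To make the binary search idea concrete, here is a minimal sketch in Python. It assumes a simplified system: a linear chain of components where each one feeds the next, a bad component spoils everything downstream of it, and a hypothetical check `output_ok_after(i)` reads the test point just after component `i`. The function name `narrow_down` and the stage names are illustrative only, not part of the Universal Troubleshooting Process.

```python
from typing import Callable, Sequence, Tuple


def narrow_down(components: Sequence[str],
                output_ok_after: Callable[[int], bool]) -> Tuple[str, int]:
    """Binary-search a linear chain for the first bad component.

    Assumptions (hypothetical model, not a real API):
      * components[0] feeds components[1], and so on down the chain;
      * the final output shows the symptom (it is bad);
      * output_ok_after(i) reports whether the test point right after
        components[i] reads correctly.
    Returns (suspect component, number of tests performed).
    """
    if not components:
        raise ValueError("no components to search")
    lo, hi = 0, len(components) - 1   # root cause lies in components[lo..hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2          # test point that splits the scope in half
        tests += 1
        if output_ok_after(mid):
            lo = mid + 1              # good through mid: rule out lo..mid
        else:
            hi = mid                  # already bad at mid: cause is in lo..mid
    return components[lo], tests


if __name__ == "__main__":
    # Hypothetical chain; pretend stage 5 ("database") is the faulty one.
    stages = ["browser", "network", "web server", "CGI environment",
              "script", "database", "template", "output"]
    print(narrow_down(stages, lambda i: i < 5))  # prints ('database', 3)
```

With eight components the fault is isolated in three tests, where checking each test point in order could take up to seven. A real troubleshooter would shift `mid` toward test points that are easier, likelier, or safer, as the text notes; this sketch ignores those weights and splits strictly in half.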

The Attitude is simply the frame of mind necessary for effective troubleshooting. As in any other endeavor, an individual who approaches troubleshooting with panic, anger or overconfidence will fail, regardless of his other "abilities". The Attitude is a huge portion of the Universal Troubleshooting Process training, well beyond the scope of this little article. However, this one mantra, phrased as a question and repeated throughout the course of a tough repair, yields huge improvements:
 
How can I narrow it down just one more time?

Fix the Right Problem is the quality control Troubleshooting Tool, as well as the goal-directed part. The Troubleshooter must continuously verify that he or she is correcting the symptom that needs correcting, by finding the ultimate root cause, and the testing phase (step 8 in the Universal Troubleshooting Process) must prove that.

The Ten Step Universal Troubleshooting Process

The Universal Troubleshooting Process is documented extensively on the Troubleshooters.Com website, so we'll just list the steps here:
  1. Get the Attitude
  2. Get a complete and accurate symptom description
  3. Make a damage control plan
  4. Reproduce the symptom
  5. Do the appropriate general maintenance
  6. Narrow it down to the root cause
  7. Repair or replace the defective component
  8. Test
  9. Take pride in your solution
  10. Prevent future occurrence of this problem

The Certainty of Solution

When a reproducible problem, on a well defined system having a reasonable amount of readable test points, is investigated using a valid Troubleshooting Process, solution is a mathematical certainty. This is explained elsewhere on Troubleshooters.Com. Please remember the first sentence of this paragraph, as it will come back to haunt us in CGI troubleshooting.
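One way to see why the conditions in that first sentence matter is the short argument below. It is a sketch of the general idea, not the explanation referenced on Troubleshooters.Com. Assume N candidate root causes, and let S_k be the set of candidates still in play after k tests. Because the problem is reproducible and the test points are readable, each test (read correctly) rules out at least one remaining candidate but never the true root cause, so the set shrinks yet never empties:

```latex
\begin{aligned}
&|S_0| = N,\quad 1 \le |S_{k+1}| \le |S_k| - 1 \ \text{ while } |S_k| > 1
  &&\Longrightarrow\ \text{at most } N-1 \text{ tests}\\
&|S_{k+1}| \le \left\lceil |S_k|/2 \right\rceil \ \text{(ideal halving)}
  &&\Longrightarrow\ |S_k| \le \left\lceil N/2^{k} \right\rceil,\ \text{so } k = \lceil \log_2 N \rceil \text{ tests suffice}
\end{aligned}
```

If the symptom cannot be reproduced on demand, a "good" reading no longer reliably rules anything out, and the first inequality, along with the guarantee it provides, is lost.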

How Important is Troubleshooting?

I often imagine Rodney Dangerfield developing his famous "I don't get no respect" line during a stint as a tech support person. Why is it that the people standing between car companies and the "lemon" designation are called "grease monkeys"? Why are tech support people, who form a major chunk of a software vendor's goodwill, paid less than $10.00/hr?

It's far beyond the scope of this article to fight that windmill, so I'll simply state two reasons why every technologist must be an effective Troubleshooter: 1) Troubleshooting is a basic part of the design process, and 2) Troubleshooting is a basic part of the Rapid Learning process. The ineffective Troubleshooter will arrive late to market with designs, and will also find his knowledge on the trailing edge of technology. Neither situation bodes well for prosperity.

Steve Litt can be reached at Steve Litt's email address.