Troubleshooting Professional Magazine
The Revolution Continues
Copyright (C) 1998 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Troubleshooting Professional Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.
But don't declare victory just yet. There's still the matter of increasing complexity discussed in the January 1998 issue. And the fact that the third era of Troubleshooting is drawing rapidly to a close...
We Troubleshooting Process Patriots have come a long way, haven't we? Doesn't it seem like yesterday that hit and miss troubleshooting ruled the land? Remember that horrible phrase, "troubleshooting what?". And remember the dreaded blank stare when you said "anything".
How different it is today! Today it's "We need you to train our people in system independent Troubleshooting. Please submit a proposal". It's in the budget of all major corporations, and today it's considered obvious.
We've achieved our strategic objectives. We've won. But back in the old days, we were so busy achieving process, we postponed dealing with the geometric increase in complexity. Now that we've cleared the process bottleneck, the next step is complexity. We can make systems more modular, and we can automate Troubleshooting Process. Both are being done.
We've crested the hill, and we've seen it's not the last. We're ragged and dusty. But we're not tired. We're stronger, more committed, and more knowledgeable than ever. Because we know productive Troubleshooting is nothing less than our destiny.
But too much complexity is overkill. The best, but by no means the only, example is computers. Sure, they're easier to use than ever. But many crash hourly. Considering that 1980's style minicomputers ran more than a week between crashes, that's not good news.
And how often does that $1500.00 family computer require hours of expensive consultant time to debug? Fact is, the modern Windows operating system and many of its applications are so non-modular and entangled that binary search troubleshooting is often impossible. While we talk of computers becoming commodities, the reality is that Joe Average must either accept a computer that performs "most of the time", or regularly use a technological wizard with an MCSE after his name and hourly rates to match. And isn't it common to see those wizards leave, scratching their heads in puzzlement? Or blaming another vendor?
So I issue this challenge to Microsoft: Build your products to the same standards as the car manufacturers. Nobody buys a car knowing it has several defects. Build your software modular, and publish the test points. If you can't insert features without non-modularity, don't insert the features. Get rid of the intermittents. Several years ago it was possible to sell buggy operating systems, but back then you were the only OS in town. Now there's Linux.
And I issue this challenge to computer purchasers: Consider Linux. Linux is fairly modular, well documented, and ultra reliable. Up-times often exceed a month. It may not be ready for prime time on your desktop (although many are using it that way), but it certainly makes an ultra-reliable server: Web, naming services, email, file server, print server, Internet dialout server, database server. I'm not telling you you have to use Linux. But do yourself a favor and at least consider it.
During the past year Troubleshooting Professional Magazine has become a favorite of the Linux crowd. They're sick of jumping through hoops to make advertising hype come true, and have turned to Linux as a reliable alternative. Linux use grows geometrically as complexity for its own sake is seen hurting the bottom line.
The point is this. Times have changed. The Internet provides an alternative to Madison Avenue hype. The truth can no longer be bought or monopolized. In a world of free information, the marketplace will finally favor quality.
And it's an easy trap to fall into. If you're anything like me, it's tempting to celebrate victory and kick back. And maybe even resist change a little. But Troubleshooting waits for no man. We need to march right on into the fourth era of Troubleshooting...
|From invention of the bow and arrow until the invention of the steam engine (8000 BC to 1700's)
|Observation only. Systems under repair have all components visible, so the problem is obvious. Little diagnosis needed. On the other hand, repair/replacement of component requires precision, one of a kind work.
|From invention of the steam engine until the 1970's
|Observation and non-rigorous diagnostic process. Systems under repair still contain only a few components, though some aren't visible to the naked eye. Diagnosis required, but doesn't need to be rigorous. Replacement parts likely to be available from a vendor, but may be difficult to replace.
|From 1970's until the present
|Observation and rigorous diagnostic process. Systems under repair contain many (>10,000) components, most abstract or invisible to the naked eye. Non-rigorous diagnosis produces circular search and rework. Rigorous diagnosis required. Replacement parts available from a vendor, and due to modularity often easy to install. Software components are often replaced in five minutes with a few keystrokes.
|Technologically Enhanced Troubleshooting
|From now until the next era
|Observation and rigorous diagnostic process, aided by context-relevant technology-served information (Troubleshooting process aware smart manuals). Systems under repair are now hugely complex, not always completely modular. Observation and rigorous diagnostic process alone takes too long, because no human can have the complete Mental Model, manual and diagnostic information in his or her head. Replacement parts are stock.
1999 is the autumn of the third era of Troubleshooting. Sure, we can still solve problems with Troubleshooting Process alone. But only those folks with immense memory capacities and expensive (and continuous) training possess the necessary Mental Model. So if your computer network fails, you call in a $200/hour guy with a dozen initials after his name. He's a strongman muscling around a HUGE load of system specific information.
But carrying that information load is manual labor. It makes no more
sense than using strongmen to dig a trench, instead of using a backhoe.
Don't get me wrong. There will be occasional problems where the smart manuals
won't help, and the strongman is needed. Just like there are some excavation
tasks (digging near cable) where we must revert to hand shovels. But profitable
outfits will use the Troubleshooting Process aware smart manuals for the
bulk of the work.
And nothing's changed! Most expert systems still are marketed as solutions instead of tools, and still are marketed as a replacement for people skilled in Troubleshooting Process. Most expert systems are experts on the system and ignorant of Troubleshooting Process. We've been down that road before -- it's a budget busting dead end.
But there's now an expert system receiving the Troubleshooters.Com seal of approval. Read on...
To accomplish this, an inter-corporational team led by General Motors' Jim Roach created a Troubleshooting Process optimized for use in a smart manual. This process takes the equivalent of the Universal Troubleshooting Process Step 5 (General Maintenance), and elevates it to an artform, complete with pre-defined diagnostics, error code interpretation, and factory mods. This allows discovery of the root cause at Step 5 80% of the time, leaving only 20% to go to time-consuming Step 6 (narrow it down). Even when problems do go to Step 6, they arrive in a much narrower scope than they otherwise would have.
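The 80/20 split above can be put into rough numbers. The sketch below is only an illustration: the minute values are made-up assumptions, not measurements from the GM team, and it ignores the further benefit that problems reaching Step 6 arrive in a narrower scope.

```python
def expected_diagnosis_time(step5_minutes, step6_minutes, step5_hit_rate=0.80):
    """Average minutes to find the root cause.

    Every problem pays for Step 5; only the problems Step 5 fails to
    resolve (1 - step5_hit_rate) go on to pay for Step 6 as well.
    """
    return step5_minutes + (1.0 - step5_hit_rate) * step6_minutes


# Hypothetical figures: 10 minutes for an enhanced Step 5,
# 60 minutes for a full Step 6 narrowing process.
average = expected_diagnosis_time(step5_minutes=10, step6_minutes=60)
print(f"Average with 80% found at Step 5: about {average:.0f} minutes")
```

With these assumed figures the average comes out near 22 minutes, against 60 minutes or more if every problem had to go through the full narrowing process.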
They then built a voice-activated machine to implement their Troubleshooting Process. The machine is much more than a thyroidal Step 5. It also covers Step 2 (symptom description), Step 3 (damage control plan), Step 4 (symptom reproduction) and Step 7 (repair or replace). This little machine contains the total knowledge of the system's engineers, delivered in a when-needed, whatever-needed manner. It's like having the design team and engineering standing right there while you Troubleshoot.
I've seen them market it, and they do it the right way. It's marketed as a tool, not a solution. There's no implication that you can fire your techs, hire clerks, and get a good result. Indeed, they tell everyone who will listen about the underlying Troubleshooting Process, and its importance. They market it as what it is -- a Troubleshooting Process aware tool to relieve the Troubleshooter from the manual labor of carrying thousands of facts in his head.
It's a first of breed product, so naturally it's not perfect. Add to this the fact that fewer than 1000 people today really understand Era 4, so it's difficult (and expensive) to find authors for the material.
The machine itself is still expensive, so don't expect to see it at your local computer store this year. It calls for an early decision by the technician as to which subsystem contains the flaw, thereby posing a risk, in the hands of an inexperienced technician, of the problem getting out of the box. And in an industry like software, with its present non-modularity and "push maintenance down to the user" mentality, it becomes a much greater challenge. But it's a quantum leap above anything that preceded it, and it's getting better all the time.
I've used it. It's nothing short of phenomenal.
But today irreparability has risen to dizzying heights. Witness the typical desktop computer system, where crashes, bugs, and non-functional features are accepted as normal. Check out Windows 98's "troubleshooters" (contained in the help system) -- a group of pre-defined diagnostics falling significantly short of an Era 4 Troubleshooting product.
Microsoft's "troubleshooters" authors aren't at fault. The problem is the complexity, non-modularity, and sheer number of variables in the Windows operating system. Maybe one in ten thousand people can draw a detailed block diagram of the Windows operating system, and such a diagram would be entangled almost beyond recognition. So how can a smart manual be made?
An Era 4 product is not a replacement for good, clean design. Instead, good, clean design is a prerequisite for an adequate Era 4 product. This is the responsibility of the manufacturer.
I wrote my first pre-defined diagnostic a few weeks ago. It's an HTML based diagnostic to Troubleshoot network problems on networks with a Linux server and Microsoft clients. It works like a charm. I'm still trying to decide whether to sell it or put it up as free content on Troubleshooters.Com. I anticipate many more pre-defined diagnostics in the near future.
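For readers who haven't seen one, a pre-defined diagnostic boils down to a decision graph: each node asks a question, and the answer selects the next node or a conclusion. The sketch below is not the HTML diagnostic described above. It is a minimal Python illustration, and every question, node name, and conclusion in it is a hypothetical example.

```python
# Hypothetical decision graph for a Linux server / Microsoft client network.
# Each node holds a yes/no question and the name of the next node per answer.
DIAGNOSTIC = {
    "start": {
        "question": "Can the client ping its own IP address?",
        "yes": "ping_server",
        "no": "client_stack",
    },
    "ping_server": {
        "question": "Can the client ping the Linux server by IP address?",
        "yes": "ping_name",
        "no": "wiring",
    },
    "ping_name": {
        "question": "Can the client ping the Linux server by name?",
        "yes": "services",
        "no": "naming",
    },
}

# Leaf nodes: where to concentrate the rest of the troubleshooting.
CONCLUSIONS = {
    "client_stack": "Suspect the client's own network configuration.",
    "wiring": "Suspect cabling, hub, or the server's network interface.",
    "naming": "Suspect name resolution.",
    "services": "Connectivity is good; suspect the file/print services.",
}


def run_diagnostic(answers, start="start"):
    """Walk the graph with a dict of node name -> True/False answers.

    Returns the list of questions visited and the conclusion reached.
    """
    node = start
    path = []
    while node in DIAGNOSTIC:
        path.append(node)
        node = DIAGNOSTIC[node]["yes" if answers[node] else "no"]
    return path, CONCLUSIONS[node]


path, conclusion = run_diagnostic(
    {"start": True, "ping_server": True, "ping_name": False}
)
print(" -> ".join(path))
print(conclusion)
```

Keeping the graph as plain data, separate from the code that walks it, means the same content could just as easily be rendered as linked HTML pages, one page per question.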
Troubleshooters.Com guest author Marc-Henri Poget (Generating Web Decision Graphs using Perl, November 1998 TPM) and I have discussed creation of an Era 4 tool for the software industry. It's a really tough job due to the lax standards in our industry. We indeed live in exciting times.
They say he who ignores history is bound to repeat it. The unemployment lines are filled with Era 2 troubleshooters.
But already the new challenge of complexity looms. Even while we were
kicking the intuitive guys out of the palace, complexity was rendering
our methods (by themselves) impractical. And already, complexity has spawned
some interesting responses.
In short, times have never been better for Troubleshooters.
And man, there was junk. Turntables that pushed the record UP the spindle! One-of-a-kind audio-cassette carousels. Car audio requiring complete disassembly to replace a stretched drive belt. And a certain, highly rated, tape deck that may have sounded great, but with springs, pulleys, levers and notches making re-assembly a 4 hour job. There was no shortage of jury-rigged, convoluted Rube Goldberg machines if that's the kind of thing you liked to work on.
"Hey cherry picker, can't you do the hard ones?"
I regularly violated policy, but they didn't have the heart to fire me. I made them 100% more money than their average technician.
And man, there was junk. A set of foundation classes that mapped to the operating system instead of the problem domain. An operating system whose normal behavior was to crash several times a day. Black boxes called DLLs connected to multiple things, with no documentation, and different versions. Scripting languages with more exceptions than rules. There was no shortage of jury-rigged, convoluted Rube Goldberg machines if that's the kind of thing you liked to work on.
"Hey idealist, can't you do Microsoft?"
I regularly violated policy, but they didn't have the heart to fire me. My apps came in on time and under budget, worked flawlessly except under user error, in which case they were easy to troubleshoot. They satisfied both the users and the corporate strategy.
And man, that technology was good. Worked exactly as expected. Full source availability guaranteed we'd never get boxed in by a vendor and provided fallback documentation. Even huge multi-subnet Linux systems with naming services, email service, file/print/application server and database web apps could be readily troubleshot, on those rare occasions when they stopped working. The operating system was documented, modular, and made sense. A base of over a million "idealist" technologists, all helping each other online, guaranteed no problem was insoluble. Superior versions of apps and technologies formerly thought possible only in Windows became available in Linux.
"Hey genius, how do you perform these miracles?"
I regularly raised my rates, but they didn't have the heart to fire me. My apps came in on time and under budget, worked flawlessly except under user error, in which case they were easy to troubleshoot. They satisfied both the users and the corporate strategy.
The more things change, the more they stay the same.
March 1998 Troubleshooting Professional Magazine: Bottleneck Analysis.
So, in the absence of a large voter turnout, I'll once again pick what
I believe to be the year's five best articles:
|What It's About
|The Man Who Banned General
|A Troubleshooting short story encompassing three generations, over 30 years, plots, intrigue, corporate takeover, and sweet victory.
|A serious discussion of this too-often ignored subject.
|GNU: An Idea Ahead of its Time
|The true story of how one man, Richard Stallman, changed history with an idea and an ideal.
|A Supercomputer in
|Souped up Dodge Darts and Linux Parallel Supercomputers combine to create the ultimate Boomer fantasy.
Database Enabled Web App
|Pure geekiness puts this Linux Web App howto in the top five.
A warm thank you for a tough job well done goes out to this year's guest authors:
All submissions become the property of the publisher (Steve Litt), unless other arrangements are previously made in writing. We do not currently pay for articles. Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):