Troubleshooting Professional Magazine
Troubleshooting the Untroubleshootable
Steve Litt is the author of the Universal Troubleshooting Process Courseware. He is also the author of Troubleshooting Techniques of the Successful Technologist.
STEVE: WHY DID YOU BOTHER?
About now you're probably asking yourself why I spent ten hours just making a Mental Model of Nullmailer. Here is the logical derivation of why I spent that ten hours:
Creating a Mental Model of a black box system takes a lot of time and effort. Upon completion of your Mental Model, you'll know many things that can't possibly be explicitly spelled out on the Mental Model. That doesn't stop you from troubleshooting, because all the material you learned is fresh in your mind, and your Mental Model serves as a reminder.
But what if six months from now you need to troubleshoot that same system? Would all that knowledge still be at your fingertips? Or would you need to do a lot of refreshing? And what if somebody else needed to do the troubleshooting? Would your Mental Model diagram be everything necessary for them to understand the system? Almost certainly not.
So, if you anticipate anybody ever having to troubleshoot this system again, it's time well spent to add a narrative to your diagram. To see how I added a narrative to my Nullmailer Mental Model diagram, see http://www.troubleshooters.com/linux/nullmailer/landmines.htm#_nullmailer_mental_model.
Your Mental Model shows which interaction states rule out what, but it doesn't necessarily give you the tools to measure at an interaction, or to set a state or inject a signal at that interaction. Often you need tools to do those things. The good news is, now that you have a Mental Model, you can envision what these tools should look like, and either procure them or make them.
For each line on your Mental Model, ask yourself the following two questions:
1. How can I view or measure what's happening at this interaction?
2. How can I set a state or inject a signal at this interaction?
For instance, to look at the availability of a socket at a port, you can use nmap with suitable arguments. To both change the port's state and read from it, you can use telnet to the proper port. While asking these questions and thinking about tools, use your imagination. Typically, there will be all sorts of innovative ways to do these things. The more experience you get with tools, the more innovative you'll find yourself getting.
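As a rough sketch (the host name and port below are placeholders, not anything specific to the Nullmailer setup), those two checks might look like this from the command line:

nmap -p 25 mailhost.example.com     # is anything listening on the SMTP port?
telnet mailhost.example.com 25      # connect to the port and talk to whatever answers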
The best tools are often computer programs, either easy scripts, or if you can make them, actual computer programs, subroutines or objects. For systems where one program spawns (execs or forks) another program, a handy tool that comes up over and over again is a stub program. A stub program reveals the arguments passed to it, reads a number from the keyboard, and returns that number as an exit code. If the spawned program runs without a keyboard, you'll need to comment out the read from keyboard and hard code the exit code. Occasionally you'll need a stub program that prints not only its command line args, but also the environment variables exported to it, or even text that comes in through STDIN. The following code is the shellscript I used to obtain command line args, environment variables and passed in STDIN:
#!/bin/bash
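# The rest of the stub might look something like this sketch (the exact
# echo formats are illustrative, not the original listing):
echo "=== Command line arguments ==="
for arg in "$@"; do
   echo "arg: >$arg<"
done
echo "=== Environment variables ==="
env
echo "=== STDIN ==="
cat
# read -p "Exit code to return: " rtrn   # can't read the keyboard: STDIN is redirected in
rtrn=0                                   # so the exit code is hard-coded instead
exit $rtrn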
Note that the ability to type in the return code was commented out: this program couldn't accept keyboard input, because text was redirected into it. What I did with this program was, in effect, replace Nullmailer's smtp executable with this shellscript, to see exactly what the smtp executable was working with.
Software-oriented tools tend to fall into a few categories: tools that view or measure a state, tools that set a state or inject a signal, and stubs that stand in for a spawned program.
Barriers to view are often physical. That's when things like coathangers, mirrors, lights, and their more sophisticated commercial brethren save the day. Who hasn't marveled at fiber-mounted micro-cameras that see inside walls or other inaccessible places, or infrared detectors that yield a temperature readout of a point without touching that point?
Sometimes you need to see inside electronics. Oscilloscopes, network testers, and even simple digital multimeters give you a clear view.
To summarize, look at your Mental Model, and for each interaction ask yourself these two questions:
1. How can I view or measure what's happening at this interaction?
2. How can I set a state or inject a signal at this interaction?
Answer those questions by making or procuring software tools, electronic tools, or tools to peer into or get into physically tight spaces.
Used right, predefined diagnostics (often called "scripts" at tech support organizations) are the greatest brainpower savers ever invented. Used wrong, they're the greatest source of frustration ever forced upon a user. Scripts get a bad name because so many corner-cutting organizations use them incorrectly. Specifically, these cheapskate organizations try (and usually fail spectacularly) to use them to swap clerks for technologists. Then, in response to their frequent failure to identify the root cause, they escalate the user yet again, so the user must yet again be placed on hold, and must yet again tell his entire symptom description to another clerk (this time a manager-clerk) unlikely to have either the tech-chops or the troubleshooting mindset necessary to solve the problem.
Forget all that, because you are going to use predefined diagnostics the right way: as a brainpower saver to get you as far as you can get, as simply as you can get there, before going offroad and using the Universal Troubleshooting Process to get all the way to the root cause.
In my experience, the best predefined diagnostics for complex systems are diagnostic ladders, which sequentially rule out big chunks of the system. Diagnostic ladders are easier to design and easier to use than typical flowchart-type predefined diagnostics. They usually start with the easiest tests, or sometimes with the most fundamental components. They often start by ruling out components needed by, but not a part of, the system under repair. For instance, whether you're repairing a problem with Samba, NFS, web access, SSH, socket-based software, or a firewall, you usually do ping commands pretty early in the troubleshoot, because if you have no network connectivity, nothing else matters, and anything else would be a waste of time. Additionally, ping is quick, cheap and easy. You can often do ten ping commands in five minutes, ruling out 90% of the system in that five minutes. That's time well spent.
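As a sketch of what that first rung might look like (the addresses are placeholders for your own loopback, NIC, gateway and far-end server):

ping -c 2 127.0.0.1         # the TCP/IP stack itself
ping -c 2 192.168.100.2     # your own NIC
ping -c 2 192.168.100.1     # your gateway/router
ping -c 2 192.168.100.10    # the server at the far end
ping -c 2 example.com       # name resolution plus routing beyond the gateway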
Of all the diagnostic ladders, the best one I ever saw was the Samba project's old standby, DIAGNOSIS.txt, which is currently available as an HTML file in your Samba docs directory. Using DIAGNOSIS.txt, a Samba-ignorant technologist can get 90% of the way to the solution in a matter of minutes, without breaking a mental sweat. That's repairability at its best! Better yet, it's fairly easy to automate DIAGNOSIS.txt into a shellscript that writes a log file, but that's beyond the scope of this document.
Here's the diagnostic ladder I created to troubleshoot Nullmailer. Samba's diagnostic ladder makes the difficult easy. Because Nullmailer is much more black-boxy than Samba, my Nullmailer diagnostic ladder merely makes the impossible possible, and could cut hours off your time to repair on a Nullmailer problem.
Diagnostic ladders are always great, but when you're troubleshooting something you haven't touched in six months, they're priceless. But not as priceless as being able to hand over everything but the last mile to the user, so that the first thing the user gives you after describing the symptom is his diagnostic ladder results.
Nullmailer's code is well written. Variable names are well chosen, indentation is revealing, files and methods are short. The code is consistent and data-centric. There are few comments, but it's good code that's fairly readable, at least at the subroutine level. And yet, unless you're lucky enough to find an exact error message in the source code using grep, finding where to put diagnostic code will be a major undertaking.
The reason is that, like so many OOP programs before it, Nullmailer is written in Volleyball Code. Objects bounce around, acquiring this property here, that property there, to the point that finding where something gets set is an excruciating trace. And all too often, in its journey from object to object to object, a data piece loses its revealing name and becomes something non-obvious. The two really bad programs I ever wrote -- version 2 of a timesheet collector I wrote for a client (my first ever OOP program), and version 1 of my free software UMENU EMDL parser -- were both Volleyball Code, and with both, six months later I was scared to death to touch them. In both cases I rewrote them. In my opinion, in five or ten years, when OOP for OOP's sake has finally been discredited and OOP has become a tool rather than a way of life, Volleyball Code will rank right up there with Spaghetti Code as "bad stuff programmers used to do".
As an example, with Nullmailer, I wanted to find a place in the source for the smtp executable where I could intercept the password, open a file by that password's name, and get the real password from that file, thus avoiding the "password on ps ax" security problem. The trouble was, after an hour of trying, I couldn't find an intersection between cli_options or options and argv. That intersection would have shown where to insert my (hack) conversion from filename to password. There was just too much bouncing around between objects, methods and files to quickly determine that.
I'm sure there are people who could give you tips on making sense of Volleyball Code (probably people who write Volleyball Code for a living), but my advice is limited. If the developer incorporated comments, use them. If the developer used informative variable names, use them. You might try to make a diagram of how all the objects in the program interact, although for a true Volleyball Code program, this diagram can end up being more confusing than the as-built flowchart of a Fortran program quick-coded by a 1972 Spaghetti Specialist.
Ultimately, your best ally in figuring out Volleyball Code is running the program in a debugger. You can view the contents of data pieces and see when they change. You can force a change in any data piece and view the results, to figure out whether you're even on the right track. It will take hours, but that's better than days, and after some experimentation time in the debugger, you might start to understand and even appreciate the code.
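As a rough sketch, even a quick batch-mode gdb run can show you a variable's value at a known line, with no interactive session needed. The program name, file, line number and variable below are all hypothetical, and the program has to be compiled with debugging symbols (-g) for this to show anything useful:

gdb --batch \
    -ex 'break somefile.cc:123' \
    -ex 'run' \
    -ex 'print some_variable' \
    -ex 'continue' \
    --args ./myprog --some-option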
If, for some reason, you can't run the program in a debugger, then you're knocked back to repeated diagnostic prints, data item forces, and maybe your own home-brew logging facility, complete with recompilation and runs. Hours and hours and hours of brain-draining frustration. All I can say is, make a shellscript for the edit/make/run loop to save time and brainpower, and do your best.
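Here's a minimal sketch of such an edit/make/run loop; the editor, the make step, the program name and the log file are placeholders for whatever your project actually uses:

#!/bin/bash
# emr: a throwaway edit/make/run loop. Placeholder names throughout.
while true; do
   vim "$1"                            # edit the file you're instrumenting
   make || continue                    # recompile; on failure go straight back to the editor
   ./myprog 2>&1 | tee run.log         # run, and keep the diagnostic prints in a log
   read -p "Enter to edit again, Ctrl-C to quit " junk
done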
Don't feel too bad. I'm probably going to need to use these techniques myself in order to know where to insert my modification to Nullmailer so that my email password doesn't get passed to the smtp executable, visible to anyone who can do a ps command or look in the /proc tree.
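If you want to see that exposure for yourself, a couple of quick commands will show it; the process name and PID below are placeholders:

ps axww | grep smtp                       # full command lines, password arguments and all
tr '\0' ' ' < /proc/12345/cmdline; echo   # same information straight from /proc (substitute the real PID)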
When you write code, be sure to remember all the hassle you've encountered with the Volleyball Code of others, and be nice to those who need to look at or modify your code. When you code OOP, may I suggest that you view an object as either an entity or a gathering of data, both with attached functions (methods)? May I suggest that if an entity or data group doesn't immediately hit you as "this should be a class", you go procedural with it? May I suggest that if you find yourself seeking an excuse or justification for making something a class, you do it procedurally? May I suggest that if a class represents a data group, you set as many of its properties as possible in the constructor and the remaining ones in one or two well-named methods called early in the program, so that every later use of the object is just a read? May I suggest that if any class elements are files, whether handles, streams, programs to spawn, or any other kind of file, their corresponding filenames be included to facilitate complete error messages?
If, during program design, it seems like a good idea to make a class implementing an object whose lot in life is to bounce between other objects, collecting and delivering data, may I implore you to think twice about that design, and if you go ahead with it, document it well, name the object well, and please, for self-documentation's sake, pass it as an argument, don't make it a global variable?
And obviously, when you code, whether OOP or not, round up the usual suspects of good code. Use comments where comments are needed. Name your variables descriptively and precisely so they add to the documentation. So instead of using two parallel variables for slightly different facets of the same thing, as shown below:
char *long_args[MAXARGS];
char *short_args[MAXARGS];
It's better and more self-documenting to group like things that will probably be iterated over with the same subscript variable, as follows:
typedef struct {
   char *long_args[MAXARGS];
   char *short_args[MAXARGS];
} ARGS;
ARGS args;
So in the preceding, the third long arg would be args.long_args[2].
Or, perhaps this is more to your liking:
typedef struct {
   char *longg;
   char *shortt;
} ARGS[MAXARGS];
ARGS args;
In the preceding, the third long arg would be args[2].longg. If I were to do the preceding, personally I'd make it even more self-documenting, like this:
typedef struct {
   char *longg;
   char *shortt;
} ARGPAIR;
typedef ARGPAIR ARGS[MAXARGS];
ARGS args;
With the preceding you can conveniently do this:
ARGPAIR mypair = args[2];
Here's something else: how about arranging things so that any time a stream is in scope, so are its filename and its usage (read or write)? That way it's trivial to make every error message name the file it failed to open for input or output. If Nullmailer had done this, it would have been three times easier to debug. The following is a tiny example program implementing this idea, for academic simplicity using assert for some of the error handling, and assuming mode can be only "r" or "w".
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

/* Package the stream together with its filename and mode, so every
   error message can name the file involved. */
typedef struct{
   const char * path;
   const char * mode;
   FILE * f;
} FILEINFO;

FILEINFO * MakeFileinfo(const char * path, const char * mode){
   FILEINFO * fi = malloc(sizeof(FILEINFO));
   assert(fi != NULL);
   fi->path = path;
   fi->mode = mode;
   fi->f = fopen(fi->path, fi->mode);
   if(!fi->f){
      fprintf(stderr, "Could not open file >%s< for %s, aborting.\n",
         fi->path,
         (fi->mode[0] == 'r' ? "read" : "write"));
      free(fi);                 /* don't leak the FILEINFO on failure */
      return NULL;
   }
   return(fi);
}

void ZapFileinfo(FILEINFO *fi){
   fclose(fi->f);
   free(fi);
}

#define MAXLEN 10000

int main(){
   char * line;
   char * buf = malloc((MAXLEN + 1) * sizeof(char));
   assert(buf != NULL);

   /* Read a file and echo it to stdout. */
   FILEINFO *fi_in = MakeFileinfo("/etc/fstab", "r");
   assert(fi_in != NULL);
   while((line = fgets(buf, MAXLEN, fi_in->f)) != NULL){
      buf[MAXLEN] = '\0';
      printf("%s", line);
   }
   ZapFileinfo(fi_in);

   /* Open a file for write, to exercise the write-mode error message. */
   FILEINFO *fi_out = MakeFileinfo("/root/junk.jnk", "w");
   assert(fi_out != NULL);
   ZapFileinfo(fi_out);

   free(buf);
   return 0;
}
Obviously the preceding was crude, because its objective was to demonstrate the concept of packaging the filename with the stream, but I think it makes the point. Reading fi_in->f isn't much harder than reading, let's say, infname. And any time you pass the file to a function, pass the whole fi_in as the argument so that the function has access to the filename and mode. fi_in could also be a member of a struct, or in C++, a class.
And if this had been coded in C++ instead of C, so much the better, because MakeFileinfo() and ZapFileinfo() would have been incorporated into class FILEINFO, now called Make() and Zap().
On another topic, I'm a big fan of "has-a" relationships: Nesting of classes, structs or whatever. Instead of having all sorts of little objects bouncing from hand to hand, I'm much more in favor of having a grand container grouping all data for a given purpose, starting with config info. As necessary, the grand container can contain (has-a) sub-containers, which themselves can contain sub-containers...
One more thing: if your program's source code has a tendency to "bounce around", for gosh sakes, document it. Why are your classes organized and interacting the way they are? Put it in English, hopefully with a diagram. Think of it as bragging. In Nullmailer's case, the source code isn't haphazard. Nullmailer's only real design flaws are its failure to enunciate filenames in error messages and its failure to implement a usage() function accessible by the --help option. Yes, Nullmailer's code is tight, consistent and designed. The only problem is that the design is difficult (at least for me) to figure out. So why not spend half a day documenting it and drawing a diagram?
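Nullmailer is C++, but the --help idea is language independent. Here's a tiny sketch of it in shellscript form; the program name, options and wording are purely illustrative:

usage(){
   echo "Usage: myprog [--verbose] [--config FILE] recipient..." >&2
}
case "$1" in
   --help|-h) usage; exit 0;;
esac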
Let me ask you a question: is the guy who does only OOP any less of a one-trick pony than the guy who never does OOP? I contend that the guy using OOP when the problem at hand suggests another paradigm is the one most likely to write Volleyball Code.
Some problem domains suggest or even demand OOP. Ever since the mid 1990s, we've all known that the way you make a GUI window is to use the required Window class as an ancestor (inheritance -- is-a) and put your own code into it. Entities like a person or a family all but beg to be objects. In my opinion, you'd need to be nuts not to put the program's configuration data in an object, probably containing several sub-objects (has-a).
But many problem domains cry out for procedural programming. Perhaps you need to perform six consecutive processes on something. You just cascade those six processes (functions) in your code and you're done. Oh yeah, sure, you could consider them six actors each doing his process, or you could consider it one actor grabbing a new input and then performing each of six methods on it, but those uses of OOP are sooooo contrived. Consider a file conversion, maybe with a merge. This is basically procedural code, perhaps throwing in an object to retain totals and perform break logic, and if complex formatting is required, perhaps an object to do the formatting. But the high level algorithm is the same as it's been since the dawn of time: read a record, convert it, write it, repeat until the input runs out, and then output any totals.
And don't think you can necessarily use an object to store the whole thing in RAM -- the input file might be a hundred gigabytes. Programming the top level logic of a file conversion using OOP just complexifies what would have been simple. And very likely results in Volleyball Code.
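For concreteness, here's a minimal sketch of that procedural top level in shellscript form, streaming one record at a time so a hundred-gigabyte input never has to fit in RAM; the file names and the old-to-new substitution are placeholders for the real conversion:

#!/bin/bash
total=0
while IFS= read -r record; do
   converted="${record//old/new}"   # the per-record conversion goes here
   printf '%s\n' "$converted"
   total=$((total + 1))             # running totals; break logic would go here too
done < input.txt > output.txt
echo "Converted $total records." >&2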
In my opinion, there's nothing wrong with mixing objects and procedural code. Doing so is often the most readable.
In the 1990s, OOP was the holy grail: the answer to reuse, readability, simplicity, fast programming, and probably a cure for Montezuma's Revenge too. We all spoke glowingly of Smalltalk, and every program sported an object whose sole purpose in life was to serve as a main routine. Languages like Java even enforced such a run-the-whole-thing object. OOP was the new thing, the best thing, and the slightest deviation from OOP was blasphemy.
Now, in the 2010s, it's not so simple anymore, is it? Other programming paradigms take mindshare: functional programming (as old as Lisp), lambda calculus, callback routines, and lots more. Even that 1970s-80s staple, procedural coding, is enjoying an uptick in credibility. In 2013, no-OOP C is still the most used language (http://www.hackdigital.com/top-used-computer-languages-in-it-industry/), and thanks to structs and callbacks, C can do a heck of a lot of OOP-like stuff. When you need OOP-like stuff, that is.
Bottom line: OOP is good when it's the best way to do it. Volleyball Code is always bad.
Any article submitted to Troubleshooting Professional Magazine must be licensed with the Open Publication License, which you can view at http://opencontent.org/openpub/. At your option, you may elect to prohibit substantive modifications. However, in order to publish your article in Troubleshooting Professional Magazine, you must decline the option to prohibit commercial use, because Troubleshooting Professional Magazine is a commercial publication.
Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.
Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editor's notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):
Copyright (c) 2001 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, version Draft v1.0, 8 June 1999 (Available at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version is presently available at http://www.opencontent.org/openpub/).
Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.
After that paragraph, write the title, text of the article, and a two sentence description of the author.