Troubleshooters.Com Presents

Troubleshooting Professional Magazine

 
Volume 9 Issue 1, Winter, 2005
A Few Computer Repair Tips
Copyright (C) 2005 by Steve Litt. All rights reserved. Materials from guest authors copyrighted by them and licensed for perpetual use to Troubleshooting Professional Magazine. All rights reserved to the copyright holder, except for items specifically marked otherwise (certain free software source code, GNU/GPL, etc.). All material herein provided "As-Is". User assumes all risk and responsibility for any outcome.


Steve Litt is the author of the Universal Troubleshooting Process Courseware,
which can be presented either by Steve or by your own trainers.

He is also the author of Troubleshooting Techniques of the Successful Technologist,
Rapid Learning: Secret Weapon of the Successful Technologist, and Samba Unleashed.




 
Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers of the future may have only 1,000 vacuum tubes and perhaps weigh 1 1/2 tons. -- Popular Mechanics, March 1949

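CONTENTS

Editor's Desk
Does it Count Memory?
An Abbreviated Look at the Boot Process
Using Knoppix as a Diagnostic Tool
Electronic Contact Lubrication
Letters to the Editor
How to Submit an Article
URLs Mentioned in this Issue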

Editor's Desk

By Steve Litt
Computers. Can't live with 'em, can't live without 'em.

One magazine cannot possibly serve as a general purpose repair manual, but in this Troubleshooting Professional Magazine issue I'll offer you a few tips that have saved me time and effort in the past.
This issue of Troubleshooting Professional Magazine is devoted to choosing the right tool for the job. So kick back, relax, and enjoy the read. And remember, if you're a Troubleshooter, this is your magazine.
Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist".  Steve can be reached at Steve Litt's email address.

Does it Count Memory?

By Steve Litt
The voice on the other end of the line is strident: "My computer's broken. You've gotta fix it right away. My report's due in 2 hours!"

Breathing deeply, you summon The Attitude, and ask the granddaddy of all symptom description questions: "What indicates to you that there's a malfunction?"

"It doesn't work -- aren't you listening! You IT guys are all the same!"

You count to 5. This is nothing personal; it's just your job. "What do you see on the screen when you turn the computer on?" you ask.

"Nothing! Would you listen!"

"Is your monitor turned on? Is the monitor's power light on?" you ask.

"Are you calling me stupid?" the user screams. "Of course it's on -- otherwise I would have turned it on."

"I'll be right down there," you assure him. You grab a 10-foot video extension cable, a spare video card and a couple of RAM sticks, and head for the user's desk.

In 2 minutes you've completely eliminated software as a cause. Long before the computer boots an operating system, the BIOS counts the memory attached to the system. If you don't see the BIOS count memory, you know for sure it's a hardware problem.

You boot his computer and reproduce the symptom. You look behind his computer and verify that the monitor cable is correctly plugged into the computer and that the monitor is powered up. Everything's properly connected. Seeing another working computer in the cubicle, you use the 10-foot extension cable to connect the nonworking computer to the other monitor. The other monitor stays black too. You've now eliminated the monitor from the root cause scope. You reconnect both monitors to their original computers.

Now it's time to investigate the computer itself. You open the user's box and swap the video card. No change. You revert to his old video card, and disconnect everything from the motherboard except the video card, the RAM sticks, the wiring to the power button, and the power supply connection to the motherboard.

If it fails to count memory now, the problem is in the motherboard, the CPU, the RAM, the power switch or the video card, and you've already eliminated the video card as the cause. In that case, you could remove the power switch from contention by using a screwdriver to short the switch's connector pins on the motherboard. It would then be pretty easy to swap out the RAM and, if necessary, the CPU; if nothing changes, it's the motherboard.
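The elimination in the previous paragraph is a simple loop over known-good substitutes. Here's a minimal sketch, assuming exactly one faulty part; still_fails_after_swap is a hypothetical callback standing in for the physical swap-and-power-up, and the part names are just the ones from this story.

```python
from typing import Callable, Sequence


def isolate_by_swapping(
    suspects: Sequence[str],
    last_resort: str,
    still_fails_after_swap: Callable[[str], bool],
) -> str:
    """Swap (or bypass) one suspect at a time with a known-good substitute.

    Each swap is tested on its own, with the original part put back before
    the next one is tried. The first swap that makes the symptom go away
    names the culprit. If no swap helps, the suspect that was never swapped
    (last_resort) is what's left in the root cause scope.
    """
    for part in suspects:
        # Hypothetical stand-in for the physical work: swap this one part,
        # power up, and watch for the memory count.
        if not still_fails_after_swap(part):
            return part
    return last_resort


# The video card was already eliminated, so it isn't on the list.
order = ["power switch", "RAM", "CPU"]
print(isolate_by_swapping(order, "motherboard", lambda part: part != "RAM"))  # RAM
```

The order puts the cheapest, least disruptive swaps first, which matches the sequence in the text: short the switch pins, then RAM, then CPU.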

NOTE

If you're lucky enough to have a POST card (a plug-in card that displays the BIOS's POST progress codes), you can skip many of those steps and reach a conclusion faster. Most of us don't have POST cards.

However, in this particular case none of that is necessary, because once everything except the video card, the RAM sticks, the power button wiring and the motherboard power connection was disconnected, the computer began to count memory. Naturally, because no disks were connected, it halted with a rather ominous-sounding message about a missing boot sector, but that is to be expected.

Now you start reconnecting things, a few at a time, until the symptom recurs. Then you disconnect the components from that last wave one at a time until you find the one component that toggles the symptom, which in this case is the internal modem, which isn't even used. You remove the modem, connect everything else back up, and the user is up and running, with plenty of time to finish his report. He apologizes for his harsh words, explaining that he's had a tough day.
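The reconnect-then-isolate procedure can be sketched the same way. This is a minimal illustration assuming a single offending component; symptom_present is a hypothetical callback standing in for powering up with exactly the listed parts connected, and the component names and waves are made up for the example.

```python
from typing import Callable, List, Optional, Sequence


def find_offending_component(
    waves: Sequence[Sequence[str]],
    symptom_present: Callable[[List[str]], bool],
) -> Optional[str]:
    """Reconnect optional components wave by wave until the symptom returns,
    then disconnect that last wave's components one at a time to find the
    one that toggles the symptom.

    symptom_present(connected) is a hypothetical stand-in for powering up
    with exactly those optional components connected (video card, RAM,
    power switch and motherboard power are always connected).
    """
    connected: List[str] = []
    for wave in waves:
        connected.extend(wave)
        if not symptom_present(connected):
            continue
        # The symptom came back, so the culprit is somewhere in this wave.
        for part in wave:
            without_part = [c for c in connected if c != part]
            if not symptom_present(without_part):
                return part
        return None  # no single component in the wave toggles the symptom
    return None  # everything reconnected and the symptom never came back


waves = [["hard disk", "CD-ROM"], ["sound card", "internal modem"], ["network card"]]
print(find_offending_component(waves, lambda connected: "internal modem" in connected))
# internal modem
```

Small waves keep the final one-at-a-time search short; bigger waves get you to the failing wave in fewer boots.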

-*-*-

The lowest common denominator in the boot process is the part where it counts memory. If that doesn't happen, you need to get it to happen. Note that some BIOSes have a feature whereby a "splash screen" comes up instead of the memory count and the rest of the POST (Power On Self Test) output. There should be a way to turn off that splash screen. Personally, I always disable it. I don't think a few lines of text offend the user's sensibilities, and if there's a problem, you'll be very glad to see all the relevant text, including the prompt telling you how to get to the BIOS configuration utility.
Steve Litt is the creator of the Universal Troubleshooting Process.  Steve can be reached at Steve Litt's email address.

An Abbreviated Look at the Boot Process

By Steve Litt
This article presents a very abbreviated look at the boot process for a commodity x86 computer or clone.

The computer has startup code built into its ROM BIOS chip. Upon startup, the x86 processor jumps to the BIOS code, which typically starts at F0000 hex. That code works directly on the CPU, memory and disks. It absolutely does not have a kernel, operating system, or system libraries through which to access things.

One of the first things this startup code does is give the troubleshooter a chance to press a key (usually the Del key) to change the setup parameters the BIOS keeps in battery-backed CMOS memory. You can change which disks it tries to boot from and their order. You can change the perceived disk geometry. Most of the rest of what you can change is beyond the scope of this article.

The startup code then runs its Power On Self Test (POST), which does just what the name says: it tests the computer's hardware. What is tested, and which errors are considered fatal, depends to a great degree on how you have configured the BIOS data parameters (commonly called the BIOS setup or CMOS setup). Typically it first checks the video card, then counts memory, then checks for the existence of all hard drives, then checks the floppy and CD-ROM drives. Note that many monitors take a second or so to "warm up", so you might not see the video card test, but by the time the computer counts memory you should see it.
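Here's a minimal sketch of the idea that the BIOS setup decides which failures are fatal: checks run in order, and the test halts only when a failing check is in the configured fatal set. The check names and the fatal set are hypothetical; real BIOSes expose this as a setup option whose wording varies by vendor.

```python
from typing import Callable, List, Sequence, Set, Tuple


def run_post(
    checks: Sequence[Tuple[str, Callable[[], bool]]],
    fatal: Set[str],
) -> Tuple[bool, List[str]]:
    """Run POST-style checks in order and return (booting_continues, log).

    A failing check only halts the test if its name is in `fatal`, which
    plays the role of the BIOS setup's halt-on-error choice.
    """
    log: List[str] = []
    for name, check in checks:
        passed = check()
        log.append(f"{name}: {'OK' if passed else 'FAILED'}")
        if not passed and name in fatal:
            log.append("halted")
            return False, log
    return True, log


# Hypothetical machine whose floppy drive is dead.
checks = [
    ("video", lambda: True),
    ("memory", lambda: True),
    ("hard disk", lambda: True),
    ("floppy", lambda: False),
]
print(run_post(checks, fatal={"video", "memory"}))
print(run_post(checks, fatal={"video", "memory", "floppy"}))
```

Same hardware, two outcomes: with the floppy treated as non-fatal the machine goes on to boot, and with it treated as fatal the machine stops.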

Depending on the BIOS setup settings, the memory counting might simply be a display of how much RAM is installed, or it might actually check each RAM address, thus "counting up".
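To make "counting up" concrete, here's a minimal sketch of a write/read-back memory test. It's a simulation under stated assumptions, not how any particular BIOS does it: a Python bytearray stands in for RAM, and the 1 KB block size and the 55/AA test patterns are choices made for illustration.

```python
from typing import MutableSequence


def count_memory(ram: MutableSequence[int], block_size: int = 1024) -> int:
    """Return how many KB pass a simple write/read-back test.

    Each block gets two complementary bit patterns written to every byte
    and read back. Counting stops at the first block where a read doesn't
    match what was written, the way a failed RAM test stops the count.
    """
    good_bytes = 0
    for start in range(0, len(ram) - block_size + 1, block_size):
        block = range(start, start + block_size)
        for pattern in (0x55, 0xAA):  # binary 01010101, then 10101010
            for addr in block:
                ram[addr] = pattern
            if any(ram[addr] != pattern for addr in block):
                return good_bytes // 1024
        good_bytes += block_size
    return good_bytes // 1024


print(count_memory(bytearray(64 * 1024)), "KB OK")
```

A real test is written in assembly and touches physical addresses, which is exactly why the full check takes noticeably longer than simply displaying the installed amount.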

If everything checks out OK, the BIOS then tries to boot from a disk. Which disks it tries to boot from, and the order in which they're tried, is defined in the BIOS setup. For troubleshooting purposes, the best sequence is the floppy first, then the CD-ROM drive, then the first hard disk. This can be set in the BIOS setup screen, available by pressing the Del key (or some other key) during POST.

The BIOS hands off to the first device it finds that is bootable.
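Here's a minimal sketch of that handoff decision. It assumes the common convention that a disk counts as bootable when its first sector ends in the 55 AA signature (BIOSes differ in exactly what they check), and the device names and the dictionary of first sectors are hypothetical stand-ins for real drives.

```python
from typing import Dict, Optional, Sequence

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable first sector


def is_bootable(first_sector: Optional[bytes]) -> bool:
    """True if the sector is a full 512 bytes ending in 55 AA."""
    return (
        first_sector is not None
        and len(first_sector) == SECTOR_SIZE
        and first_sector[-2:] == BOOT_SIGNATURE
    )


def pick_boot_device(
    boot_order: Sequence[str], first_sectors: Dict[str, Optional[bytes]]
) -> Optional[str]:
    """Try devices in the BIOS setup's order; hand off to the first bootable one.

    first_sectors maps a device name to the 512 bytes at cylinder 0, head 0,
    sector 1, or None when the drive is empty or missing.
    """
    for device in boot_order:
        if is_bootable(first_sectors.get(device)):
            return device
    return None  # nothing bootable: the BIOS halts with an error message


# Troubleshooting-friendly order from the text: floppy, CD-ROM, first hard disk.
order = ["floppy", "cdrom", "hd0"]
sectors = {"floppy": None, "cdrom": None, "hd0": bytes(510) + BOOT_SIGNATURE}
print(pick_boot_device(order, sectors))  # hd0
```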

Booting from Disk

Did you ever wonder where booting got its name? It stands for "bootstrap", which is a reference to the phrase "pulling yourself up by your bootstraps". After you read this section you'll understand how aptly that name applies.

When the BIOS hands off to a device, it specifically hands off to the first sector of that device: Cylinder 0, Head 0, Sector 1. Actually it loads that sector into memory (at address 0000:7C00 hex) and starts executing it, but for the purposes of this article you can just think of the BIOS jumping to the code on the first sector of the device.

That first sector is 512 bytes long; on a hard disk it's called the Master Boot Record (MBR). Those bytes are distributed like this:

What                                 Where (hex)    Length (bytes)
Boot code (bootloader)               000-1BD        446
Partition table                      1BE-1FD        64
Boot signature (always 55 AA)        1FE-1FF        2
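To see that layout in action, here's a minimal Python sketch that splits a 512-byte first sector into its three parts and decodes the partition table. The per-entry layout it assumes is the standard one: four 16-byte entries, each with a status byte (80 hex means active), a partition type byte at offset 4, and the partition's starting LBA and length in sectors as little-endian 32-bit values at offsets 8 and 12. The function and type names are made up for illustration.

```python
import struct
from typing import List, NamedTuple, Tuple


class PartitionEntry(NamedTuple):
    active: bool      # status byte 0x80 marks the bootable ("active") partition
    type_id: int      # partition type byte, e.g. 0x83 for Linux
    start_lba: int    # first sector of the partition, as a linear address
    num_sectors: int  # partition length in 512-byte sectors


def parse_mbr(sector: bytes) -> Tuple[bytes, List[PartitionEntry]]:
    """Split a 512-byte MBR into boot code and partition table entries."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[0x000:0x1BE]   # 446 bytes of bootloader
    table = sector[0x1BE:0x1FE]       # 64 bytes: four 16-byte entries
    signature = sector[0x1FE:0x200]   # 2 bytes, must be 55 AA
    if signature != b"\x55\xaa":
        raise ValueError("boot signature 55 AA not found")
    entries = []
    for i in range(4):
        entry = table[16 * i : 16 * (i + 1)]
        # Bytes 8-15: starting LBA and sector count, little-endian 32-bit.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        entries.append(PartitionEntry(entry[0] == 0x80, entry[4], start_lba, num_sectors))
    return boot_code, entries
```

To try it, save a copy of a disk's first sector to a file (for example, as root, dd if=/dev/hda of=mbr.bin bs=512 count=1, where /dev/hda is just an example device name) and pass the file's contents to parse_mbr.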

While we're at it, let's give the program on the MBR a name. It's called a bootloader. Some famous bootloaders from the UNIX world are LILO and grub. Windows has its own bootloader.

So the computer program on the MBR, commonly called the bootloader, must do its magic in 446 bytes. Not Megabytes. Not Kilobytes. Bytes. There is no operating system loaded yet, so there is no access to high level structures like files. Or even medium level structures like clusters (DOS, Windows) or inodes (Linux, Unix, BSD). Any disk access must use the low level routines on the ROM BIOS (int 13H, to be specific), which access the disk by Cylinder, Head and Sector (CHS, or a CHS translation). Ughhh.
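Because int 13H addresses the disk by cylinder, head and sector, everything the bootloader knows about where things live is ultimately a CHS triple. Here's a minimal sketch of the standard arithmetic that converts between CHS and a linear block address (LBA) for a given geometry; the 255-head, 63-sector geometry in the usage line is just a common translated geometry chosen as an example.

```python
from typing import Tuple


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear block address."""
    if not 0 <= h < heads or not 1 <= s <= sectors_per_track:
        raise ValueError("head or sector out of range for this geometry")
    # Sectors are numbered from 1; cylinders and heads are numbered from 0.
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> Tuple[int, int, int]:
    """Inverse of chs_to_lba for the same geometry."""
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1


print(chs_to_lba(0, 1, 1, heads=255, sectors_per_track=63))  # 63
```

With that geometry, cylinder 0, head 1, sector 1 is LBA 63, which is where the first partition traditionally starts. Get the geometry wrong and every CHS address the bootloader stored points at the wrong sector.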

About all the bootloader can do in its 446 bytes is look at the partition table with which it shares the MBR, and jump to code elsewhere. Depending on the capabilities of the bootloader in question, that somewhere else could be the first sector of a partition (that partition's "boot sector"), or it could be a "file" at a known and static cylinder, head and sector, that cylinder, head and sector being kept as part of the 446 bytes. Wherever it jumps, it's the job of that program to actually load the operating system's kernel. Sometimes there are two jumps.
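Here's a minimal sketch of the "look at the partition table" step: scan the four entries for the one marked active (status byte 80 hex) and decode its starting CHS address, which is packed into three bytes (the head; then the sector in the low 6 bits of the next byte, with cylinder bits 8-9 in its top 2 bits; then the low 8 cylinder bits). This is roughly what a classic DOS-style MBR does before loading that partition's boot sector with int 13H, but the Python function itself is a hypothetical illustration, not real boot code.

```python
from typing import Optional, Tuple

TABLE_OFFSET = 0x1BE  # partition table starts right after the 446-byte boot code
ENTRY_SIZE = 16


def active_partition_start(mbr: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (cylinder, head, sector) where the active partition begins."""
    for i in range(4):
        entry = mbr[TABLE_OFFSET + ENTRY_SIZE * i : TABLE_OFFSET + ENTRY_SIZE * (i + 1)]
        if entry[0] != 0x80:  # 0x80 = active, 0x00 = not active
            continue
        head = entry[1]
        sector = entry[2] & 0x3F                        # low 6 bits
        cylinder = ((entry[2] & 0xC0) << 2) | entry[3]  # 10-bit cylinder
        return cylinder, head, sector
    return None  # no active partition: the MBR code would report an error
```

The 10-bit cylinder field is one reason CHS-only boot code can't reach past cylinder 1023.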

Because grub is the bootloader I know the best, let me describe what happens with grub.

The grub program is actually two programs: stage1 and stage2. The first 446 bytes of stage1 are copied to the first 446 bytes of the MBR, together with the CHS (cylinder, head, sector) address of stage2. Then, when you boot from the disk containing that MBR, stage1 runs and passes control to stage2, which is on the disk at the CHS address written to the MBR (heaven help you if somebody moved stage2). Then stage2 runs. stage2 has built-in code that understands several filesystems, including common Linux, BSD and Unix filesystems. stage2 reads its configuration file, menu.lst, and uses it to find the kernel and other files.
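For a feel of what stage2 reads, here's a minimal sketch that parses a simplified menu.lst: global commands such as default come first, each title line starts a boot entry, and the root, kernel and similar lines that follow belong to that entry. It's a hypothetical illustration that ignores much of the real grub syntax, and the kernel path and root device in the sample are example values.

```python
from typing import Dict, List, Tuple


def parse_menu_lst(text: str) -> Tuple[int, List[Dict[str, str]]]:
    """Return (default entry index, list of boot entries) from menu.lst text."""
    default = 0
    entries: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        parts = line.split(None, 1)
        command = parts[0]
        args = parts[1] if len(parts) > 1 else ""
        if command == "title":
            entries.append({"title": args})  # a new boot entry begins
        elif entries:
            entries[-1][command] = args       # root, kernel, initrd, ...
        elif command == "default" and args.isdigit():
            default = int(args)               # global setting before any title
    return default, entries


MENU = """\
default 0
timeout 5

title Linux
root (hd0,0)
kernel /boot/vmlinuz root=/dev/hda1 ro
"""
default, entries = parse_menu_lst(MENU)
print(entries[default]["kernel"])  # /boot/vmlinuz root=/dev/hda1 ro
```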

Some other bootloaders, LILO for example, can't read filesystems, which is why the disk addresses of all relevant files must be stored either in the MBR itself or in the map file pointed to by the MBR.

Once the kernel is running, the last step is the operating system boot. Linux runs the init program to boot. Windows has something similar.
Steve Litt is the author of the Universal Troubleshooting Process courseware. Steve can be reached at Steve Litt's email address.

Using Knoppix as a Diagnostic Tool

By Steve Litt
Knoppix is a Linux distribution on a bootable CD. What makes it really slick are:

When you have a question like "is it hardware or software", you can boot Knoppix to get a completely different operating system. If the problem still occurs, it's probably hardware. The exception is if Knoppix happens not to support the hardware in question. Many hardware manufacturers do not write Linux drivers, so the Linux community writes them, and they are placed on the Knoppix CD. But sometimes a piece of hardware hasn't been out long enough for the Linux community to reverse engineer it, or sometimes the hardware is so esoteric that nobody's reverse engineered it. In that case, if the Windows symptom is "can't recognize the Esoterica IIID video card", the Knoppix symptom could be the same, even though the real problem was a Windows configuration. But in most cases, Knoppix is a good way of answering the question "is it hardware or software?"

Your sound card stopped working. Is it hardware, or software? Assuming that sound card is supported by Knoppix (and that's usually a good assumption), just boot Knoppix and see whether the sound card works there. The same can be done with video cards and most peripherals.
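The reasoning in the last two paragraphs boils down to a small decision table. Here's a minimal sketch of it; the names are hypothetical, and the verdicts say "probably" rather than "certainly" for the reasons given above.

```python
from enum import Enum


class Verdict(Enum):
    PROBABLY_HARDWARE = "probably hardware"
    PROBABLY_SOFTWARE = "probably software"
    INCONCLUSIVE = "inconclusive"


def hardware_or_software(knoppix_supports_device: bool, works_under_knoppix: bool) -> Verdict:
    """Interpret the result of booting Knoppix on a machine with a symptom."""
    if not knoppix_supports_device:
        # No Linux driver: Knoppix fails too, even if the real fault is in Windows.
        return Verdict.INCONCLUSIVE
    if works_under_knoppix:
        # Same hardware works under a different OS, so suspect the installed software.
        return Verdict.PROBABLY_SOFTWARE
    return Verdict.PROBABLY_HARDWARE


print(hardware_or_software(knoppix_supports_device=True, works_under_knoppix=True).value)
# probably software
```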

Has Windows ever stopped working before you'd made a backup? You might be in luck. Try booting Knoppix and mounting the Windows partition(s) read-only. Then transfer the data to another machine via the network.

If Windows won't boot, sometimes you can find the Windows partition with Knoppix, and then use grub to boot it.

Internet access just took a dump on your Windows machine, and to restore Internet access you must download a file from the Internet. No problem. Boot Knoppix, mount a partition read/write, and download the file to it.

Have you ever noticed that installing certain network cards on a Windows computer is like pulling teeth? To determine whether the problem is software or hardware, boot Knoppix on the computer and see whether you can see the network card and the LAN. Knoppix has the best network card detection around. If it's software, you can redouble your efforts to configure Windows for the card. If it's hardware, you needn't waste your time.

This just scratches the surface. Every computer professional, whether they use Linux or not, should have a Knoppix CD in their toolbox.
Steve Litt is the author of the Universal Troubleshooting Process courseware. Steve can be reached at Steve Litt's email address.

Electronic Contact Lubrication

By Steve Litt
Dirty or corroded electronic contacts are a frequent cause of computer problems, especially intermittent problems. By using a good electronic contact cleaner/lubricant you can prevent many such problems from ever happening. These days I use Lube Job Electronics Lubricant, available from www.blowoff.com. Every time I reinsert a daughtercard or RAM stick, every time I reconnect an IDE or floppy cable, and every time I connect a mouse, keyboard or network cable, I carefully apply Lube Job. Since I started lubricating electronic contacts, I've found a noticeable decrease in computer problems, especially intermittents.

Electronic contact lubrication isn't perfect. You must make sure not to spread it beyond the metal contacts (or the plastic housing them). It's possible that some electronic lubricants have some degree of conductivity, which could alter the functioning of your computer. It's also possible that some lubricants could damage some of the plastics used in computers. All I can say is I've been very happy with the performance of Lube Job in my fleet of computers.
Steve Litt is the author of the Universal Troubleshooting Process courseware. Steve can be reached at Steve Litt's email address.

Letters to the Editor

All letters become the property of the publisher (Steve Litt), and may be edited for clarity or brevity. We especially welcome additions, clarifications, corrections or flames from vendors whose products have been reviewed in this magazine. We reserve the right to not publish letters we deem in bad taste (bad language, obscenity, hate, lewd, violence, etc.).
Submit letters to the editor to Steve Litt's email address, and be sure the subject reads "Letter to the Editor". We regret that we cannot return your letter, so please make a copy of it for future reference.

How to Submit an Article

We anticipate two to five articles per issue, with issues coming out monthly. We look for articles that pertain to the Troubleshooting Process, or articles on tools, equipment or systems with a Troubleshooting slant. This can be done as an essay, with humor, with a case study, or some other literary device. A Troubleshooting poem would be nice. Submissions may mention a specific product, but must be useful without the purchase of that product. Content must greatly overpower advertising. Submissions should be between 250 and 2000 words long.

Any article submitted to Troubleshooting Professional Magazine must be licensed with the Open Publication License, which you can view at http://opencontent.org/openpub/. At your option you may elect to prohibit substantive modifications. However, in order to publish your article in Troubleshooting Professional Magazine, you must decline the option to prohibit commercial use, because Troubleshooting Professional Magazine is a commercial publication.

Obviously, you must be the copyright holder and must be legally able to so license the article. We do not currently pay for articles.

Troubleshooters.Com reserves the right to edit any submission for clarity or brevity, within the scope of the Open Publication License. If you elect to prohibit substantive modifications, we may elect to place editors' notes outside of your material, or reject the submission, or send it back for modification. Any published article will include a two-sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com. Authors: please understand we can't place hyperlinks inside articles. If we did, only the first article would be read, and we can't place every article first.

Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):

Copyright (c) 2001 by <your name>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, version  Draft v1.0, 8 June 1999 (Available at http://www.troubleshooters.com/openpub04.txt/ (wordwrapped for readability at http://www.troubleshooters.com/openpub04_wrapped.txt). The latest version is presently available at  http://www.opencontent.org/openpub/).

Open Publication License Option A [ is | is not] elected, so this document [may | may not] be modified. Option B is not elected, so this material may be published for commercial purposes.

After that paragraph, write the title, text of the article, and a two sentence description of the author.

Why not Draft v1.0, 8 June 1999 OR LATER

The Open Publication License recommends using the words "or later" to describe the version of the license. That is unacceptable for Troubleshooting Professional Magazine because we do not know the provisions of any newer version, so it makes no sense to commit to it. We all hope later versions will be better, but there's always a chance that leadership will change. We cannot take the chance that the disclaimer of warranty will be dropped in a later version.
 

Trademarks

All trademarks are the property of their respective owners. Troubleshooters.Com (R) is a registered trademark of Steve Litt.
 

URLs Mentioned in this Issue