The Hand-Me-Down PC - Computer Repair
Troubleshooting and Repairing PCs
This book has been replaced in most usages by my latest book:
PC Repair with diagnostic flowcharts.

Copyright 2005, 1996 by Morris Rosenthal

The whole premise of this book is that you don't have to be an engineer to work on your PC. This remains true, and if your PC is acting up, you can proceed directly to the troubleshooting chapter. However, many people will want to know a little more about why there are so many different generations of adapters, or why memory bank size differs with SIMMs and CPUs. The answer requires a quick discussion of how the motherboard connects the CPU with everything else in the system box. None of this information is actually required for fixing your PC, although it can help you understand what you're doing when you order parts. We also present a quick tutorial on the operating system's startup files, AUTOEXEC.BAT and CONFIG.SYS, because incorrectly modified startup files can mimic some hardware problems.

Computer Busses

A computer bus is a path along which information is passed. Busses are digital, which means the flow of information is not continuous, but moves along in lock step with a clock. The bus speed is not the only measure of performance associated with the flow of information on the bus, because some devices do not respond to requests for several clock cycles. For example, most motherboards are equipped with external cache memory, which is about four times faster than the memory SIMMs you install. The external cache memory is also much more expensive, so most computers are equipped with much less than 1MB of the stuff. When the CPU wants to get some information from memory, it places the address (the location of the byte(s) to be retrieved) on the address bus, and waits for the information to come back on the data bus. If the information has been stashed in the external cache by the cache controller, it is placed on the data bus, and the CPU reads it.
If the information is not in the external cache, several bus cycles (clock ticks) may go by before the slower main memory can supply the information.

CPU Speeds and Bus Sizes

The data and address busses are accessed directly by the CPU, and share the CPU clock. This means that the bus clock speed for a 386SX-16 is 16MHz, a 486DX-33 bus clock is 33MHz, and a 486DX-50 bus clock is 50MHz. These are all examples of traditional, one speed CPUs. Many hand-me-down PCs will come equipped with a 486DX2-50, 486DX2-66 or 486DX2-80. These are clock doubled chips, whose internal circuitry runs at twice the speed of the CPU clock input. This means that even though the CPU is running at 50MHz, in the case of the 486DX2-50, it can only communicate with the outside world at 25MHz. 486DX CPUs all come equipped with a small amount of internal cache memory that allows them to blaze along at full speed as long as the information they seek is in the cache. If they need to look to external cache, or main memory, they have to sit and wait. The 486DX4-100 and 486DX4-120 are actually clock triplers, despite the "4" designation. These CPUs have internal circuitry that runs at three times the external clock speed.

The address and data bus are also sized according to the CPU's capacity. The 286 CPU (PC-AT) and the 386SX CPUs both see the outside world in sixteen bits. With each bus cycle (clock tick), the 286 or 386SX can exchange sixteen bits (two bytes) of information with memory, or the bus controller. The 386SX is internally a thirty-two bit CPU, the same as the 386DX and 486 CPUs, but is restricted in its interface. The 386DX, 486SX and 486DX CPUs are all full, thirty-two bit CPUs. They process information internally, and exchange it with the outside world, thirty-two bits at a time. The 386DX had no internal cache and no math coprocessor on board, while the 486SX lacked only the math coprocessor.
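The relationship between the number printed on a 486 and the speed of its bus can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the `MULTIPLIER` table and the `external_clock_mhz` function are names made up for the example.

```python
# Internal clock multipliers for the 486 family, as described in the text.
# The keys are illustrative labels chosen for this sketch.
MULTIPLIER = {"DX": 1, "DX2": 2, "DX4": 3}  # the DX4 triples, despite the "4"


def external_clock_mhz(family: str, internal_mhz: float) -> float:
    """Return the bus (external) clock for a CPU running at internal_mhz."""
    return internal_mhz / MULTIPLIER[family]


for family, mhz in [("DX", 33), ("DX2", 50), ("DX2", 66), ("DX4", 100)]:
    bus = external_clock_mhz(family, mhz)
    print(f"486{family}-{mhz}: bus clock about {bus:.1f} MHz")
```

The marketing number is always the internal speed, so the bus speed is what you get after dividing by the multiplier.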
The new generation of CPUs from industry leader Intel are the Pentiums, which see the outside world sixty-four bits at a time, although internally they remain thirty-two bit processors. These CPUs, with clock speeds ranging from 60MHz to 300MHz (Pentium Pro), can access twice as much information from memory in one cycle as the thirty-two bit CPUs. Because their clock speed is so much faster than the memory can respond, these CPUs require multiple levels of high speed cache memory to perform at their optimum.

Memory Banks and DMA (Direct Memory Access)

We discussed in the section about upgrading memory how there are two physically different sizes of SIMMs, with each size coming in different capacities. The SIMMs that were referred to as 30pin SIMMs are also called "eight bit" SIMMs, because they store information in a format that is eight bits wide. Now that we know that the 286 and 386SX CPUs see the world sixteen bits at a time, we can see why the memory bank size used with these CPUs is two 30pin SIMMs. It takes 2 eight bit SIMMs to create a memory bank sixteen bits wide. By the same token, the number of 30pin SIMMs required for a 386DX or 486 CPU memory bank is four. It takes 4 eight bit SIMMs to build a bank thirty-two bits wide. The 72pin SIMMs that were introduced a few years ago are thirty-two bits wide. This means that only one 72pin SIMM is needed to create a memory bank for a 486 CPU. The Pentium processors, which see the world in sixty-four bit clumps, require two 72pin SIMMs in each memory bank. It takes 2 thirty-two bit SIMMs to build a memory bank sixty-four bits wide. The newer DIMMs are sixty-four bits wide (seventy-two when the extra parity or ECC bits are counted), so it takes only one DIMM to make a bank on a Pentium.

In the oldest computers, all of the memory management was handled by the CPU. If information the CPU needed was on the hard drive, a software routine (special housekeeping program) would negotiate with the disk controller and accept the information via the bus controller, and store it in memory for use.
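The bank sizes worked out earlier in this section all come from one division: the width of the CPU's data bus divided by the width of the memory module. A minimal Python sketch of that rule follows; the function and table names are invented for the example, and the widths are data bits only, with parity bits left out.

```python
# Data bus widths (in bits) for the CPUs discussed in the text.
CPU_BUS_BITS = {"286": 16, "386SX": 16, "386DX": 32, "486": 32, "Pentium": 64}

# Data widths (in bits) of the memory modules discussed in the text.
MODULE_BITS = {"30pin SIMM": 8, "72pin SIMM": 32}


def modules_per_bank(cpu_bus_bits: int, module_bits: int) -> int:
    """Return how many identical modules fill one memory bank."""
    if cpu_bus_bits % module_bits != 0:
        raise ValueError("module is wider than, or does not divide, the bus")
    return cpu_bus_bits // module_bits


for cpu, bus in CPU_BUS_BITS.items():
    for module, width in MODULE_BITS.items():
        if width <= bus:  # skip modules wider than the CPU's bus
            count = modules_per_bank(bus, width)
            print(f"{cpu}: {count} x {module} per bank")
```

This is why a single mismatched or missing SIMM leaves a whole bank unusable: the CPU needs every module in the bank to complete one read.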
Early on, computer designers saw the value of giving adapters direct access to the memory, to free the CPU from performing this rote chore. The DMA controller is a fairly brainy chip that arbitrates memory access, and allows adapters to off-load their information into memory while the CPU is busy doing other things. DMA is a two way street, and the CPU can also instruct the DMA controller to transfer blocks of memory to an adapter, to be written to a hard drive or turned into music by a sound card.

The I/O (Input/Output) Bus

The adapters that add so much functionality to the PC are plugged directly into the I/O (Input/Output) bus. This bus has a slower clock than the memory and address busses, no higher than 8 megahertz in older PCs. This creates a situation similar to a traffic jam caused by highway construction. If you were the only car on the highway, you could buzz through without hitting the brakes, but when you add a volume of cars, everybody ends up crawling for fifteen minutes. Devices that are attached to the I/O bus may be capable of supplying information at a high speed, and the CPU and memory can certainly work with information at a high speed, but the I/O bus creates a bottleneck that drags everything down to the least common denominator, the I/O bus speed.

Computer designers have always been aware of this issue, and have developed a whole slew of work-arounds. The first approach is to simply make the bus wider, the equivalent of adding more lanes to the highway. The bus in the original PC was eight bits wide, meaning it could pass along eight bits (one byte) of data with each bus cycle (clock tick). The next generation of bus, introduced in the IBM PC AT, was a sixteen bit wide bus known as the ISA (Industry Standard Architecture) bus, which passes along twice as much information at the same clock speed.
When IBM temporarily pulled out of what had become the clone business, the main selling feature of their new, proprietary PS/2, was its superior Microchannel I/O bus. The clone industry, needing to compete with Big Blue, created the EISA (Extended Industry Standard Architecture) bus, for use in some 386 PCs. The EISA bus increased the clock speed up to 10MHz, and the bus width to thirty-two bits. The EISA bus was cleverly designed so that the adapter cards themselves could take over the bus (bus mastering) when they had a lot of information to transfer. Another feature of the EISA bus was backwards compatibility with old adapter cards, which could be used in EISA bus slots. This trick was worked by making the EISA bus connectors twice as deep as normal connectors, so that the old ISA adapters didn't reach the extended set of contacts. The EISA bus suffered from two drawbacks. The first was the expense, since special chips to control the bus, all of the extra data pathways, and the fancy slots, combined to add hundreds of dollars to the cost of a motherboard. Special software was also required to install the new adapters, which themselves cost many times as much as ISA cards. The EISA bus may have overcome these difficulties to become the dominant architecture, if not for the introduction of the VESA local bus. The philosophy of the VESA local bus was simple. Add two or three slots very near the CPU, and let the CPU read and write information directly to them at the same speed as memory. Implementing this is actually a little trickier than it sounds, because there are limitations on how much electrical power and noise the CPU can handle, which led to the limitation on the number of slots. However, the net result was a thirty-two bit wide bus that operated at the CPU clock speed, when that speed was 33 MHz or under. 
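The effect of widening and speeding up the bus can be put into rough numbers. The sketch below multiplies bus width by clock speed to get an idealized ceiling, assuming one transfer on every clock tick, which real adapters rarely manage. The widths and clocks are the figures quoted in the text, except that the 8MHz clock for the eight bit bus is taken from the "no higher than 8 megahertz" limit as an assumption. The function and list names are made up for the example.

```python
def peak_mbytes_per_second(width_bits: int, clock_mhz: float) -> float:
    """Idealized ceiling: one full-width transfer on every clock tick."""
    return width_bits / 8 * clock_mhz


# (name, width in bits, clock in MHz), using the figures given in the text.
BUSES = [
    ("8 bit PC bus", 8, 8),      # clock is an assumed upper limit
    ("ISA", 16, 8),
    ("EISA", 32, 10),
    ("VESA local bus", 32, 33),  # runs at the CPU clock, 33MHz or under
]

for name, width, clock in BUSES:
    rate = peak_mbytes_per_second(width, clock)
    print(f"{name}: at most {rate:.0f} million bytes per second")
```

Even as an upper bound, the comparison shows why the local bus was such an attractive shortcut: it gains more from the faster clock than EISA gained from doubling the width.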
VESA adapters plug into the regular slots in the ISA bus section of the motherboard for power, but have an extended contact area that plugs into the in-line VESA slot near the CPU.

With the advent of Pentium systems, the PCI bus was introduced. The PCI bus is another thirty-two bit bus, which runs at a fixed clock of 33MHz and supports bus mastering. PCI adapters can transfer data directly to system memory at this clock rate. PCI adapters require their own special slots, which offer no backwards compatibility to ISA family adapters. The PCI bus is a more traditional bus implementation than the VESA local bus, employing full buffering (insulation of the I/O bus from other busses on the motherboard) and sophisticated bus controllers.

ISA and EISA busses are isolated from the CPU's memory bus by a bus controller. The VESA local bus shares the memory bus with the CPU for one or two adapters, but the rest are again isolated by a bus controller. Implementation of the PCI bus requires two levels of controllers: a PCI controller which isolates the CPU from the PCI bus, and an ISA or EISA bus controller that buffers the PCI bus from the (E)ISA bus.

Interrupts

Adapters on the I/O bus and other devices with direct connections to the motherboard all have the ability to generate interrupts. A hardware interrupt is an electrical signal generated by the adapter or device that is wired directly to the interrupt controller. The interrupt controller buffers and prioritizes interrupt requests, then notifies the CPU with another direct wired connection. The CPU then jumps to a memory address associated with the interrupt number, and executes the software routine stored there, known as an "interrupt handler." Without these interrupts, the CPU would never know when you move the mouse, type a new letter, or when the hard drive has found the requested information. Interrupts are written in shorthand as IRQs (Interrupt ReQuests).
Early motherboards (pre-AT) supported only eight interrupts; newer motherboards support sixteen. The following table presents the mapping of interrupts as they will be set in most systems.
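As a rough companion to the mapping, the Python sketch below lists the conventional assignments on an AT class motherboard and shows how an adapter with only a few jumper choices can run out of options. The assignments are the commonly seen defaults and are an assumption here; any particular system may differ. `IRQ_MAP`, `free_irqs` and `assign` are names invented for the example.

```python
# Conventional AT class IRQ assignments (assumed defaults; None means free).
IRQ_MAP = {
    0: "system timer",
    1: "keyboard",
    2: "cascade to second interrupt controller",
    3: "COM2 (shared with COM4)",
    4: "COM1 (shared with COM3)",
    5: None,
    6: "floppy controller",
    7: "LPT1",
    8: "real time clock",
    9: None,
    10: None,
    11: None,
    12: None,
    13: "math coprocessor",
    14: "hard drive controller",
    15: None,
}


def free_irqs(irq_map):
    """Return the interrupt numbers that nothing has claimed yet."""
    return [irq for irq, owner in sorted(irq_map.items()) if owner is None]


def assign(irq_map, device, choices):
    """Claim the first free IRQ among the few the adapter's jumpers allow."""
    for irq in choices:
        if irq in irq_map and irq_map[irq] is None:
            irq_map[irq] = device
            return irq
    raise RuntimeError(f"interrupt conflict: no free IRQ for {device}")


system = dict(IRQ_MAP)                      # work on a copy
print(free_irqs(system))                    # the free interrupts
print(assign(system, "sound card", [5, 7]))
print(assign(system, "SVGA adapter", [9]))
print(free_irqs(system))                    # fewer choices remain
```

The failure case is the interesting one: a second adapter that offers only IRQ 5 or 7 now has nowhere to go, even though several interrupts are still free.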
Although the table shows six free IRQs, they can get used up in a hurry. IRQ 5 is commonly gobbled up by a sound card, and IRQ 9 is used by most SVGA adapters. If any adapter we might want to add could use any of the remaining interrupts, we would be OK, but adapters usually offer a limited number of choices. IRQs 3 and 4 can be shared by two communications ports, or a communications port and a modem, but only if the devices sharing the interrupt won't be functioning at the same time. In other words, a mouse on Com1 (IRQ 4) precludes the use of Com3, because the mouse must be available at all times. Interrupt conflicts are one of the most common problems run into during upgrades.

Troubleshooting and Repairing Clone PCs

The only tool you ever need to fix most PCs is a Phillips head screwdriver. The first question that jumps into a hobbyist's head is "How can I fix anything without my soldering iron?" The answer is simple. You find the bad part, and replace it. If a part is under warranty, and many expensive adapters and drives are backed by the manufacturer for five years, you can pay for shipping and wait for a replacement remanufactured part. Some computer components actually come with lifetime warranties, based on the idea that you'll never find the paperwork again, and even if you do, the part will be so obsolete you won't want another one anyway. Most computer parts employ surface mount circuit boards that are assembled by robots. It takes a microscope, micro manipulators, and a highly trained technician working in a hermetically sealed clean room to repair these components.

Quality control is unfortunately a myth in large segments of the PC industry. I once installed a motherboard and powered it up, only to hear a loud pop, and smell the aroma of burnt electronics. I was quite irritated with myself, certain that I'd let a stray screw short out the motherboard, but a colleague who had more faith in me insisted we examine the board for damage.
I quickly found the blown chip, a part of the buffering circuitry, and noticed that it had been soldered to the motherboard backwards! This motherboard had quality control stickers on it from at least three different "inspectors", yet it had obviously never been powered up. Even at the low wages paid in the Pacific rim, where most of these components are manufactured, competition is fierce, and everyone cuts corners. A SIDE adapter which sells for $10 here in the wholesale market, has a couple dollars of chips on it, and had to buy a long boat ride to boot. There's just no margin in the commodity adapter business, and quality suffers the consequence. Electronic devices have a life span that engineers refer to as the "bathtub curve." Lots of parts fail with the first jolt of electrical current, followed by heat induced failures as they warm up for the first time. Components that weren't manufactured quite as well as they could have been fail next, followed by a long period of relative stability. Near the end of the expected lifetime of the part (how long it was designed to last for) failures begin to increase once again. Components with moving parts, like hard drives, have much shorter lives than purely electronic parts, because bearing surfaces inevitably wear down, and they are more vulnerable to jolts when you pound the computer in frustration. All PC vendors promise a "burn in" period, 24 to 72 hours of diagnostics running on the system, the idea being to get it by the front end of the bathtub curve before they sell it. Some actually do this, many don't. I've personally fixed many DOA (Dead On Arrival) PCs from brand name vendors, where parts simply weren't hooked up or correctly configured. This has become epidemic as suppliers standardize on models, because they install all of the software by simply putting in a hard drive that's been copied from a master version of the identical system. 
This leads to lots of misconnections of floppy drives, CDs, sound cards, etc. A company I worked for once submitted a "standard" PC to a local city, which was bidding a blanket contract for PC purchasing. While I was there, a city employee asked me to take a quick look at a competitor's PC, in which the floppies weren't working. When I opened up the system box, I saw that they hadn't been hooked up, neither power connectors nor ribbon cables. We won the contract, but I tell the story to illustrate how a reputable PC company was able to screw up on something so simple when they were trying to make an impression. Imagine how their PCs worked when they didn't care!

How to repair a computer

You probably already know a lot more about troubleshooting a PC than you think you do. Many failures are pretty obvious, or are actually reported by the system. For example, if the letters you type on the keyboard don't show up on the screen, the problem could be the keyboard, or the connection between the keyboard and the system box. Hopefully, these are things you would check before picking up this book. The system might report "HDD/FDD Controller failure" when you power up. It's telling you that it can't communicate with the SIDE adapter.

The real wild card in troubleshooting is the software. Poor programming, applications used in ways the programmer didn't consider, or unexpected software events can cause a PC to freeze. A freeze could also be caused by any number of hardware problems, but the best approach is to troubleshoot the software first. The way to do this is to get the problem to repeat by trying the same procedure, or to avoid using the same software for a while and see if the problem goes away. Either way, it doesn't mean that you don't have a hardware problem that the software is triggering, but it all helps to narrow down the possibilities.
Software related freezes are often related to improperly configured drivers for peripherals, which should be apparent from the failure occurring when you attempt to access them.

Power Supplies

Computer power supplies are among the weaker links in clones. Power supplies suffer from three basic failure modes: total failure, fan failure and loss of voltage regulation. Total failure means that no beeps or blinking lights result when the switch is turned on and the fan on the power supply doesn't rotate. There are a few very basic items that should always be checked first, particularly on PCs that are being installed in a new location for the first time.

1) Make sure the outlet the computer is plugged into is alive by plugging in a lamp. While house wiring is usually reliable, inexpensive power strips sporting from four to eight outlets often have at least one dead, or unreliable outlet.

2) Make sure that the PC end of the power cord is firmly inserted into the power supply. These cable ends are an industry standard female connector which is pushed onto three prongs. Sometimes, the cord you are using is not the cord that came with the PC and the fit is very tight. If you aren't confident that contact is being made, you should try another cord.

3) Check the line voltage switch (110/220V) on the back of the power supply to make sure it is set on 110V. This problem rarely comes up anymore because switches are small and taped over, but older power supplies had switches that could be accidentally thrown during transport.

Once you have confirmed that the power supply is plugged in properly, there are two further items that can be checked if you have a voltage/continuity meter. A broken power supply switch on the front panel of the system box is probably the second most common mode of power supply failure, after fan problems. Switch problems are sometimes obvious, like a push button failing to stay in, but more often you will need to check it with a meter.
With the power supply unplugged from the wall, you can remove two of the four connections from the switch and check for continuity with the switch turned on. Then replace the two connectors and check the other pair. The wires chosen for each pair should be on opposite sides of the switch, where the terminals on a side should not show continuity with the switch in either position. Power supplies do have fuses inside the housing that can be checked with a meter, but I've never come across a blown one. There are large capacitors in the power supply, so even with the power cord detached, you can get a bad shock.

The audible noise from the power supply is normally caused by the fan. A steady squeal or loud hum is caused by failing fan bearings. These noises may come and go in accordance with room temperature, humidity, and other environmental factors. If this problem is present and the PC is still in the manufacturer's warranty, change out the supply immediately because it's not going to fix itself. Power supplies can last a long time with noisy fans, but failure of the fan can damage more than just the power supply, which will overheat and malfunction when the fan stops. If you have a noisy power supply fan, and you don't want to purchase a new power supply, you can buy a replacement fan and change it out. Before you do so, try vacuuming the power supply out through the fan grating, since a buildup of dust and thread may be the problem. The other key noise produced by the power supply fan is the pitch of the normal droning sound it makes. If the pitch drops greatly when the drives are being accessed, the power supply is not very healthy, and should be returned if in warranty.

Power supplies will sometimes appear to be dead if one of the main system components has developed a short circuit. This may be determined by isolating components from the power supply one at a time. Unplug all of the power leads to the drives, and try to boot the system.
If the system comes to life, shut down, and begin plugging the drives back in and rechecking, one at a time, until you find the problem drive. Otherwise, remove the power connectors to the motherboard and try powering up again. Sometimes, one of the power leads to the drives will fail, but in most instances you can add a power splitter to a working lead and continue using the supply.

When replacing a dead power supply, make sure the replacement is of an identical form, i.e. physical size of supply, location of switch, and connector on switch if it is the front panel push button type. If you don't mind the work of moving all of your parts around, a replacement system box, complete with power supply, is usually cheaper than purchasing a power supply by itself. One of the few universal truths in the world of clones is the color coding of the motherboard connectors, P8 and P9. Connect these to the motherboard such that the black leads in either connector are adjacent to one another in the middle of the connection block.

Power supplies can also be responsible for a variety of odd-ball failures. If a machine reboots itself when the table is jarred, or when someone walks across the room, there is a good chance that the short is in the power supply and not elsewhere in the system. Systems which occasionally freeze up and don't want to power up for a few minutes after being shut off may have a power supply problem, the other likelihood being an overheated CPU. Systems that boot and run for a very short period of time before freezing may also be blamed on poor power supply voltage regulation. A power supply with a faulty ground can cause strange problems, particularly with drives that use the frame of the system box for ground. If a known good floppy or tape drive experiences consistent failures in a system, try de-mounting it and running it insulated from the case by a book or static proof bag.
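The drive isolation routine described earlier in this section is a simple loop, and writing it out makes the order of the steps explicit. In the Python sketch below, `boots` is a hypothetical stand-in for a person powering the system up with a given set of drives connected (and shutting it down again before touching anything); the function name and the result labels are also invented for the example.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple


def isolate_shorted_drive(
    drives: List[str],
    boots: Callable[[FrozenSet[str]], bool],
) -> Tuple[str, Optional[str]]:
    """Reconnect drives one at a time until the system stops powering up."""
    connected = set()
    # Step 1: all drive power leads unplugged.
    if not boots(frozenset(connected)):
        # Still dead with no drives attached, so look elsewhere.
        return ("motherboard or power supply", None)
    # Step 2: plug drives back in one at a time, rechecking after each.
    for drive in drives:
        connected.add(drive)
        if not boots(frozenset(connected)):
            return ("problem drive", drive)
    return ("no fault reproduced", None)


# Simulated system in which the CD-ROM drive has developed a short.
def simulated_boot(connected: FrozenSet[str]) -> bool:
    return "CD-ROM" not in connected


print(isolate_shorted_drive(["hard drive", "floppy", "CD-ROM"], simulated_boot))
```

Adding one drive per power-up is what makes the result trustworthy: if two drives go back in together and the system dies, you are no further ahead.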
In addition, power supplies can be afflicted by whistling capacitors, which produce a high pitched tone that can irritate the dog and the children. You should try temporarily disconnecting any hard drives and removing a modem, if one is present, before fixing the blame for the sound. High frequency sound is highly directional, so reorienting the system box in a room or on a desk may save you from having to replace the power supply.

Keyboards

Keyboards are pretty reliable on the whole, given that they are one of the least expensive and lowest-tech parts of a system. Stuck keys, keys that repeat (bounce), and keys that just don't work at all are pretty easy to spot, and the problem normally gets worse in a hurry. Keyboards do have a total failure mode, which occurs when the system hangs on booting and displays a keyboard error message. If the system is being set up at a new location for the first time, make sure that the keyboard is plugged in firmly, and that a book or your elbow isn't resting on the keypad. Older keyboards had a switch on the bottom for selecting installation on XT/AT computers. Old XT keyboards can't be used with newer systems at all. Sometimes, a keyboard will have its encoder circuitry blown by a static electrical shock, or just be mechanically ruined from one too many spilled cans of Coke. Obvious failures such as these can be easily diagnosed by swapping keyboards. Some combinations of EEKs (Extremely Enhanced Keyboards) and the computer BIOS will always produce a boot time error unless the "keyboard testing" option is turned off in the CMOS setup. Any keyboard going beyond the 101 key scheme can be considered an EEK.

More insidious problems can arise from an intermittent keyboard failure. Sometimes a keyboard failure will suggest a problem with the keyboard controller on the motherboard, often called the keyboard BIOS chip.
Some keyboards, due to capacitive buildup, or heat related failure, will cease to work while the computer is in use, giving the user the impression that the system has "hung." A simple test to see if the system is locked up or the problem is in the keyboard input is to check if the mouse still works. If you can make anything happen with the mouse beyond simply moving it, like pulling down a menu in a Windows program, neither the software nor the general system hardware can be faulted. If using another keyboard doesn't fix the problem, the failure is with the keyboard BIOS (controller) chip.

The keyboard BIOS chip is a large DIP (Dual Inline Package) chip, with twenty pins on each side of a package about 2" long by 3/8" wide. Keyboard BIOS chips are pretty compatible in older systems, and were normally mounted in sockets, rather than soldered to the motherboard. Before you hunt around for a replacement chip, try removing and re-seating the existing chip in the socket. This sometimes remakes a connection that has been oxidized or otherwise mechanically failed. You can take a keyboard BIOS chip, identified by a label or the letters KB appearing somewhere on the top, off any old motherboard and try it in your system. Just make sure to line up the key, a small notch in the top of one end of the chip, with the similar key in the socket. Also, while you want to straighten the legs on the replacement BIOS chip before trying to insert it in the motherboard, avoid bending any leg more than once or twice, because the thin metal fatigues quickly and will break off flush with the package.

On newer PCs, the Gate-A20 feature used in memory addressing schemes is included on the keyboard controller chip. If this fails, the BIOS should generate a non-fatal error at boot time and inform you of the problem. Other error messages related to the keyboard include: "Keyboard Error", "Keyboard Interface Error" and "Keyboard is Locked."
"Keyboard Error" appears when the keyboard BIOS doesn't recognize the keyboard type, as in the case of the EEKs mentioned above. "Keyboard Interface Error" means the BIOS believes the keyboard connector on the motherboard to be faulty. "Keyboard is Locked" occurs when the circular keylock on the front of the system box has been turned to the locked position, or when the lock lead hasn't been attached to the pins on the motherboard and a jumper is required.

Motherboards

Today's motherboards use a lot of surface mount technology, and could practically be labeled "No User Serviceable Parts". For this reason, many of the beep codes generated by the motherboard during POST (Power On Self Test) are just an anachronism. For example, one beep informs us that the memory refresh circuitry is dead, but there's nothing we can do to replace it. The two most commonly encountered beep codes that remain useful are for the memory and the video adapter. Three beeps, normally slow, indicate that there is a failure in the first bank of RAM installed. Reasons can include a poorly seated SIMM, a dead SIMM, or unsupported SIMM types installed in bank zero. If you have more than one bank of memory installed, try swapping out a higher bank for bank zero. Eight fast beeps mean either the VGA adapter is not present, or the memory on the adapter has failed. With newly installed VGA adapters, or systems that have been recently worked on, the most common reason for this failure is poor seating of the adapter in the slot.

Most motherboards have the system BIOS, keyboard BIOS, cache memory, main memory and the clock crystal socketed. An on-board battery with some type of replacement option is used to power the permanent CMOS memory and clock/calendar. The system BIOS, which we have broadly referred to throughout the text, is stored on a ROM (Read Only Memory) chip, which is a permanent, if slow, memory storage device.
Newer motherboards may store the BIOS code in flash memory chips, which can be upgraded in place, using software downloaded from the Internet or provided on floppy disk. The system BIOS is motherboard specific, which means you can't arbitrarily take a system BIOS chip off a dead motherboard and use it in your PC. A bad crystal will very rarely cause boot failure or intermittent lock-ups, and is worth swapping only as a last step before replacing the CPU. On new systems with a ZIF (Zero Insertion Force) socket, I'd try swapping the CPU first! A bad BIOS chip, system or keyboard, can often be visually identified by a burnt spot on the label over the center of the chip, since their normal failure mechanism is excessive heat. System BIOS chips fail even less often than keyboard controllers, and are not suspect in intermittent failures. Problems with newly built systems can sometimes be caused by partially inserted CPUs, especially as the larger chips can be difficult to insert. Diagnostic failures involving address lines are an indication of this problem, which is easy to spot with the motherboard removed from the case.

The most common problem related to the motherboard that occurs during the upgrading or repair process is inadvertent loss of the CMOS settings. A brief short on the battery backup circuit or too much flexing of the motherboard can cause this problem. When the system is booted, a "CMOS Checksum Error" will be generated. Enter CMOS setup and restore all of the settings for your floppy drives, hard drives, date and time. Other CMOS related errors like "CMOS Display Type Doesn't Match" or "CMOS Memory Size Mismatch" are the normal results of some upgrades, and are corrected by entering CMOS setup and saving the new settings, which the BIOS will automatically generate.

Another common motherboard related problem is mechanical connector failure.
This can occur in the bus slots, with the keyboard connector, or with any of the ribbon connectors that may be attached to an enhanced motherboard. Failure in bus slots is normally due to one of the gold plated "fingers" that make spring loaded electrical contact with the adapter being crushed inwards or pulled outwards. Careful visual inspection can detect this problem, which is often repairable by carefully bending the finger back into position. The easiest fix is to simply try the adapter in another slot, but first check to make sure that the bottom edge of the adapter has been nicely beveled to make the insertion smooth. This is sometimes not done on really cheap adapter cards, but you can take a little off the edge with a file. Be gentle, so as not to cause the gold contact to lift from the adapter surface, which ruins the part.

The mounting of keyboard connectors sometimes fails so that the connector rocks back on the motherboard, preventing proper insertion of the keyboard lead or breaking a contact point. This is also noticeable by visual inspection, and occasionally repairable with a little imagination. Ribbon connectors don't fail, but the ribbons are often connected wrong. Double check that all of the pins have entered the connector, since the ribbons will go on missing an entire row of pins, without having to be forced. Also check that the red, or other specially marked wire in the ribbon cable, goes to the pin 1 and 2 end of the connector at both ends of its length. I've often been fooled when installing new drives or controllers by ribbon cables that were attached backwards at both ends, and therefore worked properly until I came along and changed one end.

Battery

The usual harbinger of a failing battery is a "CMOS battery state low" message being displayed by the BIOS at boot time.
Often, this message appears only after some of the settings have been lost from the battery backed CMOS memory, but the real time clock (date and time) is usually still functioning. Schemes for battery backup of the CMOS settings abound, with the most common one on older motherboards being a soldered, rechargeable battery on the motherboard. Newer systems often employ a socketed, encapsulated battery, which is sometimes held in place by a tie wrap. In all cases, the onboard battery can be replaced with one of the same kind, or it can be disabled so you can substitute an external battery or battery pack. Replacement batteries come in a great range of sizes and voltages (from 3V to 6V), so you should try to determine from the old battery or motherboard documentation what voltage is required. When replacing a battery of the first type with an external one, you often have to move a jumper on a 3-pin block on the motherboard from "internal" to "external". If no documentation is available, this jumper block is normally the one closest to the battery connection. It sometimes takes the form of a 2-pin block to be jumpered or left open. You must find and set this jumper on most motherboards to enable the four pin connector for external battery connection. Replacing an onboard battery with one of the same type requires no changes in settings. External batteries come equipped with a four pin hole connector, where one hole is usually plugged with a blank to key the plus terminal of the battery to the correct side of the motherboard connector. If no blank is present, or if there are four pins present on the external battery connection pad, you have two choices. You can work the blank out of the connector pin hole, then connect the red wire side of the connector to the "+" side of the connector block, or you can bend the interfering pin out of the way. If a replacement battery fails in a few days or weeks, there's a good chance the fault was in the battery and not the motherboard. 
If another replacement fails, you can try a higher voltage, or try moving a different jumper if your system board was undocumented. Finally, you can try cutting a lead on the onboard battery, and re-mounting the motherboard to ensure that there are no electrical shorts on the bottom. If the problem persists, consider trying to live with it by simply re-entering the system configuration information in the CMOS setup each time you boot. With one hard drive, one floppy, and the date and time, this actually takes less than a minute in most cases. Your only other options are to leave the machine on all of the time, or upgrade the motherboard. If you do go this route, and have a 486 whose performance you were happy with, you can pick up a motherboard for under $50, and use your old CPU and memory.

Cache Memory
Cache memory can be responsible for intermittent lockups, conflicts with software caching programs, and system lockups immediately after or during the boot process. Many cache problems will be recognized by the BIOS at boot time, and the message "Cache Memory Bad" will appear. The system will freeze at this point, and you will have to reboot, enter CMOS setup, and select "Disable External Cache" in one of the advanced menus. This is also the best way to quickly determine if a problem with the external cache is responsible for run-time lockups. If the system functions properly with the external cache disabled, the cache is indeed the problem. External cache chips are DIP (Dual Inline Package) chips, like the keyboard BIOS only much smaller. They come in a variety of sizes, depending on capacity, and most motherboards support at least two types. All external cache systems utilize a "tag ram" chip, which serves as an index to the data chips, and is often a different speed or type. If this is a newly built system, the most likely problem will be improper jumper settings, as there are often five or six jumpers to set determining total cache size and chip type. 
This requires the motherboard manual or a lot of guessing. The other likelihood is a leg bending under a chip, or missing the socket completely. The latter is easy to spot, while the former may require taking the chips out and inspecting them. Like all DIP chips, the metal legs fatigue rapidly with bending, and will only tolerate being straightened out once or twice before breaking off. If the problem persists after you have determined that all the chips have been inserted properly, which includes lining up the key (indentation) on the top of the chip with the key in the socket, the only option is to try replacing chips. If you have 256K cache installed, using eight data chips, you will be able to set the jumpers for 128K cache and use the extra four chips for troubleshooting. If not, you will have to obtain at least one new chip, two if the tag ram is different, before proceeding. Try replacing the single tag ram chip first, since it's often the problem. After that, you can replace the four data chips one at a time to isolate the bad egg, or all at once, if you have extra cache on hand. Some software caching programs, particularly those used by some of the older and off brand network operating systems, will not operate with external cache enabled. Try getting a software upgrade or live with the external cache disabled. Cache chips don't follow any standard nomenclature, so here is a brief list to help you figure out what you have. All chips are presented as 20nS versions; they may exist as 15nS or 25nS chips as well.
Internal Cache Memory
This is the on-processor cache on 486 and newer CPUs. 486 CPUs, up to the DX2-66 model, came with 8K of onboard cache. Newer 486s and Pentiums come with 16K of onboard cache. The cache memory is actually the single greatest use of real estate on the chip. When a system is taken out of turbo mode, along with a reduction in clock speed, the internal cache is temporarily disabled. This effectively slows the system down to something like the old IBM PC-AT speed. If the internal cache needs to be disabled for the system to function, it's time to get a new CPU, because the performance will be terrible. Accidentally disabled internal cache is the most common problem with PCs that have been upgraded, yet perform worse than the old motherboard/CPU combination.

Main Memory
Failures of main memory are the most common cause of intermittent lockups, especially when those lockups occur only in specific programs or only in programs which use extended memory. A machine that was used for simple DOS applications for many years might begin experiencing memory errors when Windows is installed, simply because it wasn't using those memory locations before. The easiest way to determine if intermittent problems are due to memory is to run a diagnostic program that does a slow memory check (the test should take at least a minute or two per megabyte). If you don't have a diagnostic, and don't have Internet access to download one, you'll have to find SIMMs to trade out, or swap banks. SIMMs should be re-seated in their sockets and re-tested before being written off. In some instances, inserting a wait state in the advanced CMOS setup will solve the problem, at least temporarily, and the system should then pass the diagnostic. Memory within a bank on the motherboard (two 30pin SIMMs on 386SX, four 30pin SIMMs or one 72pin SIMM on 486s, and two 72pin SIMMs on Pentiums) should be kept all the same type of SIMM, speed, and brand. 
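The bank sizes quoted above follow from simple division: a bank has to supply as many data bits as the CPU's data bus carries. Here is a minimal Python sketch of that arithmetic. It assumes the usual data widths (8 bits for a 30pin SIMM and 32 bits for a 72pin SIMM, parity bits ignored; a 16 bit bus on the 386SX, 32 bits on the 486, 64 bits on the Pentium). The table and function names are made up for illustration.

```python
# Data widths in bits, parity ignored. These are assumed typical values.
SIMM_WIDTH = {"30pin": 8, "72pin": 32}
CPU_BUS_WIDTH = {"386SX": 16, "486": 32, "Pentium": 64}


def simms_per_bank(cpu, simm):
    """Return how many SIMMs of one type are needed to fill a bank."""
    bus = CPU_BUS_WIDTH[cpu]
    width = SIMM_WIDTH[simm]
    if width > bus:
        raise ValueError(f"a {simm} SIMM is wider than the {cpu} data bus")
    return bus // width


if __name__ == "__main__":
    combos = [("386SX", "30pin"), ("486", "30pin"),
              ("486", "72pin"), ("Pentium", "72pin")]
    for cpu, simm in combos:
        print(cpu, simm, simms_per_bank(cpu, simm))
```

The four combinations printed are the same ones listed in the text, which is why a half-filled bank is never recognized by the motherboard.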
Older SIMMs were all of the Fast Page RAM type, but many newer 72pin SIMMs are of the EDO (Extended Data Out) variety, which in some cases speeds up memory access. The vast majority of bad SIMMs I've found in PCs either have a mix of chip speeds and brands mounted on the SIMM, or have been mixed in a bank with dissimilar SIMMs. The hobbyist or home user may be willing to gamble on changing a single SIMM, but in field repairs, the whole bank should be changed. Old unmatched SIMMs are handy for troubleshooting, or can be used on caching controllers and some high end graphics rendering adapters.

Main memory problems are reported by the motherboard in three ways. The repeated code of three slow beeps at power up indicates a dead SIMM in bank zero. A "CMOS Memory Mismatch" error means you have added or changed the SIMMs in the system, or the SIMMs in a bank other than bank zero are no longer recognized. An "On Board Parity Error" means that there is a chip failure on one of the SIMMs, or you have just installed non-parity SIMMs in a parity system. The options are to replace the SIMMs in the problem bank with parity SIMMs, or to disable parity checking in the CMOS setup. Disabling parity, if this is an option, won't reduce data integrity if you've purchased non-parity SIMMs. If you have parity SIMMs in the system and you disable parity, you should definitely run a diagnostic to test memory. Often, the failure is actually due to the parity chip on the SIMM, and the data chips are still O.K.

Older systems were built with DIP memory socketed or soldered to the motherboard. Systems with soldered memory aren't worth any effort; those with socketed memory might be revived by re-seating all of the chips. Bad chips can be exchanged one for one with chips of the same or faster speed; it's not necessary to replace the whole bank. 
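The parity check behind the "On Board Parity Error" message can be illustrated in a few lines of Python. This is only a sketch of the idea, assuming one parity bit per byte and odd parity (the nine stored bits always hold an odd number of 1s). The function names are hypothetical, and on a real motherboard the check is done in hardware.

```python
def parity_bit(byte):
    """Return the extra bit stored with a data byte so that the
    nine bits together contain an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte, stored_parity):
    """Recompute parity when the byte is read back and compare."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    value = 0b10110010                # four 1 bits, so the parity bit is 1
    stored = parity_bit(value)

    print(parity_ok(value, stored))               # data and parity intact
    print(parity_ok(value ^ 0b00001000, stored))  # one data bit has failed
    print(parity_ok(value, stored ^ 1))           # the parity chip has failed
```

The last case is the one described above: the data is fine, but a bad parity chip raises the same error. Note that a single parity bit cannot detect two flipped bits in the same byte.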
Most sixteen bit extended or expanded memory adapters will work in newer machines. Some require software drivers; others need to have switches set specifying their start address and the total amount of memory in the computer. Some 386 computers feature a single thirty-two bit slot for one special memory adapter. These adapters are going on ten years old and have been orphaned, so don't ever invest money in one. Many of the older adapters held 2MB or 3MB of memory and can come in handy on an under powered machine. An "Off Board Parity Error" is generated by the BIOS if a parity error on an add-in adapter is detected.

Floppy Drives
Floppy drives are among the least reliable and most finicky components in a PC. Part of the problem stems from their low cost, $25 to $35, which just doesn't buy a lot of quality in an electromechanical device. Another problem is the poor quality of the floppy disks themselves. I've often had two floppy disks out of a brand new box of ten lifetime warranty floppies fail to format to their full capacity. The standard floppy drives, the 1.44MB 3 1/2" and the 1.2MB 5 1/4", offer backwards compatibility with the older generation, but they don't always do it very well. Along with all of their other problems, floppy drives get dirty inside, and often suffer from mechanical failures. The most common failure with 5 1/4" drives is the handle breaking off on the faceplate. Diskettes for 3 1/2" drives employ a thin metal shield that occasionally gets a little bent and either sticks in the drive, preventing the disk from ejecting, or comes off altogether, jamming the mechanism.

The first category of floppy drive failures is those experienced in new PCs, or hand-me-down PCs that you have just upgraded. The most common problems, responsible for at least 75% of all floppy problems in this category, are misconnected ribbon cables. The bad connection can occur at the controller end of the cable, or at the 3 1/2" floppy drive. 
You don't have to worry about 5 1/4" drives because their old fashioned edge connector is virtually foolproof. At the controller end of the cable, the usual mistake is missing the entire bottom row of pins on the connection block. Looking down at an SIDE adapter from the top, it's nearly impossible to detect that the connector is sitting about an eighth of an inch too high. Take it off to check it, and make sure that the red (or otherwise color differentiated) stripe on the ribbon cable is attached at the connector block end numbered 1 or 2. At the 3 1/2" drive end of the cable, the problem is much tougher to detect. Since the connector block is often recessed into the back of the drive, and the drive is normally mounted in a hard to see position, I've often had to take the drive out and reconnect the cable to get it right. When the connector is pushed down on the connector block, it often bends two pins at the end of the block out of the way, but seems to seat properly. If you can get a good look at the connection block, you can often correct the problem by starting the connector at the end with the bent pins and using it to lever them back into place. Sometimes, the problem with older 3 1/2" floppy drives is simply determining which end of the connector block is the pin 1 or 2 end. I've even come across cases where only the opposite end of the connector was labeled (pin 33), and you will often need to take the drive out of the system box and examine it to find the markings. A backwards ribbon cable can cause a 3 1/2" drive to eat the FAT (File Allocation Table) on a floppy, so system disks become non-bootable. This can get extremely confusing, since now the system won't boot from the floppy with the ribbon cable on either way. When building a new machine or adding a second floppy drive, if the LED stays lit on the 3 1/2" drive from the moment the system powers up, the ribbon cable is backwards at the drive end. 
The second most common problem on new machines, or machines that have been worked on, is the wrong drive type (e.g. 360K, 720K, 2.88M), or "Not Installed", being selected in the CMOS setup. A drive with the wrong CMOS type will often pass all hardware diagnostic tests and may even properly show the directory of the floppy disk, but will fail on extended reads or writes, often resulting in data loss. This problem may not be noticed on a new machine for months after delivery, if the misselected floppy drive isn't actually used. The easiest failure to diagnose is when the on-drive LED doesn't come on or the drive doesn't spin up or seek. Try changing the power cable to the drive and re-seating the ribbon cable on the drive and the controller card. If the drive still fails to respond, the problem is the drive or the controller. The controller is tested during boot time and will produce a "FDD/HDD Controller Failure" message in the case of non-intermittent problems.

Floppy drives are notoriously unreliable, particularly when being used in office or school environments that have a large mix of machines with different brands and drive densities. Most strange floppy drive behavior arises from reading and writing floppies from different PCs and formatting floppy disks at lower than the maximum drive capacity. As a rule of thumb, if a problem reading and writing a disk occurs on some PCs and not others, the problem is with the disk. The compatibility of new drives with old formats came at a cost. The magnetic read/write heads on a low capacity drive are twice as wide as the heads on a high capacity drive and source twice the write current. New drives can make the adjustment for current, but the physical size of the read/write head can't be changed. When formatting lower capacity disks, the format procedure does not match the media, which is physically different for the high and low density disks. 
In addition, there are always some fiscally conservative individuals who attempt to recycle low density disks in high density drives by defeating the mechanical check. This is bad practice, and failures involving these disks in no way imply a problem with the drive. In cases where the drive fails to read or write a group of floppies that work fine in a sampling of other machines, the problem is with the drive or controller. The easiest diagnostic is to swap out the drive or controller with one borrowed from another PC. Intermittent read/write failures may be due to the controller or even the motherboard. The former is easy to troubleshoot by swapping out the controller, and the latter is very, very rare. Bad chassis ground or memory problems can also produce intermittent floppy problems; refer to the power supply and memory sections.

Hard Drives
Modern IDE hard drives are among the more reliable system components. They typically run for years, then fail with a whimper or a bang. Most of the problems that crop up with a hard drive are actually controller or software issues. One easy way to avoid some problems and many complications is to avoid using any type of hard drive compression utility. They reduce overall system performance and create a nightmare for data recovery if you have a problem. The best way to increase hard drive capacity is to purchase a new hard drive. Sometimes, hard drive performance degrades as the information you are seeking gets spread all over the disk, a process called fragmentation. The main symptom is head thrashing, which causes the HDD LED (hard drive light) on the front of the system box to flash rapidly for long stretches of time. You can correct the problem with the DEFRAG program, included in newer versions of DOS, or you can use an aftermarket utility. 
The two most common problems encountered with otherwise healthy hard drives are a lack of sufficient free space, and turning the PC off while it's in Windows or DOS applications software. As a rule of thumb, you should keep a minimum of 10% of your hard drive space free at all times. The main reason is Windows' constant need for virtual memory. When Windows runs out of RAM, the SIMMs installed in your system, it uses space on the hard drive to swap out chunks of memory. If the hard drive gets too full, the system may lock up in Windows, or files may become corrupted. Many software applications, Windows and otherwise, create temporary files on the hard drive while they are executing. These files are used for backups or for temporary storage. If the PC is turned off while in Windows or these DOS applications, the files aren't closed properly and become lost allocation units on the hard drive. Run CHKDSK on older DOS systems, or SCANDISK with newer versions of DOS, to free up the lost space.

Physical errors on IDE and SCSI drives are pretty rare, and should be easy to spot with any decent diagnostic program. SCANDISK will test that all of the locations on the drive are usable, while a diagnostic that runs a variety of hard drive tests (including butterfly read) will spot trouble with read/write head positioning. Another rare problem is when the software stored at the very beginning of the hard drive, the MBR (Master Boot Record), gets so corrupted that neither FORMAT nor FDISK can access the drive. Causing this damage is the ultimate goal of several computer viruses. Factory low level or "rescue" formatters will scrub a drive and prepare it for FDISK no matter how bad the software problems get. If you have a new hard drive that makes excessive noise, a loud hum or high pitched squeal, send it back. The noise is symptomatic of a mechanical problem. 
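The 10% free space rule of thumb given earlier in this section is easy to check automatically on any machine where Python happens to be available. The following is a minimal sketch, not a replacement for CHKDSK or SCANDISK. The function name and the default threshold argument are illustrative choices, and `shutil.disk_usage` is the standard library call that reports total and free bytes for a path.

```python
import shutil


def enough_free_space(total_bytes, free_bytes, min_fraction=0.10):
    """Return True if at least min_fraction of the drive is still free."""
    return free_bytes >= total_bytes * min_fraction


if __name__ == "__main__":
    # Check the drive that holds the current directory.
    usage = shutil.disk_usage(".")
    if enough_free_space(usage.total, usage.free):
        print("Free space is above the 10% rule of thumb.")
    else:
        print("Drive is getting full: clear some space before running Windows.")
```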
The most important thing to remember about working with IDE hard drives in older systems is to record the CMOS drive parameters. When you first get your Hand-Me-Down PC, enter CMOS setup and copy the drive parameters onto a piece of tape. Stick it to the side or the back of the system box. These parameters are sometimes incorporated in the drive label, but the person who originally built the PC may have chosen a different set of parameters, yielding a slightly smaller drive size. The parameters for LZ (Landing Zone) and WP (Write Pre-compensation Current) are not used with IDE drives. New systems come with an "Autodetect Hard Drive" option in the CMOS setup, which will restore the drive parameters automatically, and can be used to rediscover parameters for drives from older machines. SCSI drives are set up as "Not Present" in the CMOS setup and are operated through the controller BIOS.

One of the most confusing issues for most people installing new hard drives is the size of a megabyte. Strange as this sounds, drive manufacturers call a megabyte one million bytes, in order to size their drives. Some CMOS setup utilities call a megabyte one million bytes; others use the actual value of 1024 kilobytes (1,048,576 bytes). FDISK always calls a megabyte 1024 kilobytes. The difference seems small, but the extra "24s" add up quickly. A hard drive sold as 340 megabytes will show up as about 324 megabytes in FDISK (340,000,000 divided by 1,048,576). A 1080 megabyte hard drive will be labeled as roughly a 1030 megabyte drive by FDISK.

Older hard drives are not worth repairing, due to the availability of faster, larger drives costing the same as the repair. The challenge with old hard drives is trying to recover data that was never backed up, before recycling them into bookends. When attempting any of these last ditch recovery attempts, have your backup media (floppy, tape, or direct computer link) connected and ready, because the drive may get going once, and then not work a second time. Old drives that spin up but don't seek are often stuck in park. 
Tapping on the drive cover with a screwdriver handle may unstick the heads and get the drive going long enough to get the data off. Drives that hum, or display a lit LED but don't spin up, may be suffering from failure of the permanent lubrication. Moving the PC to a warm place or even putting it in direct sunlight may get the drive going temporarily. Sometimes a drive appears to be functioning mechanically, but has had its master boot record so corrupted that disk utilities cannot access it. If an identical IDE drive is available, try booting that drive and then moving the ribbon connector to the bad drive without powering down the machine. This may give a disk recovery utility access to the data on the disk. As with all "live power" procedures, employ extreme caution and watch for falling screws!

Video Adapters
Problems with video adapters are more likely to occur when a PC is being built or upgraded than as actual field failures during operation. The most common problem is for the new VGA adapter to not be seated properly in the bus slot. Sometimes this will cause the POST (Power On Self Test) routine to generate an eight or nine beep code, which means the video memory can't be read. Other times the computer will seem to be booting, but the display will never come on. Re-install the video card, paying special attention that the end of the card, whether ISA or VESA, is as well seated in the front of the slot as the back section, which is held in place by the screw. In some instances, you will have to loosen the screw, or build it up with washers, because it is forcing the adapter to pivot against the edge of the bus connector, lifting the front end. The problem is most common with VESA local bus cards, since the distance of the special VESA connector from the pivot point is very long. In rare cases, a PC won't boot, or the screen won't light up, due to the mix of adapters in the bus slots. 
This isn't supposed to happen, and is probably due to minor timing errors, but I have seen certain VGA adapters and SIDE adapters that work fine separately in other PCs simply refuse to work together on the same bus. This is always diagnosed by removing the SIDE adapter, because you can't boot a PC without a video adapter. Another build time issue involves high end communication adapters for remote terminal support, and other cards requiring frame buffers below the 1 MB boundary. These adapters may produce memory conflicts with the VGA frame buffer area (A000 to C000). Frame buffers on these cards can be moved by switch settings, by a software utility, or by the EISA configuration in EISA machines.

If the video adapter is well seated, and you still get a beep code, the problem is usually a blown or improperly inserted video memory chip, and this can be easily replaced if it's socketed. On most adapters with more than 256K installed, the second bank can be substituted for the first, and the first bank can be left out while testing the card. Video memory must be replaced in banks, just like system memory. Some flaky video problems, such as mouse tracks being permanently left on the screen, are due to the wrong video memory being installed on the card. This particular problem may not show up until the card is warm, and only in some applications. There are several different flavors of video memory, so unless you have the original documentation, or the type of video memory required is silk screened on the circuit board, the adapter manufacturer or their WWW site will have to be contacted. All SVGA cards have a jumper or software utility for setting interlaced or non-interlaced operation, and compatibility with monitors supporting VESA modes. Windows has no way to take advantage of the new adapter's functionality unless you go through all of the configuration software (see Upgrading Video Adapters). 
Never set a card to VESA timing or non-interlaced mode unless you are sure the monitor can support it, or the monitor may be damaged. Failure of an application to display a certain field or text as it does on another PC is usually due to the video driver being installed incorrectly or not at all. This problem is more likely when working at resolutions higher than standard VGA (640x480). Some cards have a jumper for use with monochrome VGA monitors; others may have a 4 or 16 shade monochrome driver. The vast majority of new systems are built with video cards that are at least downwardly compatible with VGA. When dealing with older TTL cards and pre-VGA high resolution cards, the problem is compounded, because it will be hard to find another monitor to use to check whether the problem is the adapter or the monitor. You can find all the old adapters, MDA (Monochrome Display Adapter), Hercules (720 X 350 TTL mono resolution), CGA (Color Graphics Adapter), and EGA (Enhanced Graphics Adapter), for sale in the $10 range, but it's really a pretty bad investment unless you're dead set on spending the absolute minimum to keep your hand-me-down running. The monitors themselves are also available as remanufactured units, but I couldn't see buying one under any circumstance.

Monitors
The most common monitor problem is total failure, where the power status LED fails to come on. This can be due to something as simple as a blown fuse, or something as serious as a dead flyback transformer or a popped CRT. Repairs on 14" and under monitors normally cost more than half of the mail order cost of a new monitor, and are backed by only a 30 or 90 day warranty. Simple problems, like blown fuses or broken switches, can also be repaired at a local appliance shop that does TV work, and you will probably get a better rate than at a monitor repair facility. Monitors over 14" may well be worth repairing, but this must be decided on a case by case basis. 
Radical changes in screen size or brightness may be compensated for by hidden pots, but this will normally require working on the monitor live, often with the cover off, around lethal voltages. I don't recommend working inside the monitor case to anyone without proper training. Larger monitors may also come with a de-gaussing switch, which may clear up some slow developing display problems. Loss of a primary color that can be attributed to the monitor electronics is one of the instances in which out of warranty repair may be sensible.

About half the problems I've encountered with new monitors, or PCs that have been moved and set up again, prove to be connection or electrical environment problems. A partially mated connector on either end of the cable can result in loss of colors or sync. A bent pin inside the connector shell can cause any problem ranging from no display to missing colors or a continually scrolling screen. The bent pin problem can be particularly irksome, because the pin may break while being straightened. If the cable is a two connector type, buy a new cable. If the cable is permanently attached to the monitor, you can re-make the adapter end using a 15pin high-density (3 row) "D shell" connector, but you will need a thin soldering iron to make the pin connections in the center row. Long cable extensions result in diminished brightness and loss of focus on VGA and higher monitors. Monitors that are placed close together will often produce scan line interference on one another, which manifests itself as a line or set of lines continually moving across the screen. Increasing monitor separation by a few feet or changing their orientation with respect to one another will usually clear up the problem. An oscillating image or loss of a primary color may be due to the VGA card, but as often as not it is a monitor, connection or environmental problem. 
The most common cause for a shaky or oscillating image is the presence of an external magnetic field, such as the power supply for your inkjet, or another small transformer, in close proximity to the monitor. Occasionally, high current carrying lines in the walls or air conditioners can be the problem. Troubleshooting these problems is carried out by moving the PC to another location, or experimenting with turning off some of the surrounding electrical equipment. Some of the older multisync monitors are capable of displaying all of the modes from MDA up to SVGA, so if you are upgrading an old system with a multisync monitor, you may be able to use it on a new PC. Older WYSIWYG (What You See Is What You Get) cards and monitors may be repairable by the manufacturer or by a good repair service, but keep in mind that both the card and the monitor cost over $1000.00 each when new, and are highly proprietary. Never get involved in paying for old TTL monitors (MDA, CGA, EGA).

I've often run across the situation where the monitor or video adapter has failed on an old PC-XT or 286 system, and the owner just wants to get a file off of the hard drive and copied onto a floppy, before throwing the whole PC in the trash. You can do this by typing blindly, if you remember where the files are, along with their exact spelling. Otherwise, you can use the old DOS redirection symbol ">" to send the information that would normally go to the screen to a different device. For example, if you want to print the directory of your root drive, C:, you can type

    DIR C: > LPT1:

Or, if you don't have a printer, but you do have access to another working computer with a compatible floppy drive, you can copy the directory to a floppy by typing

    DIR C: > A:MYFILE.TXT

When the floppy drive light goes out, you can put the floppy disk in the machine with the working monitor and type

    TYPE A:MYFILE.TXT | MORE

and the directory will show on the screen. 
Whether you use a printer or another system, it's a slow way to navigate through a directory structure, but you'll only have to do it once. If your PC always asks for a password on boot, or starts off in a menu or in Windows, you will have to get to the DOS prompt by blind typing. Menu exit is normally accomplished through a function key (F1 - F12), followed by "Y" or hitting "Enter" to an "Are you sure you want to exit?" question. Windows is exited by holding down the "Alt" key and hitting "F" (for File), then letting up the "Alt" key and hitting "X" (for eXit). Then hit "Enter" to pass the "Are you sure you want to exit Windows" question, and return to DOS. This is one of the times where the hard drive activity LED or a good pair of ears can give you useful clues to your progress.

Drive Controller and I/O Adapters
New PCs integrate all of the I/O functions and drive controllers on the motherboard. Most upgrade motherboards you buy will also have onboard ports and controllers. These are highly reliable, but in case of failure they can be disabled in the CMOS or with motherboard jumpers, and replaced with adapters. Most 386 and 486 systems were built with SIDE adapters, which incorporate an IDE drive interface, dual floppy controller, 2 serial ports and 1 parallel port. The default configuration for these adapters is: COM1 on interrupt 4, COM2 on interrupt 3 and LPT1 on interrupt 7. In case of total failure of the drive controllers, the BIOS will notify you at boot time with a "FDD/HDD Controller Failure". The presence of the communications and printer ports can be confirmed by the system configuration screen at boot (see the beginning of section 3). Intermittent problems with SIDE adapters are unfortunately fairly common, with hard drive boot problems leading the pack. Random system lockup while accessing the drives, or failure to consistently recognize a mouse or printer, are also common problems. 
SIDE cards are the least expensive components in a PC (under $15), so if you're looking for a place to start spending money on troubleshooting parts, here it is. There is nothing on the adapter itself that can be repaired, but you can make sure your connectors are seated firmly, and you can try re-seating jumpers if a port mysteriously disappears after years of use. Some apparent SIDE failures can be attributed to conflicts with other adapters on the bus, a problem which will show up immediately after an upgrade.

VESA local bus SIDE controllers fall somewhere between ISA SIDE adapters and the more expensive controllers for reliability. Local bus controllers with hardware cache are problematic, and should never be purchased. Compatibility problems with software applications, physical memory errors, and cache bottle-necking are common problems in caching controller installations.

All SCSI controllers, and IDE adapters for the EISA or PCI bus, are manufactured to higher quality standards and are generally reliable. When adding a SCSI controller during an upgrade, make sure you disable any on-board floppy controller. Make sure that the BIOS on SCSI cards is the current version. These are upgraded on a regular basis and are available from the manufacturer. EISA controllers must be configured properly using the motherboard EISA utility and configuration files supplied by the controller manufacturer. Both EISA and PCI bus controllers have software selectable interrupt and DMA settings, either directly available in the CMOS setup, or via an EISA configuration utility. Other EISA settings include support for drives over 1000 megabytes, bus transfer rate, and options for "standard" or "enhanced" operation. Enhanced operation requires an extra interrupt, and must be enabled to take full advantage of the controller.
SCSI controllers with an external 25 pin connector should have the connector covered or taped over if not in use, to prevent accidental connection of a printer cable. Systems using IDE disk drives and SCSI CD-ROMs or tape drives may boot somewhat slower than usual, and the controller BIOS may produce a message about the boot device that can be misinterpreted as an error. A common problem with all high end drive controllers is susceptibility to high heat. If it is over 90 degrees Fahrenheit in the room, you can be sure it's a lot hotter inside the system box, so don't be surprised if drive errors or lock-ups occur.

Network Adapters

All new network adapters are shipped with software that includes self diagnostic capabilities. The software not only tests that the card is functional, it also reports on all the jumper settings, and allows you to change settings on jumperless versions. Network cards are interrupt driven, occupy I/O space and may employ memory mapped transfers, so they are subject to conflicts with other cards if not configured correctly. Network adapters have a pretty high DOA (Dead on Arrival) rate, because intense competition has driven prices so low that quality control has suffered. Most new adapters come with diagnostic LEDs on the back, which indicate the activity of the card.

Nine out of ten network adapter problems will actually be caused by improper software configuration. All network adapters require some sort of driver to be installed when the network boots, and if the driver installs successfully, the adapter is probably functional. Note that the software driver will normally install even with the incorrect interrupt specified, or with an interrupt conflict, either of which may prevent reliable operation. Some network cards come with on-board terminators. These terminators should never be used, because they cause a great deal of confusion if network nodes are ever added or rearranged.
Network Cabling

There are more possible options for network cabling and topology than there are types of cards. All bus type topologies require a proper terminator at each end of the bus. The terminator resistance must match the characteristic impedance of the cable, which must be the correct impedance for the type of card. Incorrect network cabling may work for some time after installation, or even work consistently until nodes are added to the network. Without expensive equipment, the way to check the impedance of existing coaxial cabling is to pull a little out of the wall, or look in the crawl space or drop ceiling, and read the casing. Standard coaxial cables used in networking are RG58U (50 ohms), RG59U (75 ohms) and RG62U (93 ohms). The current standard for color coding BNC terminators is green for 50 ohms (Ethernet) and white for 93 ohms (Arcnet). The 75 ohm coaxial cable used by cable TV and some network topologies is often improperly substituted for either 50 or 93 ohm cable. These networks may limp by with a small number of nodes or over short distances, but if you find out the wrong cable is installed, have it ripped out and replaced immediately.

10BaseT networks utilize twisted pair cable with RJ45 connectors and require a concentrator, or hub. Cabling is straight through and uses only four of the eight wires available in an RJ45 jack (1, 2, 3 and 6). With both twisted pair and coaxial cable networks, the cable and connectors should be the first item checked in case of failure. Unlike small coaxial networks, twisted pair is often already in place or is installed by a separate contractor at a customer site.

Intermittent network problems will most often be caused by intermittent hardware failure at the node or server, unrelated to the network. Problems can also arise from cabling that is run too close to sources of intermittent electrical noise, or that exceeds the segment length limits specified for the equipment. Temperature and humidity can also be factors.
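The need to match the terminator to the cable impedance can be illustrated with the standard transmission-line reflection coefficient, (load - cable) / (load + cable). The short Python sketch below is only an illustration under idealized assumptions (a lossless cable and a purely resistive terminator); the function name and the list of cases are made up for this example.

```python
def reflection_coefficient(load_ohms: float, cable_ohms: float) -> float:
    """Fraction of the signal voltage reflected where the cable meets the load.

    Idealized model: lossless cable, purely resistive terminator.
    """
    return (load_ohms - cable_ohms) / (load_ohms + cable_ohms)


# (description, terminator resistance, cable impedance), all in ohms.
cases = [
    ("50 ohm terminator on 50 ohm cable", 50.0, 50.0),
    ("50 ohm terminator on 75 ohm cable", 50.0, 75.0),
    ("93 ohm terminator on 75 ohm cable", 93.0, 75.0),
]

for label, terminator, cable in cases:
    gamma = reflection_coefficient(terminator, cable)
    # Reflected power is the square of the voltage reflection coefficient.
    print(f"{label}: {abs(gamma) * 100:.1f}% of voltage, "
          f"{gamma ** 2 * 100:.1f}% of power reflected")
```

A matched terminator reflects nothing, while any mismatch sends part of the signal back down the cable, which fits the observation that a wrongly cabled network may limp along when it is small and fail as it grows.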
Coaxial LAN cabling is much more difficult to troubleshoot than 10BaseT LAN cabling, because failure of a single connector or segment on a coaxial LAN will usually "down" the whole network. Oddly enough, in the cases where a broken connection doesn't knock all the workstations off the network, it's symptomatic of a worse problem. Coaxial cable is essentially a two conductor system, with a single wire central conductor and a braided ground sheath. The information that flows on LANs is transmitted at Radio Frequencies (RF). This means that a radio wave, just like those pulled in by your car antenna, travels down the coaxial cable, trapped between the central conductor and the braided ground shield. The common Thin-Ethernet network utilizes "T" type connectors that are fastened directly to the network adapter in the computer to tap the RF signal off of the cable. Both ends of the cable must be terminated with a small load (terminator), which turns the remaining power of the RF wave into a tiny amount of heat, preventing it from reflecting back into the cable and setting up interference patterns. If the coaxial cable is broken, and some of the network stations continue to be able to communicate with the server, it usually means that the cable is the wrong type or that many of the connections suffer from high power loss, such that the leftover RF power is getting attenuated anyway.

The most common problem with coaxial cabling is bad contact between the braid and the BNC connector. Second most common are connectors where the central conductor is cut too short, leaving the contact dependent on the exact positioning of the connector and the cable behind it. Never fool around with trying to tape flaky connections into place. Just replace the segment or make up a new connector.

10BaseT connectors are often made up wrong, which is easily determined by substituting a known good cable in place of the questionable length.
Other 10BaseT problems include RJ45 connectors that aren't crimped tightly enough, and wires that aren't inserted far enough into the connector. Looking at an RJ45 connector end-on, you should always be able to see the copper ends of all four wires in the right positions through the transparent plastic. 10BaseT hubs can also have one or more dead ports, while the rest continue to function normally. If you have problems with a particular workstation, always try moving it to another port on the hub before suspecting the network adapter in the PC. 10BaseT wire can also have a conductor broken after being crushed by a desk or chair, even if the cable sheath remains unbroken.

Modems

There are a number of setup parameters on a modem, both hardware and software, that can prevent the modem from operating properly. Hardware setup for external modems is done on the modem, normally by an exposed switch block. Since the switches are generally not labeled and follow no convention, troubleshooting the modem requires the manufacturer's documentation. The most common reasons for an external modem failing to operate are the wrong type of serial cable or a bad communications port on the PC. The only way to troubleshoot the cable is to confirm the type from the documentation or try another one. The easiest way to ensure a com port is working properly is to temporarily re-route it to the connector the mouse works from (this means swapping ribbon cables on the SIDE card or motherboard), and try using the modem with DOS software.

Hardware setup for internal modems is done with jumpers on the adapter, and requires a knowledge of how the other communications ports are set up (see Modems in section 3). Internal modems often suffer from bad documentation, where the settings shown in the book don't agree with the settings printed on the adapter. Always follow the settings printed on the adapter. Many internal modems allow you to choose a non-standard communications interrupt, often IRQ 5 or IRQ 9.
If you experience any problems with your modem when using these settings, change back to the communications port and interrupt pair not being used by the mouse (usually Com2, IRQ 3), and disable the existing Com2 port in the system.

The first step in any modem troubleshooting scenario is to make sure that the phone line you are attempting to call out on is valid by checking it with a telephone handset. Most business phone systems will not support a plain wired modem and require a special switch or a dedicated line. Modems are more sensitive to connection quality than voice connections. If your modem often disconnects during use, fails to connect, or connects at a low baud rate, try connecting it to a different phone jack in the house. If you can hear cross-talk between separate voice lines in a house, the phone wiring probably has a bad ground. On all modems, the "Line" or "Wall" connection on the modem is for connection to the wall jack, and the "Phone" connection is for connecting an optional telephone handset.

The software troubleshooting procedure for both internal and external modems is basically a two step process. First, ensure that the com port address or number and interrupt are selected correctly in the software. You should be able to hear the modem go "off hook" (sounds like a phone being picked up) and attempt to dial out at this point. Second, if the modem is dialing out but not making the connection, make sure that you are using the correct parameters for the particular number you are calling, including baud rate, number of data bits, stop bits, parity, etc. The baud rate must be in the range of capability of the modem, and for external modems, within the capability of the UART (see Modems in section 3). If the modem clearly picks up the phone to dial, resulting twenty or thirty seconds later in a recorded message from the phone company, you are probably attempting to tone dial on a pulse system. Try changing the modem setup to pulse and dial again.
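The connection parameters mentioned above (baud rate, data bits, stop bits, parity) also determine how many characters actually cross the line each second. The Python sketch below assumes ordinary asynchronous serial framing with one start bit per character, treats the baud rate as bits per second the way the text does, and ignores modem compression and error correction. The function names are invented for this illustration.

```python
def bits_per_character(data_bits: int, parity: str, stop_bits: int) -> int:
    """Bits on the wire per character: start bit + data + parity + stop bits."""
    parity_bits = 0 if parity.upper() == "N" else 1  # "N" means no parity bit
    return 1 + data_bits + parity_bits + stop_bits


def characters_per_second(bits_per_second: int, data_bits: int,
                          parity: str, stop_bits: int) -> float:
    """Raw character rate for a given line speed and framing."""
    return bits_per_second / bits_per_character(data_bits, parity, stop_bits)


# Two common framings: 8 data bits, no parity, 1 stop bit (8N1)
# and 7 data bits, even parity, 1 stop bit (7E1).
framings = {"8N1": (8, "N", 1), "7E1": (7, "E", 1)}

for rate in (2400, 9600, 14400):
    for name, (data_bits, parity, stop_bits) in framings.items():
        cps = characters_per_second(rate, data_bits, parity, stop_bits)
        print(f"{rate} bps, {name}: {cps:.0f} characters per second")
```

Both ends of the call have to agree on the framing. A mismatch typically shows up as a connection that is made but produces garbage on the screen.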
If you are attempting to operate the modem on com ports 3 or 4, try re-installing it on 1 or 2, because some software packages will not work on 3 or 4 despite having them listed as valid options.

Internal modems tend to be more troublesome than external modems, both because of generally lower quality, and the need to remove them from the machine to change hardware settings. One of the more annoying tendencies of cheaper internal modems is to pick up RF interference from other components in the system and produce a whistling tone on their piezoelectric speaker. Relocating the card to another slot sometimes lowers the noise level, and re-orienting the computer can help, as the sound is highly directional.

Both internal and external fax/modems often suffer from a lack of fax compatibility. The documentation may help you to change settings to get the fax part of the modem working well with a particular receiving fax, only to lose compatibility with a receiving fax that worked fine the day before. In general, expect problems with sending faxes, and try sending single page faxes (no cover sheet either) if the connection goes fine but page errors follow.

System Startup Files

The overwhelming majority of Hand-Me-Down PCs come with DOS installed. Unless your PC is running Windows 95 or Windows NT, the DOS system startup files are controlling how the operating system will configure the hardware. Most of the commands in the startup files, CONFIG.SYS and AUTOEXEC.BAT, are placed there automatically when software applications are installed. Software applications that modify one of these files will usually make a backup copy of the information they are changing, so that you can restore the system to its original condition if the upgrade should fail. These backup copies always keep the same basic file name, but change the three letter extension from "BAT" or "SYS" to something like "BAK", "OLD", "SY3", or the like.
If you are trying to back out a change to a system file, and the "DIR" command shows that you have many versions of these startup files, you should use the one with the latest date to replace the current "BAT" or "SYS" file.

The main reason for you to start fiddling around with your system startup files is to configure memory. One of the most deceptive errors that DOS can produce is any variation of "Not enough memory to run program." Messages like this have led many people into upgrading their memory, only to find that the problem is still present. What DOS is really trying to say is that it doesn't have enough free conventional memory to load your program. This refers to the 640K of program memory that normal DOS programs can access. As long as you have a recent version of DOS (version 5.0 or higher), you can free up program memory by moving some of the system software to other locations. If you don't have a recent enough version of DOS, you can buy an aftermarket memory manager, but upgrading to DOS 6.X makes more sense for most.

DOS classifies memory five different ways. The dividing points are sometimes physical, sometimes logical. Some of the memory locations are reserved for use by the memory that's installed on the adapters, like your VGA card. The ability of DOS to address your memory in accordance with the different schemes depends on the CPU, as well as the amount of memory installed. DOS manages memory beyond the first megabyte via one or more memory managers that are installed as device drivers in your CONFIG.SYS file.

Conventional Memory (or Program Memory)

Conventional memory refers to the first 640KB of memory installed in the PC. DOS can manage this conventional memory without any additional memory managers being installed. The decision to make exactly 640KB of memory available for programs was somewhat arbitrary, but when DOS took shape in the early eighties, it seemed like more than anyone would ever need.
The actual decimal value, 640, is derived from the hexadecimal segment addresses 0000 to 9FFF, where each step of a segment address covers 16 bytes. The next address, A000, marks the beginning of upper memory.

Upper Memory

Not all Hand-Me-Down PCs come equipped with RAM in the upper memory area, even though DOS is capable of addressing these locations. PCs older than 386s often came with only 512KB or 640KB of memory installed. The motherboards in these older systems had no provision for adding more RAM to the memory bus, so any additions had to be made with adapters added to the I/O bus. These adapters took two forms, either expanded memory or extended memory, both of which will be discussed below. Extended memory is present in all new PCs, and simply refers to the memory beyond the first megabyte.

The upper memory area comprises the 384KB (A000 to FFFF, in hex) that DOS sets aside for system use. With older versions of DOS, upper memory could only be used to map access to adapter memory, normally in 64KB pages. The EMM386.EXE memory manager was added to later versions of DOS, and requires a 386 or higher CPU to run. The main trade-off in using upper memory is that memory set aside as upper memory by EMM386 can't be co-opted for extended memory by Windows.

High Memory

High memory refers to the first page (64KB) of memory located immediately after the end of the first megabyte. This translates into segment addresses 10000 to 10FFF in hexadecimal. DOS can be loaded into high memory, freeing up both conventional and upper memory for more efficient use. You must have more than 1MB of memory installed to make use of high memory.

Extended Memory

All of the memory installed beyond the first 1MB is known as XMS (eXtended Memory). The DOS extended memory manager is the HIMEM.SYS device driver. Extended memory is used by Windows, memory aware DOS programs, and DOS utilities like SMARTDRV and RAMDRIVE.
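The region sizes quoted in these memory sections can be checked with a little arithmetic, assuming the hexadecimal figures are segment addresses in which each step covers 16 bytes (one "paragraph"). The Python sketch below is only a worked check of the 640KB, 384KB and 64KB figures; the names in it are chosen for this illustration.

```python
PARAGRAPH_BYTES = 16  # bytes covered by one step of a segment address


def region_size_kb(first_segment: int, last_segment: int) -> int:
    """Size in KB of the region from first_segment through last_segment."""
    paragraphs = last_segment - first_segment + 1
    return paragraphs * PARAGRAPH_BYTES // 1024


# Segment address ranges as given in the text.
regions = {
    "Conventional": (0x0000, 0x9FFF),
    "Upper": (0xA000, 0xFFFF),
    "High": (0x10000, 0x10FFF),
}

for name, (first, last) in regions.items():
    size = region_size_kb(first, last)
    print(f"{name:<12} {first:05X} to {last:05X}  {size} KB")
```

Adding the first two regions together gives the 1024KB (one megabyte) boundary at which high memory and extended memory begin.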
SMARTDRV uses extended memory to cache information from the drives for fast access, a similar scheme to using external cache on the motherboard to improve the performance of main memory. RAMDRIVE uses extended memory to create a simulated, super-fast drive that is assigned the next available drive letter (D:, E:, F:) and is accessed just like a physical drive. Some high end DOS applications, like AutoCAD or 3-D games, use their own extended memory managers that load with the program.

Expanded Memory

Also known as LIM (Lotus, Intel, Microsoft) specification memory, expanded memory uses a page swapping scheme to give DOS programs mapped access to additional memory. Originally, all expanded memory was added in the form of expanded memory adapters installed on the I/O bus. Those expanded memory adapters required their own memory management software, which had to be added to the CONFIG.SYS file. Expanded memory adapters are no longer used in PCs, being both expensive and slow, but EMM386 can use extended memory to simulate expanded memory for use with old programs.

Managing Memory

Starting with DOS 5.0, Microsoft included the LOADHIGH (LH) and DEVICEHIGH commands with the operating system. DEVICEHIGH is used in CONFIG.SYS to place device drivers in the upper memory area, described above. LOADHIGH, abbreviated as LH, is placed before lines in the AUTOEXEC.BAT file to load them into the upper memory area. For either of these commands to be used, you must first ensure that HIMEM.SYS and EMM386.EXE are being loaded in the CONFIG.SYS file. Commands in both CONFIG.SYS and AUTOEXEC.BAT are executed sequentially, and the order they appear in determines the allocation of memory and other resources. Unless you are planning to use expanded memory, you must use the NOEMS switch with EMM386. Otherwise, EMM386 will set aside a page of upper memory for swapping pages with the expanded memory, reducing the amount of space available for loading software drivers.
You must also specify DOS=UMB along with DOS=HIGH in the CONFIG.SYS file, or DOS won't allow use of the upper memory area. Once the computer has booted, you can check how efficiently you are using upper memory by using MEM/C | MORE.

Report from MEM/C

Modules using memory below 1 MB:

  Name            Total       =    Conventional    +    Upper Memory
  --------  -----------------    -----------------    -----------------
  MSDOS        44,685   (44K)       44,685   (44K)            0    (0K)
  HIMEM         1,120    (1K)        1,120    (1K)            0    (0K)
  EMM386        3,120    (3K)        3,120    (3K)            0    (0K)
  COMMAND       2,928    (3K)        2,928    (3K)            0    (0K)
  IO               80    (0K)           80    (0K)            0    (0K)
  ASPI2DOS      9,904   (10K)            0    (0K)        9,904   (10K)
  EMMDSWP       3,824    (4K)            0    (0K)        3,824    (4K)
  ASPICD       11,984   (12K)            0    (0K)       11,984   (12K)
  ONTRACK       4,064    (4K)            0    (0K)        4,064    (4K)
  EMMDSW        2,064    (2K)            0    (0K)        2,064    (2K)
  SMARTDRV     29,024   (28K)            0    (0K)       29,024   (28K)
  MSCDEX       57,104   (56K)            0    (0K)       57,104   (56K)
  MOUSE        16,848   (16K)            0    (0K)       16,848   (16K)
  Free        626,016  (611K)      602,224  (588K)       23,792   (23K)

Memory Summary:

  Type of Memory       Total     =     Used      +     Free
  ----------------  -----------     -----------     -----------
  Conventional          654,336          52,112         602,224
  Upper                 158,608         134,816          23,792
  Reserved                    0               0               0
  Extended (XMS)      7,181,424       2,323,568       4,857,856
  Total memory        7,994,368       2,510,496       5,483,872
  Total under 1 MB      812,944         186,928         626,016

  Largest executable program size       602,128  (588K)
  Largest free upper memory block        23,696   (23K)
  MS-DOS is resident in the high memory area.

The double reporting of memory amount, in bytes and in kilobytes (K), is just the traditional way of counting memory. One kilobyte = 1024 bytes, and the amounts (in "K") shown above were reached by arbitrarily rounding up or down. The large amount of extended memory used in the example above is being employed as cache by the SMARTDRV utility, and is released on entering Windows.

When you get a "Not enough memory to load program" type error in DOS, you want to configure memory to maximize the amount of free conventional memory. You do this by maximizing the total amount of upper memory available, then using as much of the upper memory as possible.
However, simply replacing every "DEVICE=" with a "DEVICEHIGH=" in your CONFIG.SYS file, and sticking an "LH" at the beginning of every AUTOEXEC.BAT command, will rarely make the best use of your resources. The DOS memory manager sequentially loads all of the software you specify into upper memory, until the remaining free memory block won't fit another program. No errors will be generated, but the remaining drivers and TSRs (Terminate and Stay Resident programs) will be loaded in conventional memory.

For example, in the configuration above, it may become necessary to add a digitizer to the system, with a 34,000 byte driver. If you simply added a line to the end of the AUTOEXEC.BAT file, as in LH C:\DIGITIZE\DIGIDRV.EXE, the driver would load in conventional memory, since the largest free upper memory block is only 23,696 bytes. That would reduce the free conventional memory from 602,224 bytes to 568,224 bytes. However, if you put the new line before the line loading the mouse driver, which is 16,848 bytes, there would then be enough upper memory for the digitizer to load. The mouse driver would then load into conventional memory, but the total amount of conventional memory free would be 585,376 bytes, a gain of about 17,000 bytes. You can see that by juggling the programs that get loaded high you can maximize the usage.

Depending on how your upper memory is organized, the total amount of upper memory free may be appreciably greater than the largest free upper memory block. In the example above, the largest free upper memory block is only 96 bytes less than the total amount of free upper memory, but this is rarely the case. By changing the order of the lines in your AUTOEXEC.BAT and CONFIG.SYS files, you can exercise some control over this. Just remember that the order of some drivers, like placing HIMEM before EMM386, is fixed, and your system may lock up while you experiment.
Just hit the "F5" key as soon as you see the "Starting MS-DOS" message during boot to bypass both startup files, or hit "F8" to step through the startup files line by line, and see exactly where the problem is coming from. In either case, when you go to re-edit the problem file, you may get a "File not found" error, because the path to the MS-DOS editor, EDIT, is missing. Just type

    PATH=C:\DOS

then continue as usual.

Starting with MS-DOS 6.0, Microsoft added the MEMMAKER utility, which does all of the juggling for you. Another enhancement is the ability to place the drivers and TSRs in specific locations in upper memory, which really allows the programs to be shoe-horned in. Running MEMMAKER can take a bit of time and patience, since it goes through much the same process that you would be doing manually. If the system locks up during boot, MEMMAKER automatically restores the last known good boot configuration when you reboot again. MEMMAKER also offers both "aggressive" and "conservative" approaches to configuring your memory, where the aggressive setting is more likely to cause lock-ups. If you are a "hands-on" sort of person, you can still use the manual approach that was required with DOS 5.0 when configuring DOS 6.X.
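The manual juggling described above, where the digitizer driver is moved ahead of the mouse driver, can be tried out on paper with a small simulation. The Python sketch below is a simplified model and not how DOS actually allocates memory: it assumes upper memory is one contiguous pool (real upper memory is usually fragmented), loads each program high if it still fits, and otherwise charges it to conventional memory. The module sizes come from the MEM/C report, and the 34,000 byte digitizer driver is the hypothetical one from the example.

```python
UPPER_TOTAL = 158_608        # total upper memory in bytes (MEM/C summary)
CONVENTIONAL_FREE = 602_224  # free conventional bytes with nothing extra loaded low


def load_in_order(programs, upper_total=UPPER_TOTAL,
                  conventional_free=CONVENTIONAL_FREE):
    """Load each (name, size) high if it fits, otherwise low.

    Returns (free conventional bytes, free upper bytes, names loaded low).
    Simplification: upper memory is treated as a single contiguous pool.
    """
    upper_free = upper_total
    loaded_low = []
    for name, size in programs:
        if size <= upper_free:
            upper_free -= size
        else:
            conventional_free -= size
            loaded_low.append(name)
    return conventional_free, upper_free, loaded_low


# Programs already loading high in the MEM/C report, ahead of the mouse driver.
resident = [
    ("ASPI2DOS", 9_904), ("EMMDSWP", 3_824), ("ASPICD", 11_984),
    ("ONTRACK", 4_064), ("EMMDSW", 2_064), ("SMARTDRV", 29_024),
    ("MSCDEX", 57_104),
]
mouse = ("MOUSE", 16_848)
digitizer = ("DIGIDRV", 34_000)  # hypothetical driver from the example

for label, order in (
    ("digitizer loaded last", resident + [mouse, digitizer]),
    ("digitizer before mouse", resident + [digitizer, mouse]),
):
    conventional, upper, low = load_in_order(order)
    print(f"{label}: {conventional:,} bytes conventional free, "
          f"{upper:,} bytes upper free, loaded low: {low}")
```

In this model, the ordering that pushes the smaller program into conventional memory leaves more conventional memory free, which is the same trade-off you are making by hand when you reorder lines in the startup files.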