

Troubleshooting

Don't panic. Don't reboot. Don't reinstall.

If the system is running, you're in a good position to find and fix the trouble. Remember that everything is a file and you have good file manipulation tools at hand.

If the trouble occurs at boot, but after you see the full 'LILO:' prompt, try starting in 'single' mode. 'LILO: linux single' should get you in to do maintenance. Each letter in the 'LILO:' prompt represents completion of a stage of the bootloader process, so a prompt that stops partway tells you which stage failed. Check the LILO docs (/usr/doc/lilo-<version number>) to see what can stop each stage. A boot that never reaches the full prompt is when you need a rescue disk.

If it boots but complains of a bad inittab (/etc/inittab) and leaves you in a 'Read-Only filesystem!', you can use a rescue disk to edit /etc/inittab and be able to boot normally again. The inittab can be corrupted by a sudden power loss and resultant improper shutdown.

Ask yourself, when did it stop working right? Just after you edited some configuration file in /etc? You did make a backup of the file before editing, didn't you? Look, your editor may have made one for you, with the name ending in ~ or .bak or .<the date> or .save. XF86Setup creates a backup of your /etc/X11/XF86Config file before writing a new one. In my own /etc/X11 directory I have XF86Config, XF86Config.bak, XF86Config.multiresolution, XF86Config.original and XF86Config.working (which is a copy of the current XF86Config file). I count that approximately 25K of space well spent. I can experiment with the configuration and, if I don't like the results, just 'cp ./XF86Config.working ./XF86Config' and be back to a working X.
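A minimal sketch of that backup habit in shell, assuming a POSIX sh. The function names backup_config and restore_config and the date-suffix naming are illustrative choices, not part of any standard tool.

```shell
#!/bin/sh
# backup_config FILE: copy FILE to FILE.<YYYY-MM-DD> before you edit it.
# Prints the name of the backup so you can note it down.
# (Hypothetical helper; the suffix scheme is only one reasonable choice.)
backup_config() {
    file=$1
    backup="$file.$(date +%Y-%m-%d)"
    # -p preserves mode and timestamps (and ownership when run as root)
    cp -p "$file" "$backup" && echo "$backup"
}

# restore_config FILE BACKUP: put a known-good copy back in place.
restore_config() {
    cp -p "$2" "$1"
}
```

For example, run 'backup_config /etc/X11/XF86Config' before an experiment, and 'restore_config /etc/X11/XF86Config /etc/X11/XF86Config.working' if the experiment goes badly.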

Take a walk through your /var/log files. Unless the trouble is with one of the logging daemons (not likely!), you can find some pretty detailed information in the logs. If the trouble is some runaway process (Netscape Communicator is bad about this), you can kill that process to make it give up CPU or memory resources. Use 'top' or 'ps ax' to see what's going on. Take the time _now_ to read 'man kill'. Many people are too quick with the big gun of 'kill -9', which doesn't give a program a chance to clean up after itself.
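One way to avoid reaching for 'kill -9' first is to ask politely and escalate only if that fails. The following is a sketch under that assumption; polite_kill is a hypothetical helper name and the five-second default wait is arbitrary.

```shell
#!/bin/sh
# polite_kill PID [SECONDS]: send SIGTERM, wait up to SECONDS (default 5)
# for the process to exit, and only then fall back to SIGKILL.
# (Hypothetical helper name; adjust the wait to suit the program.)
polite_kill() {
    pid=$1
    wait_secs=${2:-5}
    # SIGTERM can be caught, so the program gets a chance to clean up
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to kill
    while [ "$wait_secs" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the process exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    # Last resort: SIGKILL cannot be caught, so no cleanup happens
    kill -KILL "$pid" 2>/dev/null || true
}
```

Find the process ID with 'ps ax' or 'top', then run something like 'polite_kill 1234', where 1234 stands in for the real PID.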

Here's a slightly prejudiced tip: dump Communicator and replace it with just Navigator. Communicator is a bloated mess from the browser feature wars and tries to do all things related to the Internet. Navigator is still a pretty solid browser and does not suffer as much from the memory leaks of the infamous Communicator. Use something like mutt or pine for your email. Use a simple text editor for HTML editing, or one of the WYSIWYG editors available for Linux. Communicator, in my opinion, goes against a basic principle of a Linux / Unix system: lots of little tools, each doing one task very well. Communicator tries to be one giant machine that does everything, but doesn't do any one thing very well (sound familiar?).

Learn how to start and stop services now, before you have trouble. On Red Hat-like systems, look under /etc/rc.d/init.d for the scripts that control starting and stopping services such as samba, nfs, sendmail, etc. The standard is for these scripts to be under /etc/init.d, and I don't know why Red Hat and Mandrake do it differently. That's just part of the price of flexibility.
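Since the scripts live in different places on different distributions, a small wrapper can hide the difference. This is only a sketch: svc and the INIT_DIRS variable are hypothetical names, and it assumes the init scripts take an action such as start, stop or restart as their first argument, which is the usual convention.

```shell
#!/bin/sh
# Directories to search, in order. INIT_DIRS is a hypothetical override
# so the sketch can be tried against a scratch directory first.
INIT_DIRS=${INIT_DIRS:-"/etc/rc.d/init.d /etc/init.d"}

# svc NAME ACTION: run the init script for NAME with ACTION
# (conventionally start, stop or restart).
svc() {
    name=$1
    action=$2
    for dir in $INIT_DIRS; do
        if [ -x "$dir/$name" ]; then
            "$dir/$name" "$action"
            return $?
        fi
    done
    echo "svc: no init script for '$name' in: $INIT_DIRS" >&2
    return 1
}
```

As root, 'svc sendmail stop' followed by 'svc sendmail start' would then work the same way on either layout.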

(I talked to one frustrated new Linux user who wanted a "user-proof" system. Such a thing would be an abomination and quickly wither from derision. "User-proof" would mean you had to accept the thing as provided, without being able to customize it. Freedom necessarily includes the freedom to totally mess up.)

Keep a backup copy of your /etc directory current with your working system. It's small enough that you can have multiple copies, such as etc-2000-may.tgz, etc-2000-june.tgz and so on. You can then compare and extract only those files you suspect you mis-edited. Another slightly prejudiced tip: install mc (Midnight Commander) for use at the command line. Go through its Options - Configuration menu and check the following (use the space bar to checkmark the boxes):

Panel Options: show Backup files, show Hidden files, maRk moves down.

Pause after run...: alwaYs.

Other options: Verbose operation, compute Totals, shell Patterns, use internal edIt, Use internal view, coMplete: show all, rotatinG dash, cd follows linKs.

(The upper-case letters in each category of options are hot-keys, selectable by Alt-<letter>, in case your arrow keys are not working and you haven't yet used the Options - Learn Keys menu.) Save those options when you're finished selecting. With mc you can examine the contents of .tar, .zip, .tar.gz, .tgz, .rpm and .deb files and extract individual files from those archives. (To let you browse these archives, mc actually unpacks them into the /tmp directory, so a large archive may take a little while before its contents are displayed.) The two file browser panels let you easily compare the archive contents to your current directory. You can then copy individual files from the archive over the suspected bad file in your current /etc.
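If mc is not installed, much the same backup-and-compare routine can be done with tar and diff. This is a rough sketch, assuming GNU tar (for the -O option) and gzip. The names backup_dir and diff_against_backup are hypothetical, and the archive is named by full date rather than by month.

```shell
#!/bin/sh
# backup_dir SRC DESTDIR: write DESTDIR/<name>-<YYYY-MM-DD>.tgz from SRC
# and print the archive name. (Hypothetical helper.)
backup_dir() {
    src=$1
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y-%m-%d).tgz"
    # -C switches to the parent directory so member paths are relative (etc/...)
    tar -czf "$archive" -C "$(dirname "$src")" "$name" && echo "$archive"
}

# diff_against_backup ARCHIVE MEMBER LIVE: compare one archived file with
# the live copy without unpacking anything to disk.
# Exit status is diff's: 0 if identical, 1 if they differ.
diff_against_backup() {
    # -O sends the extracted member to standard output
    tar -xzOf "$1" "$2" | diff - "$3"
}
```

As root, 'backup_dir /etc /root/backups' would make the archive (the destination directory is just an example and must already exist), and 'diff_against_backup /root/backups/etc-2000-06-01.tgz etc/inittab /etc/inittab' would show what changed in one suspect file. To restore just that file, 'tar -xzf <archive> -C / etc/inittab' extracts it over the live copy.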

Naturally, you can do the same thing in a GUI that you can in mc, but not if the problem is with your X configuration or window manager setup. At the command line, mc is a pretty efficient way of handling lots of file manipulation tasks.

Just when you think you have a bullet-proof penguin, you'll do something to make him bust his butt on the ice. Don't sweat it, just help him up, toss him a tuna, and get on with the job.