Troubleshooting 101
Troubleshooting a problem with a particular system works best when it can follow a previously defined checklist, flowchart, or procedure. Developing such an approach in advance allows time to think through the steps and to arrange the problem-isolation activities in the most efficient order. When no such material exists, it is still important to work in an organized way.
Troubleshooting is especially challenging when problems are intermittent, or when several problems combine to produce the observed symptoms. Possible steps include:
- Define the problem.
- Gather data/evidence.
- Identify issues that contributed to the problem.
- Find root causes.
- Develop solution recommendations.
- Implement the recommendations.
- Observe the implemented solutions to confirm that they are effective.
Ideas
- Back out the most recent change
- Keep a log of activities and circumstances under which symptoms are observed
- Look for other occurrences of the situation in other environments, to rule out local configurations
- Only change one thing at a time
- Define symptoms
- Obtain a system block diagram
- Establish a list of possible causes for the symptoms (in medicine, this would be called a differential diagnosis)
- Develop a hypothesis
Efficient, methodical troubleshooting starts with a clear understanding of the expected behavior of the system and of the symptoms being observed. From there the troubleshooter forms hypotheses about potential causes and devises tests (or consults a standardized checklist of tests) to eliminate them. Two common strategies are to check for frequently encountered or easily tested conditions first (for example, checking that a printer's light is on and that its cable is firmly seated at both ends), and to "bisect" the system (for example, in a network printing system, checking whether the job reached the server to determine whether the problem lies in the subsystems "towards" the user's end or "towards" the device).
This latter technique can be particularly efficient in systems with long chains of serialized dependencies or interactions among their components. It is simply the application of a binary search across the range of dependencies, so a chain of n components needs roughly log2(n) checks rather than up to n.
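The bisection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names, the function first_failing_stage, and the reached callback are all hypothetical, and the sketch assumes a single fault in a strictly serial chain, so that every stage before the fault passes its check and every stage from the fault onward fails.

```python
from typing import Callable, Optional, Sequence


def first_failing_stage(
    stages: Sequence[str], reached: Callable[[str], bool]
) -> Optional[str]:
    """Return the first stage whose check fails, or None if every stage passes.

    Assumption: stages form a serial chain with a single fault, so the
    check results look like pass, pass, ..., fail, fail.
    """
    lo, hi = 0, len(stages)  # invariant: stages[:lo] pass, stages[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if reached(stages[mid]):
            lo = mid + 1  # fault lies further "towards" the device
        else:
            hi = mid  # fault is at this stage or "towards" the user
    return stages[lo] if lo < len(stages) else None


# Hypothetical network printing chain, ordered from the user to the device.
STAGES = [
    "application",
    "print driver",
    "client spooler",
    "network",
    "print server",
    "server queue",
    "printer",
]


def job_reached(stage: str) -> bool:
    # Stand-in for a real test; here the fault is simulated at "network".
    return STAGES.index(stage) < STAGES.index("network")


print(first_failing_stage(STAGES, job_reached))  # network
```

With seven stages the search settles after three checks. In practice each check is a manual or scripted test, and an intermittent fault breaks the pass-then-fail assumption that the search relies on.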
Simple and intermediate systems are characterized by lists or trees of dependencies among their components or subsystems. More complex systems contain cyclical dependencies or interactions (feedback loops). Such systems are less amenable to "bisection" troubleshooting techniques.
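Whether a system falls into the simpler or the more complex category can be checked mechanically if its dependencies are written down. The following is a minimal sketch, assuming the dependencies are recorded as a hypothetical dictionary that maps each component to the components it depends on; the function name has_cycle and the example components are illustrative only. It uses a standard depth-first search to look for a feedback loop.

```python
from typing import Dict, List


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Return True if the dependency graph contains a cycle (feedback loop)."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for nxt in deps.get(node, []):
            status = state.get(nxt, UNSEEN)
            if status == IN_PROGRESS:
                return True  # reached a node still on the current path
            if status == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state.get(n, UNSEEN) == UNSEEN and visit(n) for n in deps)


# A serial chain: bisection applies.
chain = {"application": ["spooler"], "spooler": ["server"], "server": []}

# A feedback loop: each component depends, indirectly, on itself.
loop = {"thermostat": ["heater"], "heater": ["room"], "room": ["thermostat"]}

print(has_cycle(chain), has_cycle(loop))  # False True
```

When the check reports a cycle, a failed test at one point does not cleanly place the fault on one side of that point, which is why bisection loses much of its value in such systems.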
