17 - Testing, Debugging, and Refactoring

Notes:

Chapter 22 - Testing

22.1 Role of Developer Testing in Software Quality

Testing is an important part of any software-quality program, and in many cases it's the only part. This is unfortunate, because collaborative development practices in their various forms have been shown to find a higher percentage of errors than testing does, and they cost less than half as much per error found as testing does (Card 1987, Russell 1991, Kaplan 1995). Individual testing steps (unit test, component test, and integration test) typically find less than 50 percent of the errors present each. The combination of testing steps often finds less than 60 percent of the errors present (Jones 1998).

If you were to list a set of software-development activities on "Sesame Street" and ask, "Which of these things is not like the others?" the answer would be "Testing." Testing is a hard activity for most developers to swallow for several reasons:

Testing's goal runs counter to the goals of other development activities. The goal is to find errors. A successful test is one that breaks the software. The goal of every other development activity is to prevent errors and keep the software from breaking.
Testing can never completely prove the absence of errors. If you have tested extensively and found thousands of errors, does it mean that you've found all the errors or that you have thousands more to find? An absence of errors could mean ineffective or incomplete test cases as easily as it could mean perfect software.
Testing by itself does not improve software quality. Test results are an indicator of quality, but in and of themselves they don't improve it. Trying to improve software quality by increasing the amount of testing is like trying to lose weight by weighing yourself more often. What you eat before you step onto the scale determines how much you will weigh, and the software-development techniques you use determine how many errors testing will find. If you want to lose weight, don't buy a new scale; change your diet. If you want to improve your software, don't just test more; develop better.
Testing by itself does not improve software quality. Test results are an indicator of quality, but in and of themselves they don't improve it. Trying to improve software quality by increasing the amount of testing is like trying to lose weight by weighing yourself more often. What you eat before you step onto the scale determines how much you will weigh, and the software-development techniques you use determine how many errors testing will find. If you want to lose weight, don't buy a new scale; change your diet. If you want to improve your software, don't just test more; develop better.
Testing requires you to assume that you'll find errors in your code. If you assume you won't, you probably won't, but only because you'll have set up a self-fulfilling prophecy. If you execute the program hoping that it won't have any errors, it will be too easy to overlook the errors you find. In a study that has become a classic, Glenford Myers had a group of experienced programmers test a program with 15 known defects. The average programmer found only 5 of the 15 errors. The best found only 9. The main source of undetected errors was that erroneous output was not examined carefully enough. The errors were visible, but the programmers didn't notice them (Myers 1978).

You must hope to find errors in your code. Such a hope might seem like an unnatural act, but you should hope that it's you who finds the errors and not someone else.

A key question is, How much time should be spent in developer testing on a typical project? A commonly cited figure for all testing is 50 percent of the time spent on the project, but that's misleading. First, that particular figure combines testing and debugging; testing alone takes less time. Second, that figure represents the amount of time that's typically spent rather than the time that should be spent. Third, the figure includes independent testing as well as developer testing.

As Figure 22-1 shows, depending on the project's size and complexity, developer testing should probably take 8 to 25 percent of the total project time. This is consistent with much of the data that has been reported.

Figure 22-1 As the size of the project increases, developer testing consumes a smaller percentage of the total development time. The effects of program size are described in more detail in Chapter 27, "How Program Size Affects Construction."

A second question is, What do you do with the results of developer testing? Most immediately, you can use the results to assess the reliability of the product under development. Even if you never correct the defects that testing finds, testing describes how reliable the software is. Another use for the results is that they can and usually do guide corrections to the software. Finally, over time, the record of defects found through testing helps reveal the kinds of errors that are most common. You can use this information to select appropriate training classes, direct future technical review activities, and design future test cases.

Testing During Construction

The big, wide world of testing sometimes ignores the subject of this chapter: "white-box" or "glass-box" testing. You generally want to design a class to be a black box-a user of the class won't have to look past the interface to know what the class does. In testing the class, however, it's advantageous to treat it as a glass box, to look at the internal source code of the class as well as its inputs and outputs. If you know what's inside the box, you can test the class more thoroughly. Of course, you also have the same blind spots in testing the class that you had in writing it, and so black-box testing has advantages too.

During construction, you generally write a routine or class, check it mentally, and then review it or test it. Regardless of your integration or system-testing strategy, you should test each unit thoroughly before you combine it with any others. If you're writing several routines, you should test them one at a time. Routines aren't really any easier to test individually, but they're much easier to debug. If you throw several untested routines together at once and find an error, any of the several routines might be guilty. If you add one routine at a time to a collection of previously tested routines, you know that any new errors are the result of the new routine or of interactions with the new routine. The debugging job is easier.

Collaborative construction practices have many strengths to offer that testing can't match. But part of the problem with testing is that testing often isn't performed as well as it could be. A developer can perform hundreds of tests and still achieve only partial code coverage. A feeling of good test coverage doesn't mean that actual test coverage is adequate. An understanding of basic test concepts can support better testing and raise testing's effectiveness.

22.2 Recommended Approach to Developer Testing

A systematic approach to developer testing maximizes your ability to detect errors of all kinds with a minimum of effort. Be sure to cover this ground:

Test for each relevant requirement to make sure that the requirements have been implemented. Plan the test cases for this step at the requirements stage or as early as possible-preferably before you begin writing the unit to be tested. Consider testing for common omissions in requirements. The level of security, storage, the installation procedure, and system reliability are all fair game for testing and are often overlooked at requirements time.
Test for each relevant design concern to make sure that the design has been implemented. Plan the test cases for this step at the design stage or as early as possiblebefore you begin the detailed coding of the routine or class to be tested.
Use "basis testing" to add detailed test cases to those that test the requirements and the design. Add data-flow tests, and then add the remaining test cases needed to thoroughly exercise the code. At a minimum, you should test every line of code. Basis testing and data-flow testing are described later in this chapter.
Use a checklist of the kinds of errors you've made on the project to date or have made on previous projects.

Design the test cases along with the product. This can help avoid errors in requirements and design, which tend to be more expensive than coding errors. Plan to test and find defects as early as possible because it's cheaper to fix defects early.

Test First or Test Last?

Developers sometimes wonder whether it's better to write test cases after the code has been written or beforehand (Beck 2003). The defect-cost increase graph-see Figure 3-1 on page 30-suggests that writing test cases first will minimize the amount of time between when a defect is inserted into the code and when the defect is detected and removed. This turns out to be one of many reasons to write test cases first:

Writing test cases before writing the code doesn't take any more effort than writing test cases after the code; it simply resequences the test-case-writing activity.
When you write test cases first, you detect defects earlier and you can correct them more easily.
Writing test cases first forces you to think at least a little bit about the requirements and design before writing code, which tends to produce better code.
Writing test cases first exposes requirements problems sooner, before the code is written, because it's hard to write a test case for a poor requirement.
If you save your test cases, which you should do, you can still test last, in addition to testing first.

All in all, I think test-first programming is one of the most beneficial software practices to emerge during the past decade and is a good general approach. But it isn't a testing panacea, because it's subject to the general limitations of developer testing, which are described next.

Limitations of Developer Testing

Watch for the following limitations with developer testing:

Developer tests tend to be "clean tests"

Developers tend to test for whether the code works (clean tests) rather than test for all the ways the code breaks (dirty tests). Immature testing organizations tend to have about five clean tests for every dirty test. Mature testing organizations tend to have five dirty tests for every clean test. This ratio is not reversed by reducing the clean tests; it's done by creating 25 times as many dirty tests (Boris Beizer in Johnson 1994).

Developer testing tends to have an optimistic view of test coverage

Average programmers believe they are achieving 95 percent test coverage, but they're typically achieving more like 80 percent test coverage in the best case, 30 percent in the worst case, and more like 50-60 percent in the average case (Boris Beizer in Johnson 1994).

Developer testing tends to skip more sophisticated kinds of test coverage

Most developers view the kind of test coverage known as "100% statement coverage" as adequate. This is a good start, but it's hardly sufficient. A better coverage standard is to meet what's called " $100 %$ branch coverage," with every predicate term being tested for at least one true and one false value. Section 22.3, "Bag of Testing Tricks," provides more details about how to accomplish this.

None of these points reduce the value of developer testing, but they do help put developer testing into proper perspective. As valuable as developer testing is, it isn't sufficient to provide adequate quality assurance on its own and should be supplemented with other practices, including independent testing and collaborative construction techniques.

22.3 Bag of Testing Tricks

Why isn't it possible to prove that a program is correct by testing it? To use testing to prove that a program works, you'd have to test every conceivable input value to the program and every conceivable combination of input values. Even for simple programs, such an undertaking would become massively prohibitive. Suppose, for example, that you have a program that takes a name, an address, and a phone number and stores them in a file. This is certainly a simple program, much simpler than any whose correctness you'd really be worried about. Suppose further that each of the possible names and addresses is 20 characters long and that there are 26 possible characters to be used in them. This would be the number of possible inputs:

Name	$26^{20} (20$ characters, each with 26 possible choices $)$
Address	$26^{20} (20$ characters, each with 26 possible choices $)$
Phone Number	$10^{10} (10$ digits, each with 10 possible choices $)$
Total Possibilities	$= 26^{20} * 26^{20} * 10^{10} \approx 10^{66}$

Even with this relatively small amount of input, you have one-with-66-zeros possible test cases. To put this in perspective, if Noah had gotten off the ark and started testing this program at the rate of a trillion test cases per second, he would be far less than 1 percent of the way done today. Obviously, if you added a more realistic amount of data, the task of exhaustively testing all possibilities would become even more impossible.

Incomplete Testing

Since exhaustive testing is impossible, practically speaking, the art of testing is that of picking the test cases most likely to find errors. Of the $10^{66}$ possible test cases, only a few are likely to disclose errors that the others don't. You need to concentrate on picking a few that tell you different things rather than a set that tells you the same thing over and over.

When you're planning tests, eliminate those that don't tell you anything new-that is, tests on new data that probably won't produce an error if other, similar data didn't produce an error. Various people have proposed various methods of covering the bases efficiently, and several of these methods are discussed in the following sections.

Structure Basis Testing

In spite of the hairy name, structured basis testing is a fairly simple concept. The idea is that you need to test each statement in a program at least once. If the statement is a logical statement-an if or a while, for example-you need to vary the testing according to how complicated the expression inside the if or while is to make sure that the statement is fully tested. The easiest way to make sure that you've gotten all the bases covered is to calculate the number of paths through the program and then develop the minimum number of test cases that will exercise every path through the program.

You might have heard of "code coverage" testing or "logic coverage" testing. They are approaches in which you test all the paths through a program. Since they cover all paths, they're similar to structured basis testing, but they don't include the idea of covering all paths with a minimal set of test cases. If you use code coverage or logic coverage testing, you might create many more test cases than you would need to cover the same logic with structured basis testing.

You can compute the minimum number of cases needed for basis testing in this straightforward way:

Start with 1 for the straight path through the routine.
Add 1 for each of the following keywords, or their equivalents: if, while, repeat, for, and, and or.
Add 1 for each case in a case statement. If the case statement doesn't have a default case, add 1 more.

Here's an example: Simple Example of Computing the Number of Paths Through a Java Program

Statement1;
Statement2;
if ( x < 10 ) {
    Statement3;
}
Statement4;

In this instance, you start with one and count the if once to make a total of two. That means that you need to have at least two test cases to cover all the paths through the program. In this example, you'd need to have the following test cases:

Statements controlled by if are executed $(x < 10)$ .
Statements controlled by if aren't executed ( $x >= 10$ ).

The sample code needs to be a little more realistic to give you an accurate idea of how this kind of testing works. Realism in this case includes code containing defects.

The following listing is a slightly more complicated example. This piece of code is used throughout the chapter and contains a few possible errors.

Example of Computing the Number of Cases Needed for Basis Testing of a Java Program

// Compute Net Pay
totalWithholdings = 0;
for ( id = 0; id < numEmployees; id++ ) {
    // compute social security withholding, if below the maximum
    if ( m_employee[ id ].governmentRetirementWithheld < MAX_GOVT_RETIREMENT ) {
        governmentRetirement = ComputeGovernmentRetirement( m_employee[ id ] );
    }
    // set default to no retirement contribution
    companyRetirement = 0;
    // determine discretionary employee retirement contribution
    if ( m_employee[ id ].WantsRetirement &&
        EligibleForRetirement( m_employee[ id ] ) ) {
        companyRetirement = GetRetirement( m_employee[ id ] );
    }
    grossPay = ComputeGrossPay ( m_employee[ id ] );
    // determine IRA contribution
    personalRetirement = 0;
    if ( EligibleForPersonalRetirement( m_employee[ id ] ) ) {
        personalRetirement = PersonalRetirementContribution( m_employee[ id ],
            companyRetirement, grossPay );
    }
    // make weekly paycheck
    withholding = ComputeWithholding( m_employee[ id ] );
    netPay = grossPay - withholding - companyRetirement - governmentRetirement -
        personalRetirement;
    PayEmployee( m_employee[ id ], netPay );
	
    // add this employee's paycheck to total for accounting
    totalWithholdings = totalWithholdings + withholding;
    totalGovernmentRetirement = totalGovernmentRetirement + governmentRetirement;
    totalRetirement = totalRetirement + companyRetirement;
}

SavePayRecords( totalWithholdings, totalGovernmentRetirement, totalRetirement );

In this example, you'll need one initial test case plus one for each of the five keywords, for a total of six. That doesn't mean that any six test cases will cover all the bases. It means that, at a minimum, six cases are required. Unless the cases are constructed carefully, they almost surely won't cover all the bases. The trick is to pay attention to the same keywords you used when counting the number of cases needed. Each keyword in the code represents something that can be either true or false; make sure you have at least one test case for each true and at least one for each false.

Here is a set of test cases that covers all the bases in this example:

Case	Test Description	Test Data
1	Nominal case	All boolean conditions are true
2	The initial for condition is false	numEmployees < 1
3	The first if is false	m_employee[ id ].governmentRetirementWithheld >=MAX_GOVT_RETIREMENT
4	The second if is false because the first part of the and is false	not m_employee[ id ].WantsRetirement
5	The second if is false because the second part of the and is false	not EligibleForRetirement( m_employee[id] )
6	The third if is false	not EligibleForPersonalRetirement( m_employee[ id ])

Note: This table will be extended with additional test cases throughout the chapter.

If the routine were much more complicated than this, the number of test cases you'd have to use just to cover all the paths would increase pretty quickly. Shorter routines tend to have fewer paths to test. Boolean expressions without a lot of ands and ors have fewer variations to test. Ease of testing is another good reason to keep your routines short and your boolean expressions simple.

Now that you've created six test cases for the routine and satisfied the demands of structured basis testing, can you consider the routine to be fully tested? Probably not.

This kind of testing assures you only that all of the code will be executed. It does not account for variations in data.

Data-Flow Testing

Considering the last section and this one together gives you another example illustrating that control flow and data flow are equally important in computer programming.

Data-flow testing is based on the idea that data usage is at least as error-prone as control flow. Boris Beizer claims that at least half of all code consists of data declarations and initializations (Beizer 1990).

Data can exist in one of three states:

Defined The data has been initialized, but it hasn't been used yet.
Used The data has been used for computation, as an argument to a routine, or for something else.
Killed The data was once defined, but it has been undefined in some way. For example, if the data is a pointer, perhaps the pointer has been freed. If it's a forloop index, perhaps the program is out of the loop and the programming language doesn't define the value of a for-loop index once it's outside the loop. If it's a pointer to a record in a file, maybe the file has been closed and the record pointer is no longer valid.

In addition to having the terms "defined," "used," and "killed," it's convenient to have terms that describe entering or exiting a routine immediately before or after doing something to a variable:

Entered The control flow enters the routine immediately before the variable is acted upon. A working variable is initialized at the top of a routine, for example.
Exited The control flow leaves the routine immediately after the variable is acted upon. A return value is assigned to a status variable at the end of a routine, for example.

Combinations of Data States

The normal combination of data states is that a variable is defined, used one or more times, and perhaps killed. View the following patterns suspiciously:

Defined-Defined If you have to define a variable twice before the value sticks, you don't need a better program, you need a better computer! It's wasteful and error-prone, even if not actually wrong.
Defined-Exited If the variable is a local variable, it doesn't make sense to define it and exit without using it. If it's a routine parameter or a global variable, it might be all right.
Defined-Killed Defining a variable and then killing it suggests that either the variable is extraneous or the code that was supposed to use the variable is missing.
Entered-Killed This is a problem if the variable is a local variable. It wouldn't need to be killed if it hasn't been defined or used. If, on the other hand, it's a routine parameter or a global variable, this pattern is all right as long as the variable is defined somewhere else before it's killed.
Entered-Used Again, this is a problem if the variable is a local variable. The variable needs to be defined before it's used. If, on the other hand, it's a routine parameter or a global variable, the pattern is all right if the variable is defined somewhere else before it's used.
Killed-Killed A variable shouldn't need to be killed twice. Variables don't come back to life. A resurrected variable indicates sloppy programming. Double kills are also fatal for pointers-one of the best ways to hang your machine is to kill (free) a pointer twice.
Killed-Used Using a variable after it has been killed is a logical error. If the code seems to work anyway (for example, a pointer that still points to memory that's been freed), that's an accident, and Murphy's Law says that the code will stop working at the time when it will cause the most mayhem.
Used-Defined Using and then defining a variable might or might not be a problem, depending on whether the variable was also defined before it was used. Certainly if you see a used-defined pattern, it's worthwhile to check for a previous definition.

Check for these anomalous sequences of data states before testing begins. After you've checked for the anomalous sequences, the key to writing data-flow test cases is to exercise all possible defined-used paths. You can do this to various degrees of thoroughness, including

All definitions. Test every definition of every variable-that is, every place at which any variable receives a value. This is a weak strategy because if you try to exercise every line of code, you'll do this by default.
All defined-used combinations. Test every combination of defining a variable in one place and using it in another. This is a stronger strategy than testing all definitions because merely executing every line of code does not guarantee that every defined-used combination will be tested.

Java Example of a Program Whose Data Flow Is to Be Tested

if ( Condition 1 ) {
    x = a;
}
else {
    x = b;
}
if ( Condition 2 ) {
    y = x + 1;
}
else {
    y = x - 1;
}

To cover every path in the program, you need one test case in which Condition 1 is true and one in which it's false. You also need a test case in which Condition 2 is true and one in which it's false. This can be handled by two test cases: Case 1 (Condition 1=True, Condition 2=True) and Case 2 (Condition 1=False, Condition 2=False). Those two cases are all you need for structured basis testing. They're also all you need to exercise every line of code that defines a variable; they give you the weak form of data-flow testing automatically.

To cover every defined-used combination, however, you need to add a few more cases. Right now you have the cases created by having Condition 1 and Condition 2 true at the same time and Condition 1 and Condition 2 false at the same time:

x = a
...
y = x + 1
and
x = b
...
y = x - 1

But you need two more cases to test every defined-used combination: (1) $x = a$ and then $y = x - 1$ and (2) $x = b$ and then $y = x + 1$ . In this example, you can get these combinations by adding two more cases: Case 3 (Condition 1=True,Condition 2=False) and Case 4 (Condition 1=False, Condition 2=True).

A good way to develop test cases is to start with structured basis testing, which gives you some if not all of the defined-used data flows. Then add the cases you still need to have a complete set of defined-used data-flow test cases.

As discussed in the previous section, structured basis testing provided six test cases for the routine beginning on page 507. Data-flow testing of each defined-used pair requires several more test cases, some of which are covered by existing test cases and some of which aren't. Here are all the data-flow combinations that add test cases beyond the ones generated by structured basis testing:

Case	Test Description
7	Define companyRetirement in line 12, and use it first in line 26.
	This isn't necessarily covered by any of the previous test cases.
8	Define companyRetirement in line 12, and use it first in line 31.
	This isn't necessarily covered by any of the previous test cases.
9	Define companyRetirement in line 17, and use it first in line 31.
	This isn't necessarily covered by any of the previous test cases.
Once you run through the process of listing data-flow test cases a few times, you'll get a sense of which cases are fruitful and which are already covered. When you get stuck, list all the defined-used combinations. That might seem like a lot of work, but it's guaranteed to show you any cases that you didn't test for free in the basis-testing approach.

Equivalence Partitioning

A good test case covers a large part of the possible input data. If two test cases flush out exactly the same errors, you need only one of them. The concept of "equivalence partitioning" is a formalization of this idea and helps reduce the number of test cases required.

The condition to be tested is m_employee[ ID ].governmentRetirementWithheld < MAX_GOVT_RETIREMENT. This case has two equivalence classes: the class in which m_employee[ ID ].governmentRetirementWithheld is less than MAX_GOVT_RETIREMENT and the class in which it's greater than or equal to MAX_GOVT_RETIREMENT. Other parts of the program might have other related equivalence classes that imply that you need to test more than two possible values of m_employee[ ID ].governmentRetirementWithheld, but as far as this part of the program is concerned, only two are needed.

Thinking about equivalence partitioning won't give you a lot of new insight into a program when you have already covered the program with basis and data-flow testing. It's especially helpful, however, when you're looking at a program from the outside (from a specification rather than the source code) or when the data is complicated and the complications aren't all reflected in the program's logic.

Error Guessing

In addition to the formal test techniques, good programmers use a variety of less formal, heuristic techniques to expose errors in their code. One heuristic is the technique of error guessing. The term "error guessing" is a lowbrow name for a sensible concept. It means creating test cases based upon guesses about where the program might have errors, although it implies a certain amount of sophistication in the guessing.

You can base guesses on intuition or on past experience. Chapter 21, "Collaborative Construction," points out that one virtue of inspections is that they produce and maintain a list of common errors. The list is used to check new code. When you keep records of the kinds of errors you've made before, you improve the likelihood that your "error guess" will discover an error.

The next few sections describe specific kinds of errors that lend themselves to error guessing.

Boundary Analysis

One of the most fruitful areas for testing is boundary conditions-off-by-one errors. Saying num -1 when you mean num and saying >= when you mean > are common mistakes.

The idea of boundary analysis is to write test cases that exercise the boundary conditions. Pictorially, if you're testing for a range of values that are less than max, you have three possible conditions:

As shown, there are three boundary cases: just less than max, max itself, and just greater than max. It takes three cases to ensure that none of the common mistakes has been made.

The code sample on page 507 contains a check for m_employee[ ID ].governmentRetirementWithheld < MAX_GOVT_RETIREMENT. According to the principles of boundary analysis, three cases should be examined:

Case	Test Description
1	Case 1 is defined so that the true condition for m_employee[ ID ].governmentRetirementWithheld < MAX_GOVT_RETIREMENT is the first case on the true side of the boundary. Thus, the Case 1 test case sets m_employee[ ID ].governmentRetirementWithheld to MAX_GOVT_RETIREMENT - 1. This test case was already generated.
3	Case 3 is defined so that the false condition for m_employee[ ID ]. governmentRetirementWithheld < MAX_GOVT_RETIREMENT is on the false side of the boundary. Thus, the Case 3 test case sets m_employee[ ID ].governmentRetirementWithheld to MAX_GOVT_RETIREMENT +1 . This test case was also already generated.
10	An additional test case is added for the case directly on the boundary in which m_employee [ ID ].governmentRetirementWithheld = MAX_GOVT_RETIREMENT.

Compound Boundaries

Boundary analysis also applies to minimum and maximum allowable values. In this example, it might be minimum or maximum grossPay, companyRetirement, or PersonalRetirementContribution, but because calculations of those values are outside the scope of the routine, test cases for them aren't discussed further here.

A more subtle kind of boundary condition occurs when the boundary involves a combination of variables. For example, if two variables are multiplied together, what happens when both are large positive numbers? Large negative numbers? 0 ? What if all the strings passed to a routine are uncommonly long?

In the running example, you might want to see what happens to the variables totalWithholdings, totalGovernmentRetirement, and totalRetirement when every member of a large group of employees has a large salary-say, a group of programmers at $$ 250, 000$ each. (We can always hope!) This calls for another test case:

Case	Test Description
11	A large group of employees, each of whom has a large salary (what constitutes "large" depends on the specific system being developed)-for the sake of example, we'll say 1000 employees, each with a salary of $250,000, none of whom have had any social security tax withheld and all of whom want retirement withholding.

A test case in the same vein but on the opposite side of the looking glass would be a small group of employees, each of whom has a salary of $$ 0.00$ :

Case	Test Description	I
12	A group of 10 employees, each of whom has a salary of $$ 0.00$ .

Classes of Bad Data

Aside from guessing that errors show up around boundary conditions, you can guess about and test for several other classes of bad data. Typical bad-data test cases include

Too little data (or no data)
Too much data
The wrong kind of data (invalid data)
The wrong size of data
Uninitialized data

Some of the test cases you would think of if you followed these suggestions have already been covered. For example, "too little data" is covered by Cases 2 and 12, and it's hard to come up with anything for "wrong size of data." Classes of bad data nonetheless gives rise to a few more cases:

Case	Test Description
13	An array of 100,000,000 employees. Tests for too much data. Of course, how much is too much would vary from system to system, but for the sake of the example, assume that this is far too much.
14	A negative salary. Wrong kind of data.
15	A negative number of employees. Wrong kind of data.

Classes of Good Data

When you try to find errors in a program, it's easy to overlook the fact that the nominal case might contain an error. Usually the nominal cases described in the basis-testing section represent one kind of good data. Following are other kinds of good data that are worth checking. Checking each of these kinds of data can reveal errors, depending on the item being tested.

Nominal cases-middle-of-the-road, expected values
Minimum normal configuration
Maximum normal configuration
Compatibility with old data

The minimum normal configuration is useful for testing not just one item, but a group of items. It's similar in spirit to the boundary condition of many minimal values, but it's different in that it creates the set of minimum values out of the set of what is normally expected. One example would be to save an empty spreadsheet when testing a spreadsheet. For testing a word processor, it would be saving an empty document. In the case of the running example, testing the minimum normal configuration would add the following test case:

Case	Test Description
16	A group of one employee. To test the minimum normal configuration.
The maximum normal configuration is the opposite of the minimum. It's similar in spirit to boundary testing, but again, it creates a set of maximum values out of the set of expected values. An example of this would be saving a spreadsheet that's as large as the "maximum spreadsheet size" advertised on the product's packaging. Or printing the maximum-size spreadsheet. For a word processor, it would be saving a document of the largest recommended size. In the case of the running example, testing the maximum normal configuration depends on the maximum normal number of employees. Assuming it's 500 , you would add the following test case:

Case

Test Description

A group of one employee. To test the minimum normal configuration.

The maximum normal configuration is the opposite of the minimum. It's similar in spirit to boundary testing, but again, it creates a set of maximum values out of the set of expected values. An example of this would be saving a spreadsheet that's as large as the "maximum spreadsheet size" advertised on the product's packaging. Or printing the maximum-size spreadsheet. For a word processor, it would be saving a document of the largest recommended size. In the case of the running example, testing the maximum normal configuration depends on the maximum normal number of employees. Assuming it's 500 , you would add the following test case:

Case	Test Description
17	A group of 500 employees. To test the maximum normal configuration.
The last kind of normal data testing-testing for compatibility with old data-comes into play when the program or routine is a replacement for an older program or routine. The new routine should produce the same results with old data that the old routine did, except in cases in which the old routine was defective. This kind of continuity between versions is the basis for regression testing, the purpose of which is to ensure that corrections and enhancements maintain previous levels of quality without backsliding. In the case of the running example, the compatibility criterion wouldn't add any test cases.

Case

Test Description

A group of 500 employees. To test the maximum normal configuration.

The last kind of normal data testing-testing for compatibility with old data-comes into play when the program or routine is a replacement for an older program or routine. The new routine should produce the same results with old data that the old routine did, except in cases in which the old routine was defective. This kind of continuity between versions is the basis for regression testing, the purpose of which is to ensure that corrections and enhancements maintain previous levels of quality without backsliding. In the case of the running example, the compatibility criterion wouldn't add any test cases.

Use Test Cases That Make Hand-Checks Convenient

Let's suppose you're writing a test case for a nominal salary; you need a nominal salary, and the way you get one is to type in whatever numbers your hands land on. I'll try it:

1239078382346

OK. That's a pretty high salary, a little over a trillion dollars, in fact, but if I trim it so that it's somewhat realistic, I get $90,783.82.

Now, further suppose that the test case succeeds-that is, it finds an error. How do you know that it's found an error? Well, presumably, you know what the answer is and what it should be because you calculated the correct answer by hand. When you try to do hand-calculations with an ugly number like $$ 90, 783.82$ , however, you're as likely to make an error in the hand-calc as you are to discover one in your program. On the other hand, a nice, even number like $$ 20, 000$ makes number crunching a snap. The 0s are easy to punch into the calculator, and multiplying by 2 is something most programmers can do without using their fingers and toes.

You might think that an ugly number like $$ 90, 783.82$ would be more likely to reveal errors, but it's no more likely to than any other number in its equivalence class.

22.6 Improving Your Testing

The steps for improving your testing are similar to the steps for improving any other process. You have to know exactly what the process does so that you can vary it slightly and observe the effects of the variation. When you observe a change that has a positive effect, you modify the process so that it becomes a little better. The following sections describe how to do this with testing.

Planning to Test

One key to effective testing is planning from the beginning of the project to test. Putting testing on the same level of importance as design or coding means that time will be allocated to it, it will be viewed as important, and it will be a high-quality process. Test planning is also an element of making the testing process repeatable. If you can't repeat it, you can't improve it.

Retesting (Regression Testing)

Suppose that you've tested a product thoroughly and found no errors. Suppose that the product is then changed in one area and you want to be sure that it still passes all the tests it did before the change-that the change didn't introduce any new defects. Testing designed to make sure the software hasn't taken a step backward, or "regressed," is called "regression testing."

It's nearly impossible to produce a high-quality software product unless you can systematically retest it after changes have been made. If you run different tests after each change, you have no way of knowing for sure that no new defects have been introduced. Consequently, regression testing must run the same tests each time. Sometimes new tests are added as the product matures, but the old tests are kept too.

Automated Testing

The only practical way to manage regression testing is to automate it. People become numbed from running the same tests many times and seeing the same test results many times. It becomes too easy to overlook errors, which defeats the purpose of regression testing. Test guru Boriz Beizer reports that the error rate in manual testing is comparable to the bug rate in the code being tested. He estimates that in manual testing, only about half of all the tests are executed properly (Johnson 1994).

Benefits of test automation include the following:

An automated test has a lower chance of being wrong than a manual test.
Once you automate a test, it's readily available for the rest of the project with litthe incremental effort on your part.
If tests are automated, they can be run frequently to see whether any code checkins have broken the code. Test automation is part of the foundation of test-intensive practices, such as the daily build and smoke test and Extreme Programming.
Automated tests improve your chances of detecting any given problem at the earliest possible moment, which tends to minimize the work needed to diagnose and correct the problem.
Automated tests provide a safety net for large-scale code changes because they increase your chance of quickly detecting defects inserted during the modifications.
Automated tests are especially useful in new, volatile technology environments because they flush out changes in the environments sooner rather than later.

The main tools used to support automated testing provide test scaffolding, generate input, capture output, and compare actual output with expected output. The variety of tools discussed in the preceding section will perform some or all of these functions.

Chapter 23 - Debugging

Debugging is the process of identifying the root cause of an error and correcting it. It contrasts with testing, which is the process of detecting the error initially. On some projects, debugging occupies as much as 50 percent of the total development time. For many programmers, debugging is the hardest part of programming.

Debugging doesn't have to be the hardest part. If you follow the advice in this book, you'll have fewer errors to debug. Most of the defects you'll have will be minor oversights and typos, easily found by looking at a source-code listing or stepping through the code in a debugger. For the remaining harder bugs, this chapter describes how to make debugging much easier than it usually is.

23.1 Overview of Debugging Issues

The late Rear Admiral Grace Hopper, co-inventor of COBOL, always said that the word "bug" in software dated back to the first large-scale digital computer, the Mark I (IEEE 1992). Programmers traced a circuit malfunction to the presence of a large moth that had found its way into the computer, and from that time on, computer problems were blamed on "bugs." Outside software, the word "bug" dates back at least to Thomas Edison, who is quoted as using it as early as 1878 (Tenner 1997).

The word "bug" is a cute word.

The reality of software defects, however, is that bugs aren't organisms that sneak into your code when you forget to spray it with pesticide. They are errors. A bug in software means that a programmer made a mistake. The result of the mistake isn't like the cute picture shown above.

In the context of this book, technical accuracy requires that mistakes in the code be called "errors," "defects," or "faults."

Role of Debugging in Software Quality

Like testing, debugging isn't a way to improve the quality of your software per se; it's a way to diagnose defects. Software quality must be built in from the start. The best way to build a quality product is to develop requirements carefully, design well, and use high-quality coding practices. Debugging is a last resort.

Variations in Debugging Performance

Why talk about debugging? Doesn't everyone know how to debug?
No, not everyone knows how to debug. Studies of experienced programmers have found roughly a 20-to-1 difference in the time it takes experienced programmers to find the same set of defects found by by inexperienced programmers. Moreover, some programmers find more defects and make corrections more accurately. Here are the results of a classic study that examined how effectively professional programmers with at least four years of experience debugged a program with 12 defects:

	Fastest Three Programmers	Slowest Three Programmers
Average debug time (minutes)	5.0	14.1
Average number of defects not found	0.7	1.7
Average number of defects made correcting defects	3.0	7.7

Source: "Some Psychological Evidence on How People Debug Computer Programs" (Gould 1975)

The three programmers who were best at debugging were able to find the defects in about one-third the time and inserted only about two-fifths as many new defects as the three who were the worst. The best programmer found all the defects and didn't insert any new defects in correcting them. The worst missed 4 of the 12 defects and inserted 11 new defects in correcting the 8 defects he found.

But this study doesn't really tell the whole story. After the first round of debugging, the fastest three programmers still have 3.7 defects left in their code and the slowest still have 9.4 defects. Neither group is done debugging yet. I wondered what would happen if I applied the same find-and-bad-fix ratios to additional debugging cycles. My results aren't statistically valid, but they're still interesting. When I applied the same find-and-bad-fix ratios to successive debugging cycles until each group had less than half a defect remaining, the fastest group required a total of three debugging cycles, whereas the slowest group required 14 debugging cycles. Bearing in mind that each cycle of the slower group takes almost three times as long as each cycle of the fastest group, the slowest group would take about 13 times as long to fully debug its programs as the fastest group, according to my nonscientific extrapolation of this study. This wide variation has been confirmed by other studies (Gilb 1977, Curtis 1981).

In addition to providing insight into debugging, the evidence supports the General Principle of Software Quality: improving quality reduces development costs. The best programmers found the most defects, found the defects most quickly, and made correct modifications most often. You don't have to choose between quality, cost, and time-they all go hand in hand.

Defects as Opportunities

What does having a defect mean? Assuming that you don't want the program to have a defect, it means that you don't fully understand what the program does. The idea of not understanding what the program does is unsettling. After all, if you created the program, it should do your bidding. If you don't know exactly what you're telling the computer to do, you're only a small step away from merely trying different things until something seems to work-that is, programming by trial and error. And if you're programming by trial and error, defects are guaranteed. You don't need to learn how to fix defects; you need to learn how to avoid them in the first place.

Most people are somewhat fallible, however, and you might be an excellent programmer who has simply made a modest oversight. If this is the case, an error in your program provides a powerful opportunity for you to learn many things. You can:

Learn about the program you're working on

You have something to learn about the program because if you already knew it perfectly, it wouldn't have a defect. You would have corrected it already.

Learn about the kinds of mistakes you make

If you wrote the program, you inserted the defect. It's not every day that a spotlight exposes a weakness with glaring clarity, but such a day is an opportunity, so take advantage of it. Once you find the mistake, ask yourself how and why you made it. How could you have found it more quickly? How could you have prevented it? Does the code have other mistakes just like it? Can you correct them before they cause problems of their own?

Learn about the quality of your code from the point of view of someone who has to read it

You'll have to read your code to find the defect. This is an opportunity to look critically at the quality of your code. Is it easy to read? How could it be better? Use your discoveries to refactor your current code or to improve the code you write next.

Learn about how you solve problems

Does your approach to solving debugging problems give you confidence? Does your approach work? Do you find defects quickly? Or is your approach to debugging weak? Do you feel anguish and frustration? Do you guess randomly? Do you need to improve? Considering the amount of time many projects spend on debugging, you definitely won't waste time if you observe how you debug. Taking time to analyze and change the way you debug might be the quickest way to decrease the total amount of time it takes you to develop a program.

Learn about how you fix defects

In addition to learning how you find defects, you can learn about how you fix them. Do you make the easiest possible correction by applying goto bandages and special-case makeup that changes the symptom but not the problem? Or do you make systemic corrections, demanding an accurate diagnosis and prescribing treatment for the heart of the problem?

All things considered, debugging is an extraordinarily rich soil in which to plant the seeds of your own improvement. It's where all construction roads cross: readability, design, code quality-you name it. This is where building good code pays off, especially if you do it well enough that you don't have to debug very often.

An Ineffective Approach

Unfortunately, programming classes in colleges and universities hardly ever offer instruction in debugging. If you studied programming in college, you might have had a lecture devoted to debugging. Although my computer-science education was excellent, the extent of the debugging advice I received was to "put print statements in the program to find the defect." This is not adequate. If other programmers' educational experiences are like mine, a great many programmers are being forced to reinvent debugging concepts on their own. What a waste!

The Devil's Guide to Debugging

In Dante's vision of hell, the lowest circle is reserved for Satan himself. In modern times, Old Scratch has agreed to share the lowest circle with programmers who don't learn to debug effectively. He tortures programmers by making them use these common debugging approaches:

Find the defect by guessing

To find the defect, scatter print statements randomly throughout a program. Examine the output to see where the defect is. If you can't find the defect with print statements, try changing things in the program until something seems to work. Don't back up the original version of the program, and don't keep a record of the changes you've made. Programming is more exciting when you're not quite sure what the program is doing. Stock up on cola and candy because you're in for a long night in front of the terminal.

Don't waste time trying to understand the problem

It's likely that the problem is trivial, and you don't need to understand it completely to fix it. Simply finding it is enough.

Fix the error with the most obvious fix

It's usually good just to fix the specific problem you see, rather than wasting a lot of time making some big, ambitious correction that's going to affect the whole program. This is a perfect example:

x = Compute( y )
if ( y = 17 )
    x = $25.15 -- Compute() doesn't work for y = 17, so fix it

Who needs to dig all the way into Compute() for an obscure problem with the value of 17 when you can just write a special case for it in the obvious place?

Debugging by Superstition

Satan has leased part of hell to programmers who debug by superstition. Every group has one programmer who has endless problems with demon machines, mysterious compiler defects, hidden language defects that appear when the moon is full, bad data, losing important changes, a possessed editor that saves programs incorrectlyyou name it. This is "programming by superstition."

If you have a problem with a program you've written, it's your fault. It's not the computer's fault, and it's not the compiler's fault. The program doesn't do something different every time. It didn't write itself; you wrote it, so take responsibility for it.

Even if an error at first appears not to be your fault, it's strongly in your interest to assume that it is. That assumption helps you debug. It's hard enough to find a defect in your code when you're looking for it; it's even harder when you assume your code is error-free. Assuming the error is your fault also improves your credibility. If you claim that an error arose from someone else's code, other programmers will believe that you have checked out the problem carefully. If you assume the error is yours, you avoid the embarrassment of having to recant publicly later when you find out that it was your defect after all.

23.2 Finding a Defect

Debugging consists of finding the defect and fixing it. Finding the defect-and understanding it-is usually 90 percent of the work.

Fortunately, you don't have to make a pact with Satan to find an approach to debugging that's better than random guessing. Debugging by thinking about the problem is much more effective and interesting than debugging with an eye of a newt and the dust of a frog's ear.

Suppose you were asked to solve a murder mystery. Which would be more interesting: going door to door throughout the county, checking every person's alibi for the night of October 17, or finding a few clues and deducing the murderer's identity? Most people would rather deduce the person's identity, and most programmers find the intellectual approach to debugging more satisfying. Even better, the effective programmers who debug in one-twentieth the time used by the ineffective programmers aren't randomly guessing about how to fix the program. They're using the scientific method-that is, the process of discovery and demonstration necessary for scientific investigation.

The Scientific Method of Debugging

Here are the steps you go through when you use the classic scientific method:

Gather data through repeatable experiments.
Form a hypothesis that accounts for the relevant data.
Design an experiment to prove or disprove the hypothesis.
Prove or disprove the hypothesis.
Repeat as needed.

The scientific method has many parallels in debugging. Here's an effective approach for finding a defect:

Stabilize the error.
Locate the source of the error (the "fault").

a. Gather the data that produces the defect.
b. Analyze the data that has been gathered, and form a hypothesis about the defect.
c. Determine how to prove or disprove the hypothesis, either by testing the program or by examining the code.
d. Prove or disprove the hypothesis by using the procedure identified in 2(c).
Fix the defect.
Test the fix.
Look for similar errors.

The first step is similar to the scientific method's first step in that it relies on repeatability. The defect is easier to diagnose if you can stabilize it-that is, make it occur reliably. The second step uses the steps of the scientific method. You gather the test data that divulged the defect, analyze the data that has been produced, and form a hypothesis about the source of the error. You then design a test case or an inspection to evaluate the hypothesis, and you either declare success (regarding proving your hypothesis) or renew your efforts, as appropriate. When you have proven your hypothesis, you fix the defect, test the fix, and search your code for similar errors.

Let's look at each of the steps in conjunction with an example. Assume that you have an employee database program that has an intermittent error. The program is supposed to print a list of employees and their income-tax withholdings in alphabetical order. Here's part of the output:

Formatting, Fred Freeform	$$ 5, 877$
Global, Gary	$$ 1, 666$
Modula, Mildred	$$ 10, 788$
Many-Loop, Mavis	$$ 8, 889$
Statement, Sue Switch	$$ 4, 000$
Whileloop, Wendy	$$ 7, 860$
The error is that Many-Loop, Mavis and Modula, Mildred are out of order.

Stabilize the Error

If a defect doesn't occur reliably, it's almost impossible to diagnose. Making an intermittent defect occur predictably is one of the most challenging tasks in debugging.

An error that doesn't occur predictably is usually an initialization error, a timing issue, or a dangling-pointer problem. If the calculation of a sum is right sometimes and wrong sometimes, a variable involved in the calculation probably isn't being initialized properly-most of the time it just happens to start at 0 . If the problem is a strange and unpredictable phenomenon and you're using pointers, you almost certainly have an uninitialized pointer or are using a pointer after the memory that it points to has been deallocated.

Stabilizing an error usually requires more than finding a test case that produces the error. It includes narrowing the test case to the simplest one that still produces the error. The goal of simplifying the test case is to make it so simple that changing any aspect of it changes the behavior of the error. Then, by changing the test case carefully and watching the program's behavior under controlled conditions, you can diagnose the problem. If you work in an organization that has an independent test team, sometimes it's the team's job to make the test cases simple. Most of the time, it's your job.

To simplify the test case, you bring the scientific method into play again. Suppose you have 10 factors that, used in combination, produce the error. Form a hypothesis about which factors were irrelevant to producing the error. Change the supposedly irrelevant factors, and rerun the test case. If you still get the error, you can eliminate those factors and you've simplified the test. Then you can try to simplify the test further. If you don't get the error, you've disproved that specific hypothesis and you know more than you did before. It might be that some subtly different change would still produce the error, but you know at least one specific change that does not.

In the employee withholdings example, when the program is run initially, Many-Loop, Mavis is listed after Modula, Mildred. When the program is run a second time, however, the list is fine:

Formatting, Fred Freeform	$$ 5, 877$
Global, Gary	$$ 1, 666$
Many-Loop, Mavis	$$ 8, 889$
Modula, Mildred	$$ 10, 788$
Statement, Sue Switch	$$ 4, 000$
Whileloop, Wendy	$$ 7, 860$
It isn't until Fruit-Loop, Frita is entered and shows up in an incorrect position that you remember that Modula, Mildred had been entered just prior to showing up in the wrong spot too. What's odd about both cases is that they were entered singly. Usually, employees are entered in groups.

You hypothesize: the problem has something to do with entering a single new employee. If this is true, running the program again should put Fruit-Loop, Frita in the right position. Here's the result of a second run:

Formatting, Fred Freeform	$$ 5, 877$
Fruit-Loop, Frita	$$ 5, 771$
Global, Gary	$$ 1, 666$
Many-Loop, Mavis	$$ 8, 889$
Modula, Mildred	$$ 10, 788$
Statement, Sue Switch	$$ 4, 000$
Whileloop, Wendy	$$ 7, 860$
This successful run supports the hypothesis. To confirm it, you want to try adding a few new employees, one at a time, to see whether they show up in the wrong order and whether the order changes on the second run.

Locate the Source of the Error

Locating the source of the error also calls for using the scientific method. You might suspect that the defect is a result of a specific problem, say an off-by-one error. You could then vary the parameter you suspect is causing the problem-one below the boundary, on the boundary, and one above the boundary-and determine whether your hypothesis is correct.

In the running example, the source of the problem could be an off-by-one defect that occurs when you add one new employee but not when you add two or more. Examining the code, you don't find an obvious off-by-one defect. Resorting to Plan B, you run a test case with a single new employee to see whether that's the problem. You add Hardcase, Henry as a single employee and hypothesize that his record will be out of order. Here's what you find:

Formatting, Fred Freeform	$$ 5, 877$
Fruit-Loop, Frita	$$ 5, 771$
Global, Gary	$$ 1, 666$
Hardcase, Henry	$$ 493$
Many-Loop, Mavis	$$ 8, 889$
Modula, Mildred	$$ 10, 788$
Statement, Sue Switch	$$ 4, 000$
Whileloop, Wendy	$$ 7, 860$

The line for Hardcase, Henry is exactly where it should be, which means that your first hypothesis is false. The problem isn't caused simply by adding one employee at a time. It's either a more complicated problem or something completely different.

Examining the test-run output again, you notice that Fruit-Loop, Frita and Many-Loop, Mavis are the only names containing hyphens. Fruit-Loop was out of order when she was first entered, but Many-Loop wasn't, was she? Although you don't have a printout from the original entry, in the original error Modula, Mildred appeared to be out of order, but she was next to Many-Loop. Maybe Many-Loop was out of order and Modula was all right.

You hypothesize again: the problem arises from names with hyphens, not names that are entered singly.

But how does that account for the fact that the problem shows up only the first time an employee is entered? You look at the code and find that two different sorting routines are used. One is used when an employee is entered, and another is used when the data is saved. A closer look at the routine used when an employee is first entered shows that it isn't supposed to sort the data completely. It only puts the data in approximate order to speed up the save routine's sorting. Thus, the problem is that the data is printed before it's sorted. The problem with hyphenated names arises because the rough-sort routine doesn't handle niceties such as punctuation characters. Now, you can refine the hypothesis even further.

You hypothesize one last time: names with punctuation characters aren't sorted correctly until they're saved.

You later confirm this hypothesis with additional test cases.

Tips for Finding Defects

Once you've stabilized an error and refined the test case that produces it, finding its source can be either trivial or challenging, depending on how well you've written your code. If you're having a hard time finding a defect, it could be because the code isn't well written. You might not want to hear that, but it's true. If you're having trouble, consider these tips:

Use all the data available to make your hypothesis

When creating a hypothesis about the source of a defect, account for as much of the data as you can in your hypothesis. In the example, you might have noticed that Fruit-Loop, Frita was out of order and created a hypothesis that names beginning with an " F " are sorted incorrectly. That's a poor hypothesis because it doesn't account for the fact that Modula, Mildred was out of order or that names are sorted correctly the second time around. If the data doesn't fit the hypothesis, don't discard the data-ask why it doesn't fit, and create a new hypothesis.

The second hypothesis in the example-that the problem arises from names with hyphens, not names that are entered singly-didn't seem initially to account for the fact that names were sorted correctly the second time around either. In this case, however, the second hypothesis led to a more refined hypothesis that proved to be correct. It's all right that the hypothesis doesn't account for all of the data at first as long as you keep refining the hypothesis so that it does eventually.

Refine the test cases that produce the error

If you can't find the source of an error, try to refine the test cases further than you already have. You might be able to vary one parameter more than you had assumed, and focusing on one of the parameters might provide the crucial breakthrough.

Exercise the code in your unit test suite

Defects tend to be easier to find in small fragments of code than in large integrated programs. Use your unit tests to test the code in isolation.

Use available tools

Numerous tools are available to support debugging sessions: interactive debuggers, picky compilers, memory checkers, syntax-directed editors, and so on. The right tool can make a difficult job easy. With one tough-to-find error, for example, one part of the program was overwriting another part's memory. This error was difficult to diagnose using conventional debugging practices because the programmer couldn't determine the specific point at which the program was incorrectly overwriting memory. The programmer used a memory breakpoint to set a watch on a specific memory address. When the program wrote to that memory location, the debugger stopped the code and the guilty code was exposed.

This is an example of a problem that's difficult to diagnose analytically but that becomes quite simple when the right tool is applied.

Reproduce the error several different ways

Sometimes trying cases that are similar to the error-producing case but not exactly the same is instructive. Think of this approach as triangulating the defect. If you can get a fix on it from one point and a fix on it from another, you can better determine exactly where it is.

As illustrated by Figure 23-1, reproducing an error several different ways helps diagnose the cause of the error. Once you think you've identified the defect, run a case that's close to the cases that produce errors but that should not produce an error itself. If it does produce an error, you don't completely understand the problem yet. Errors often arise from combinations of factors, and trying to diagnose the problem with only one test case often doesn't diagnose the root problem.

Generate more data to generate more hypotheses

Choose test cases that are different from the test cases you already know to be erroneous or correct. Run them to generate more data, and use the new data to add to your list of possible hypotheses.

Use the results of negative tests

Suppose you create a hypothesis and run a test case to prove it. Suppose further that the test case disproves the hypothesis, so you still don't know the source of the error. You do know something you didn't beforenamely, that the defect is not in the area you thought it was. That narrows your search field and the set of remaining possible hypotheses.

Brainstorm for possible hypotheses

Rather than limiting yourself to the first hypothesis you think of, try to come up with several. Don't analyze them at first-just come up with as many as you can in a few minutes. Then look at each hypothesis and think about test cases that would prove or disprove it. This mental exercise is helpful in breaking the debugging logjam that results from concentrating too hard on a single line of reasoning.

Keep a notepad by your desk, and make a list of things to try

One reason programmers get stuck during debugging sessions is that they go too far down dead-end paths. Make a list of things to try, and if one approach isn't working, move on to the next approach.

Narrow the suspicious region of the code

If you've been testing the whole program or a whole class or routine, test a smaller part instead. Use print statements, logging, or tracing to identify which section of code is producing the error.

If you need a more powerful technique to narrow the suspicious region of the code, systematically remove parts of the program and see whether the error still occurs. If it doesn't, you know it's in the part you took away. If it does, you know it's in the part you've kept.

Rather than removing regions haphazardly, divide and conquer. Use a binary search algorithm to focus your search. Try to remove about half the code the first time. Determine the half the defect is in, and then divide that section. Again, determine which half contains the defect, and again, chop that section in half. Continue until you find the defect.

If you use many small routines, you'll be able to chop out sections of code simply by commenting out calls to the routines. Otherwise, you can use comments or preprocessor commands to remove code.

If you're using a debugger, you don't necessarily have to remove pieces of code. You can set a breakpoint partway through the program and check for the defect that way instead. If your debugger allows you to skip calls to routines, eliminate suspects by skipping the execution of certain routines and seeing whether the error still occurs. The process with a debugger is otherwise similar to the one in which pieces of a program are physically removed.

Be suspicious of classes and routines that have had defects before

Classes that have had defects before are likely to continue to have defects. A class that has been troublesome in the past is more likely to contain a new defect than a class that has been defect-free. Reexamine error-prone classes and routines.

Check code that's changed recently

If you have a new error that's hard to diagnose, it's usually related to code that's changed recently. It could be in completely new code or in changes to old code. If you can't find a defect, run an old version of the program to see whether the error occurs. If it doesn't, you know the error's in the new version or is caused by an interaction with the new version. Scrutinize the differences between the old and new versions. Check the version control log to see what code has changed recently. If that's not possible, use a diff tool to compare changes in the old, working source code to the new, broken source code.

Expand the suspicious region of the code

It's easy to focus on a small section of code, sure that "the defect must be in this section." If you don't find it in the section, consider the possibility that the defect isn't in the section. Expand the area of code you suspect, and then focus on pieces of it by using the binary search technique described earlier.

Integrate incrementally

Debugging is easy if you add pieces to a system one at a time. If you add a piece to a system and encounter a new error, remove the piece and test it separately.

Check for common defects

Use code-quality checklists to stimulate your thinking about possible defects. If you're following the inspection practices described in Section 21.3, "Formal Inspections," you'll have your own fine-tuned checklist of the common problems in your environment. You can also use the checklists that appear throughout this book. See the "List of Checklists" following the book's table of contents.

Talk to someone else about the problem

Some people call this "confessional debugging." You often discover your own defect in the act of explaining it to another person. For example, if you were explaining the problem in the salary example, you might sound like this:

Hey, Jennifer, have you got a minute? I'm having a problem. I've got this list of employee salaries that's supposed to be sorted, but some names are out of order. They're sorted all right the second time I print them out but not the first. I checked to see if it was new names, but I tried some that worked. I know they should be sorted the first time I print them because the program sorts all the names as they're entered and again when they're saved-wait a minute-no, it I doesn't sort them when they're entered. That's right. It only orders them roughly. Thanks, Jennifer. You've been a big help.

Jennifer didn't say a word, and you solved your problem. This result is typical, and this approach is a potent tool for solving difficult defects.

Take a break from the problem

Sometimes you concentrate so hard you can't think. How many times have you paused for a cup of coffee and figured out the problem on your way to the coffee machine? Or in the middle of lunch? Or on the way home? Or in the shower the next morning? If you're debugging and making no progress, once you've tried all the options, let it rest. Go for a walk. Work on something else. Go home for the day. Let your subconscious mind tease a solution out of the problem.

The auxiliary benefit of giving up temporarily is that it reduces the anxiety associated with debugging. The onset of anxiety is a clear sign that it's time to take a break.

Brute-Force Debugging

Brute force is an often-overlooked approach to debugging software problems. By "brute force," I'm referring to a technique that might be tedious, arduous, and timeconsuming but that is guaranteed to solve the problem. Which specific techniques are guaranteed to solve a problem are context-dependent, but here are some general candidates:

Perform a full design and/or code review on the broken code.
Throw away the section of code and redesign/recode it from scratch.
Throw away the whole program and redesign/recode it from scratch.
Compile code with full debugging information.
Compile code at pickiest warning level and fix all the picky compiler warnings.
Strap on a unit test harness and test the new code in isolation.
Create an automated test suite and run it all night.
Step through a big loop in the debugger manually until you get to the error condition.
Instrument the code with print, display, or other logging statements.
Compile the code with a different compiler.
Compile and run the program in a different environment.
Link or run the code against special libraries or execution environments that produce warnings when code is used incorrectly.
Replicate the end-user's full machine configuration.
Integrate new code in small pieces, fully testing each piece as it's integrated.

Set a maximum time for quick and dirty debugging

For each brute-force technique, your reaction might well be, "I can't do that-it's too much work!" The point is that it's only too much work if it takes more time than what I call "quick and dirty debugging." It's always tempting to try for a quick guess rather than systematically instrumenting the code and giving the defect no place to hide. The gambler in each of us would rather use a risky approach that might find the defect in five minutes than the sure-fire approach that will find the defect in half an hour. The risk is that if the five-minute approach doesn't work, you get stubborn. Finding the defect the "easy" way becomes a matter of principle, and hours pass unproductively, as do days, weeks, months.... How often have you spent two hours debugging code that took only 30 minutes to write? That's a bad distribution of labor, and you would have been better off to rewrite the code than to debug bad code.

When you decide to go for the quick victory, set a maximum time limit for trying the quick way. If you go past the time limit, resign yourself to the idea that the defect is going to be harder to diagnose than you originally thought, and flush it out the hard way. This approach allows you to get the easy defects right away and the hard defects after a bit longer.

Make a list of brute-force techniques

Before you begin debugging a difficult error, ask yourself, "If I get stuck debugging this problem, is there some way that I am guaranteed to be able to fix the problem?" If you can identify at least one brute-force technique that will fix the problem-including rewriting the code in question-it's less likely that you'll waste hours or days when there's a quicker alternative.

Syntax Errors

Syntax-error problems are going the way of the woolly mammoth and the sabertoothed tiger. Compilers are getting better at diagnostic messages, and the days when you had to spend two hours finding a misplaced semicolon in a Pascal listing are almost gone. Here's a list of guidelines you can use to hasten the extinction of this endangered species:

Don't trust line numbers in compiler messages

When your compiler reports a mysterious syntax error, look immediately before and immediately after the error-the compiler could have misunderstood the problem or could simply have poor diagnostics. Once you find the real defect, try to determine the reason the compiler put the message on the wrong statement. Understanding your compiler better can help you find future defects.

Don't trust compiler messages

Compilers try to tell you exactly what's wrong, but compilers are dissembling little rascals, and you often have to read between the lines to know what one really means. For example, in UNIX C, you can get a message that says "floating exception" for an integer divide-by- 0 . With C++'s Standard Template

Don't trust the compiler's second message

Some compilers are better than others at detecting multiple errors. Some compilers get so excited after detecting the first error that they become giddy and overconfident; they prattle on with dozens of error messages that don't mean anything. Other compilers are more levelheaded, and although they must feel a sense of accomplishment when they detect an error, they refrain from spewing out inaccurate messages. When your compiler generates a series of cascading error messages, don't worry if you can't quickly find the source of the second or third error message. Fix the first one and recompile.

Divide and conquer

The idea of dividing the program into sections to help detect defects works especially well for syntax errors. If you have a troublesome syntax error, remove part of the code and compile again. You'll either get no error (because the error's in the part you removed), get the same error (meaning you need to remove a different part), or get a different error (because you'll have tricked the compiler into producing a message that makes more sense).

Find misplaced comments and quotation marks

Many programming text editors automatically format comments, string literals, and other syntactical elements. In more primitive environments, a misplaced comment or quotation mark can trip up the compiler. To find the extra comment or quotation mark, insert the following sequence into your code in C, C++, and Java:

/*"/**/

This code phrase will terminate either a comment or string, which is useful in narrowing the space in which the unterminated comment or string is hiding.

23.3 Fixing a Defect

The hard part of debugging is finding the defect. Fixing the defect is the easy part. But as with many easy tasks, the fact that it's easy makes it especially error-prone. At least one study found that defect corrections have more than a 50 percent chance of being wrong the first time (Yourdon 1986b). Here are a few guidelines for reducing the chance of error:

Understand the problem before you fix it

"The Devil's Guide to Debugging" is right: the best way to make your life difficult and corrode the quality of your program is to fix problems without really understanding them. Before you fix a problem, make sure you understand it to the core. Triangulate the defect both with cases that should reproduce the error and with cases that shouldn't reproduce the error. Keep at it until you understand the problem well enough to predict its occurrence correctly every time.

Understand the program, not just the problem

If you understand the context in which a problem occurs, you're more likely to solve the problem completely rather than only one aspect of it. A study done with short programs found that programmers who achieve a global understanding of program behavior have a better chance of modifying it successfully than programmers who focus on local behavior, learning about the program only as they need to (Littman et al. 1986). Because the program in this study was small ( 280 lines), it doesn't prove that you should try to understand a 50,000 -line program completely before you fix a defect. It does suggest that you should understand at least the code in the vicinity of the defect correction-the "vicinity" being not a few lines but a few hundred.

Confirm the defect diagnosis

Before you rush to fix a defect, make sure that you've diagnosed the problem correctly. Take the time to run test cases that prove your hypothesis and disprove competing hypotheses. If you've proven only that the error could be the result of one of several causes, you don't yet have enough evidence to work on the one cause; rule out the others first.

Relax

A programmer was ready for a ski trip. His product was ready to ship, he was already late, and he had only one more defect to correct. He changed the source file and checked it into version control. He didn't recompile the program and didn't verify that the change was correct.

In fact, the change was not correct, and his manager was outraged. How could he change code in a product that was ready to ship without checking it? What could be worse? Isn't this the pinnacle of professional recklessness?

If this isn't the height of recklessness, it's close and it's common. Hurrying to solve a problem is one of the most time-ineffective things you can do. It leads to rushed judgments, incomplete defect diagnosis, and incomplete corrections. Wishful thinking can lead you to see solutions where there are none. The pressure-often self-imposedencourages haphazard trial-and-error solutions and the assumption that a solution works without verification that it does.

In striking contrast, during the final days of Microsoft Windows 2000 development, a developer needed to fix a defect that was the last remaining defect before a Release Candidate could be created. The developer changed the code, checked his fix, and tested his fix on his local build. But he didn't check the fix into version control at that point. Instead, he went to play basketball. He said, "I'm feeling too stressed right now to be sure that I've considered everything I should consider. I'm going to clear my mind for an hour, and then I'll come back and check in the code-once I've convinced myself that the fix is really correct."

Relax long enough to make sure your solution is right. Don't be tempted to take shortcuts. It might take more time, but it'll probably take less. If nothing else, you'll fix the problem correctly and your manager won't call you back from your ski trip.

Save the original source code

Before you begin fixing the defect, be sure to archive a version of the code that you can return to later. It's easy to forget which change in a group of changes is the significant one. If you have the original source code, at least you can compare the old and the new files and see where the changes are.

Fix the problem, not the symptom

You should fix the symptom too, but the focus should be on fixing the underlying problem rather than wrapping it in programming duct tape. If you don't thoroughly understand the problem, you're not fixing the code. You're fixing the symptom and making the code worse. Suppose you have this code:

Java Example of Code That Needs to Be Fixed

for ( claimNumber = 0; claimNumber < numClaims[ client ]; claimNumber++ ) {
    sum[ client ] = sum[ client ] + claimAmount[ claimNumber ];
}

Further suppose that when client equals 45 , sum turns out to be wrong by $$ 3.45$ . Here's the wrong way to fix the problem:

Java Example of Making the Code Worse by "Fixing" It

for ( claimNumber = 0; claimNumber < numClaims[ client ]; claimNumber++ ) {
    sum[ client ] = sum[ client ] + claimAmount[ claimNumber ];
}
if ( client == 45 ) {
    sum[ 45 ] = sum[ 45 ] + 3.45;
}

Now suppose that when client equals 37 and the number of claims for the client is 0 , you're not getting 0 . Here's the wrong way to fix the problem:

Java Example of Making the Code Worse by "Fixing" It (continued)

for ( claimNumber = 0; claimNumber < numClaims[ client ]; claimNumber++ ) {
    sum[ client ] = sum[ client ] + claimAmount[ claimNumber ];
}
if ( client == 45 ) {
    sum[ 45 ] = sum[ 45 ] + 3.45;
}
else if ( ( client == 37 ) && ( numClaims[ client ] == 0 ) ) {
    sum[ 37 ] = 0.0;
}

If this doesn't send a cold chill down your spine, you won't be affected by anything else in this book either. It's impossible to list all the problems with this approach in a book that's only around 1000 pages long, but here are the top three:

The fixes won't work most of the time. The problems look as though they're the result of initialization defects. Initialization defects are, by definition, unpredictable, so the fact that the sum for client 45 is off by $$ 3.45$ today doesn't tell you anything about tomorrow. It could be off by $$ 10, 000.02$ , or it could be correct. That's the nature of initialization defects.
It's unmaintainable. When code is special-cased to work around errors, the special cases become the code's most prominent feature. The $$ 3.45$ won't always be $$ 3.45$ , and another error will show up later. The code will be modified again to handle the new special case, and the special case for $$ 3.45$ won't be removed. The code will become increasingly barnacled with special cases. Eventually the barnacles will be too heavy for the code to support, and the code will sink to the bottom of the ocean-a fitting place for it.
It uses the computer for something that's better done by hand. Computers are good at predictable, systematic calculations, but humans are better at fudging data creatively. You'd be wiser to treat the output with whiteout and a typewriter than to monkey with the code.

Change the code only for good reason

Related to fixing symptoms is the technique of changing code at random until it seems to work. The typical line of reasoning goes like this: "This loop seems to contain a defect. It's probably an off-by-one error, so I'll just put a -1 here and try it. OK. That didn't work, so I'll just put a +1 in instead. OK. That seems to work. I'll say it's fixed."

As popular as this practice is, it isn't effective. Making changes to code randomly is like rotating a Pontiac Aztek's tires to fix an engine problem. You're not learning anything; you're just goofing around. By changing the program randomly, you say in effect, "I don't know what's happening here, but I'll try this change and hope it works." Don't change code randomly. That's voodoo programming. The more different you make it without understanding it, the less confidence you'll have that it works correctly.

Before you make a change, be confident that it will work. Being wrong about a change should leave you astonished. It should cause self-doubt, personal reevaluation, and deep soul-searching. It should happen rarely.

Make one change at a time

Changes are tricky enough when they're done one at a time. When done two at a time, they can introduce subtle errors that look like the original errors. Then you're in the awkward position of not knowing whether you didn't correct the error, whether you corrected the error but introduced a new one that looks similar, or whether you didn't correct the error and you introduced a similar new error. Keep it simple: make just one change at a time.

Check your fix

Check the program yourself, have someone else check it for you, or walk through it with someone else. Run the same triangulation test cases you used to diagnose the problem to make sure that all aspects of the problem have been resolved. If you've solved only part of the problem, you'll find out that you still have work to do.

Add a unit test that exposes the defect

When you encounter an error that wasn't exposed by your test suite, add a test case to expose the error so that it won't be reintroduced later.

Look for similar defects

When you find one defect, look for others that are similar. Defects tend to occur in groups, and one of the values of paying attention to the kinds of defects you make is that you can correct all the defects of that kind. Looking for similar defects requires you to have a thorough understanding of the problem. Watch for the warning sign: if you can't figure out how to look for similar defects, that's a sign that you don't yet completely understand the problem.

23.5 Debugging Tools - Obvious and Not-So-Obvious

You can do much of the detailed, brain-busting work of debugging with debugging tools that are readily available. The tool that will drive the final stake through the heart of the defect vampire isn't yet available, but each year brings an incremental improvement in available capabilities.

Source-Code Comparators

A source-code comparator such as Diff is useful when you're modifying a program in response to errors. If you make several changes and need to remove some that you can't quite remember, a comparator can pinpoint the differences and jog your memory. If you discover a defect in a new version that you don't remember in an older version, you can compare the files to determine what changed.

Compiler Warning Messages

One of the simplest and most effective debugging tools is your own compiler.

Set your compiler's warning level to the highest, pickiest level possible, and fix the errors it reports

It's sloppy to ignore compiler errors. It's even sloppier to turn off the warnings so that you can't even see them. Children sometimes think that if they close their eyes and can't see you, they've made you go away. Setting a switch on the compiler to turn off warnings just means you can't see the errors. It doesn't make them go away any more than closing your eyes makes an adult go away.

Assume that the people who wrote the compiler know a great deal more about your language than you do. If they're warning you about something, it usually means you have an opportunity to learn something new about your language. Make the effort to understand what the warning really means.

Treat warnings as errors

Some compilers let you treat warnings as errors. One reason to use the feature is that it elevates the apparent importance of a warning. Just as setting your watch five minutes fast tricks you into thinking it's five minutes later than it is, setting your compiler to treat warnings as errors tricks you into taking them more seriously. Another reason to treat warnings as errors is that they often affect how your program compiles. When you compile and link a program, warnings typically won't stop the program from linking, but errors typically will. If you want to check warnings before you link, set the compiler switch that treats warnings as errors.

Initiate projectwide standards for compile-time settings

Set a standard that requires everyone on your team to compile code using the same compiler settings. Otherwise, when you try to integrate code compiled by different people with different settings, you'll get a flood of error messages and an integration nightmare. This is easy to enforce if you use a project-standard make file or build script.

Extended Syntax and Logic Checking

You can use additional tools to check your code more thoroughly than your compiler does. For example, for C programmers, the lint utility painstakingly checks for use of uninitialized variables (writing $=$ when you mean $==$ ) and similarly subtle problems.

Execution Profilers

You might not think of an execution profiler as a debugging tool, but a few minutes spent studying a program profile can uncover some surprising (and hidden) defects.

For example, I had suspected that a memory-management routine in one of my programs was a performance bottleneck. Memory management had originally been a small component using a lindarly ordered array of pointers to memory. I replaced the linearly ordered array with a hash table in the expectation that execution time would drop by at least half. But after profiling the code, I found no change in performance at all. I examined the code more closely and found a defect that was wasting a huge amount of time in the allocation algorithm. The bottleneck hadn't been the linearsearch technique; it was the defect. I hadn't needed to optimize the search after all. Examine the output of an execution profiler to satisfy yourself that your program spends a reasonable amount of time in each area.

Test Frameworks/Scaffolding

As mentioned in Section 23.2 on finding defects, pulling out a troublesome piece of code, writing code to test it, and executing it by itself is often the most effective way to exorcise the demons from an error-prone program.

Debuggers

Commercially available debuggers have advanced steadily over the years, and the capabilities available today can change the way you program. Good debuggers allow you to set breakpoints to break when execution reaches a specific line, or the $n$ th time it reaches a specific line, or when a global variable changes, or when a variable is assigned a specific value. They allow you to step through code line by line, stepping through or over routines. They allow the program to be executed backwards, stepping back to the point where a defect originated. They allow you to log the execution of specific state-ments-similar to scattering "I'm here!" print statements throughout a program.

Good debuggers allow full examination of data, including structured and dynamically allocated data. They make it easy to view the contents of a linked list of pointers or a dynamically allocated array. They're intelligent about user-defined data types. They allow you to make ad hoc queries about data, assign new values, and continue program execution.

You can look at the high-level language or the assembly language generated by your compiler. If you're using several languages, the debugger automatically displays the correct language for each section of code. You can look at a chain of calls to routines and quickly view the source code of any routine. You can change parameters to a program within the debugger environment.

The best of today's debuggers also remember debugging parameters (breakpoints, variables being watched, and so on) for each individual program so that you don't have to re-create them for each program you debug.

System debuggers operate at the systems level rather than the applications level so that they don't interfere with the execution of the program being debugged. They're essential when you are debugging programs that are sensitive to timing or the amount of memory available.

Given the enormous power offered by modern debuggers, you might be surprised that anyone would criticize them. But some of the most respected people in computer science recommend not using them. They recommend using your brain and avoiding debugging tools altogether. Their argument is that debugging tools are a crutch and that you find problems faster and more accurately by thinking about them than by relying on tools. They argue that you, rather than the debugger, should mentally execute the program to flush out defects.

Regardless of the empirical evidence, the basic argument against debuggers isn't valid. The fact that a tool can be misused doesn't imply that it should be rejected. You wouldn't avoid taking aspirin merely because it's possible to overdose. You wouldn't avoid mowing your lawn with a power mower just because it's possible to cut yourself. Any other powerful tool can be used or abused, and so can a debugger.

The debugger isn't a substitute for good thinking. But, in some cases, thinking isn't a substitute for a good debugger either. The most effective combination is good thinking and a good debugger.

Chapter 24 - Refactoring

Myth: a well-managed software project conducts methodical requirements development and defines a stable list of the program's responsibilities. Design follows requirements, and it is done carefully so that coding can proceed linearly, from start to finish, implying that most of the code can be written once, tested, and forgotten. According to the myth, the only time that the code is significantly modified is during the software-maintenance phase, something that happens only after the initial version of a system has been delivered.

Reality: code evolves substantially during its initial development. Many of the changes seen during initial coding are at least as dramatic as changes seen during maintenance. Coding, debugging, and unit testing consume between 30 to 65 percent of the effort on a typical project, depending on the project's size. (See Chapter 27, "How Program Size Affects Construction," for details.) If coding and unit testing were straightforward processes, they would consume no more than 20-30 percent of the total effort on a project. Even on well-managed projects, however, requirements change by about one to four percent per month (Jones 2000). Requirements changes invariably cause corresponding code changes-sometimes substantial code changes.

Another reality: modern development practices increase the potential for code changes during construction. In older life cycles, the focus-successful or not-was on avoiding code changes. More modern approaches move away from coding predictability. Current approaches are more code-centered, and over the life of a project, you can expect code to evolve more than ever.

24.1 Kinds of Software Evolution

Software evolution is like biological evolution in that some mutations are beneficial and many mutations are not. Good software evolution produces code whose development mimics the ascent from monkeys to Neanderthals to our current exalted state as software developers. Evolutionary forces sometimes beat on a program the other way, however, knocking the program into a deevolutionary spiral.

The key distinction between kinds of software evolution is whether the program's quality improves or degrades under modification. If you fix errors with logical duct tape and superstition, quality degrades. If you treat modifications as opportunities to tighten up the original design of the program, quality improves. If you see that program quality is degrading, that's like that silent canary in a mine shaft I've mentioned before. It's a warning that the program is evolving in the wrong direction.

A second distinction in the kinds of software evolution is the one between changes made during construction and those made during maintenance. These two kinds of evolution differ in several ways. Construction changes are usually made by the original developers, usually before the program has been completely forgotten. The system isn't yet on line, so the pressure to finish changes is only schedule pressure-it's not 500 angry users wondering why their system is down. For the same reason, changes during construction can be more freewheeling-the system is in a more dynamic state, and the penalty for making mistakes is low. These circumstances imply a style of software evolution that's different from what you'd find during software maintenance.

Philosophy of Software Evolution

A common weakness in programmers' approaches to software evolution is that it goes on as an unselfconscious process. If you recognize that evolution during development is an inevitable and important phenomenon and plan for it, you can use it to your advantage.

Evolution is at once hazardous and an opportunity to approach perfection. When you have to make a change, strive to improve the code so that future changes are easier. You never know as much when you begin writing a program as you do afterward. When you have a chance to revise a program, use what you've learned to improve it. Make both your initial code and your changes with further change in mind.

The Cardinal Rule of Software Evolution is that evolution should improve the internal quality of the program. The following sections describe how to accomplish this.

24.2 Introduction to Refactoring

The key strategy in achieving The Cardinal Rule of Software Evolution is refactoring, which Martin Fowler defines as "a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior" (Fowler 1999). The word "refactoring" in modern programming grew out of Larry Constantine's original use of the word "factoring" in structured programming, which referred to decomposing a program into its constituent parts as much as possible (Yourdon and Constantine 1979).

Reasons to Refactor

Sometimes code degenerates under maintenance, and sometimes the code just wasn't very good in the first place. In either case, here are some warning signs -sometimes called "smells" (Fowler 1999)-that indicate where refactorings are needed:

Code is duplicated

Duplicated code almost always represents a failure to fully factor the design in the first place. Duplicate code sets you up to make parallel modifica-tions-whenever you make changes in one place, you have to make parallel changes in another place. It also violates what Andrew Hunt and Dave Thomas refer to as the "DRY principle": Don't Repeat Yourself (2000). I think David Parnas says it best: "Copy and paste is a design error" (McConnell 1998b).

A routine is too long

In object-oriented programming, routines longer than a screen are rarely needed and usually represent the attempt to force-fit a structured programming foot into an object-oriented shoe.

One of my clients was assigned the task of breaking up a legacy system's longest routine, which was more than 12,000 lines long. With effort, he was able to reduce the size of the largest routine to only about 4,000 lines.

One way to improve a system is to increase its modularity-increase the number of well-defined, well-named routines that do one thing and do it well. When changes lead you to revisit a section of code, take the opportunity to check the modularity of the routines in that section. If a routine would be cleaner if part of it were made into a separate routine, create a separate routine.

A loop is too long or too deeply nested

Loop innards tend to be good candidates for being converted into routines, which helps to better factor the code and to reduce the loop's complexity.

A class has poor cohesion

If you find a class that takes ownership for a hodgepodge of unrelated responsibilities, that class should be broken up into multiple classes, each of which has responsibility for a cohesive set of responsibilities.

A class interface does not provide a consistent level of abstraction

Even classes that begin life with a cohesive interface can lose their original consistency. Class interfaces tend to morph over time as a result of modifications that are made in the heat of the moment and that favor expediency to interface integrity. Eventually the class interface becomes a Frankensteinian maintenance monster that does little to improve the intellectual manageability of the program.

A parameter list has too many parameters

Well-factored programs tend to have many small, well-defined routines that don't need large parameter lists. A long parameter list is a warning that the abstraction of the routine interface has not been well thought out.

Changes within a class tend to be compartmentalized

Sometimes a class has two or more distinct responsibilities. When that happens you find yourself changing either one part of the class or another part of the class-but few changes affect both parts of the class. That's a sign that the class should be cleaved into multiple classes along the lines of the separate responsibilities.

Changes require parallel modifications to multiple classes

I saw one project that had a checklist of about 15 classes that had to be modified whenever a new kind of output was added. When you find yourself routinely making changes to the same set of classes, that suggests the code in those classes could be rearranged so that changes affect only one class. In my experience, this is a hard ideal to accomplish, but it's nonetheless a good goal.

Inheritance hierarchies have to be modified in parallel

Finding yourself making a subclass of one class every time you make a subclass of another class is a special kind of parallel modification and should be addressed.

case statements have to be modified in parallel

Although case statements are not inherently bad, if you find yourself making parallel modifications to similar case statements in multiple parts of the program, you should ask whether inheritance might be a better approach.

If you find yourself repeatedly manipulating the same set of data items, you should ask whether those manipulations should be combined into a class of their own.

A routine uses more features of another class than of its own class

This suggests that the routine should be moved into the other class and then invoked by its old class.

A primitive data type is overloaded

Primitive data types can be used to represent an infinite number of real-world entities. If your program uses a primitive data type like an integer to represent a common entity such as money, consider creating a simple Money class so that the compiler can perform type checking on Money variables, so that you can add safety checks on the values assigned to money, and so on. If both Money and Temperature are integers, the compiler won't warn you about erroneous assignments like bankBalance $=$ recordLowTemperature.

A class doesn't do very much

Sometimes the result of refactoring code is that an old class doesn't have much to do. If a class doesn't seem to be carrying its weight, ask if you should assign all of that class's responsibilities to other classes and eliminate the class altogether.

A chain of routines passes tramp data

Finding yourself passing data to one routine just so that routine can pass it to another routine is called "tramp data" (Page-Jones 1988). This might be OK, but ask yourself whether passing the specific data in question is consistent with the abstraction presented by each of the routine interfaces. If the abstraction for each routine is OK, passing the data is OK. If not, find some way to make each routine's interface more consistent.

A middleman object isn't doing anything

If you find that most of the code in a class is just passing off calls to routines in other classes, consider whether you should eliminate the middleman and call those other classes directly.

One class is overly intimate with another

Encapsulation (information hiding) is probably the strongest tool you have to make your program intellectually manageable and to minimize ripple effects of code changes. Anytime you see one class that knows more about another class than it should-including derived classes knowing too much about their parents-err on the side of stronger encapsulation rather than weaker.

A routine has a poor name

If a routine has a poor name, change the name of the routine where it's defined, change the name in all places it's called, and then recompile. As hard as it might be to do this now, it will be even harder later, so do it as soon as you notice it's a problem.

Data members are public

Public data members are, in my view, always a bad idea. They blur the line between interface and implementation, and they inherently violate encapsulation and limit future flexibility. Strongly consider hiding public data members behind access routines.

A subclass uses only a small percentage of its parents' routines

Typically this indicates that that subclass has been created because a parent class happened to contain the routines it needed, not because the subclass is logically a descendent of the superclass. Consider achieving better encapsulation by switching the subclass's relationship to its superclass from an is-a relationship to a has-a relationship; convert the superclass to member data of the former subclass, and expose only the routines in the former subclass that are really needed.

Comments are used to explain difficult code

Comments have an important role to play, but they should not be used as a crutch to explain bad code. The age-old wisdom is dead-on: "Don't document bad code-rewrite it" (Kernighan and Plauger 1978).

Global variables are used

When you revisit a section of code that uses global variables, take time to reexamine them. You might have thought of a way to avoid using global variables since the last time you visited that part of the code. Because you're less familiar with the code than when you first wrote it, you might now find the global variables sufficiently confusing that you're willing to develop a cleaner approach. You might also have a better sense of how to isolate global variables in access routines and a keener sense of the pain caused by not doing so. Bite the bullet and make the beneficial modifications. The initial coding will be far enough in the past that you can be objective about your work yet close enough that you will remember most of what you need to make the revisions correctly. The time during early revisions is the perfect time to improve the code.

A routine uses setup code before a routine call or takedown code after a routine call

Code like this should be a warning to you:

Bad C++ Example of Setup and Takedown Code for a Routine Call

WithdrawalTransaction withdrawal;
withdrawal.SetCustomerId( customerId );
withdrawal.SetBalance( balance );
withdrawal.SetWithdrawalAmount( withdrawalAmount );
withdrawal.SetWithdrawalDate( withdrawalDate );

ProcessWithdrawal( withdrawal );

customerId = withdrawal.GetCustomerId();
balance = withdrawal.GetBalance();
withdrawalAmount = withdrawal.GetWithdrawalAmount();
withdrawalDate = withdrawal.GetWithdrawalDate();

A similar warning sign is when you find yourself creating a special constructor for the WithdrawalTransaction class that takes a subset of its normal initialization data so that you can write code like this:

Bad C++ Example of Setup and Takedown Code for a Method Call

withdrawal = new WithdrawalTransaction( customerId, balance,
    withdrawalAmount, withdrawalDate );
withdrawal.ProcessWithdrawal();
delete withdrawal;

Anytime you see code that sets up for a call to a routine or takes down after a call to a routine, ask whether the routine interface is presenting the right abstraction. In this case, perhaps the parameter list of ProcessWithdrawal should be modified to support code like this:

Good C++ Example of a Routine That Doesn't Require Setup or Takedown Code

ProcessWithdrawal( customerId, balance, withdrawalAmount, withdrawalDate );

Note that the converse of this example presents a similar problem. If you find yourself usually having a WithdrawalTransaction object in hand but needing to pass several of its values to a routine like the one shown here, you should also consider refactoring the ProcessWithdrawal interface so that it requires the WithdrawalTransaction object rather than its individual fields:

C++ Example of Code That Requires Several Method Calls

ProcessWithdrawal( withdrawal.GetCustomerId(), withdrawal.GetBalance(),
    withdrawal.GetWithdrawalAmount(), withdrawal.GetWithdrawalDate() );

Any of these approaches can be right, and any can be wrong-it depends on whether the ProcessWithdrawal() interface's abstraction is that it expects to have four distinct pieces of data or expects to have a WithdrawalTransaction object.

A program contains code that seems like it might be needed someday

Programmers are notoriously bad at guessing what functionality might be needed someday. "Designing ahead" is subject to numerous predictable problems:

Requirements for the "design ahead" code haven't been fully developed, which means the programmer will likely guess wrong about those future requirements. The "code ahead" work will ultimately be thrown away.
If the programmer's guess about the future requirement is pretty close, the programmer still will not generally anticipate all the intricacies of the future requirement. These intricacies undermine the programmer's basic design assumptions, which means the "design ahead" work will have to be thrown away.
Future programmers who use the "design ahead" code don't know that it was "design ahead" code, or they assume the code works better than it does. They assume that the code has been coded, tested, and reviewed to the same level as the other code. They waste a lot of time building code that uses the "design ahead" code, only to discover ultimately that the "design ahead" code doesn't actually work.
The additional "design ahead" code creates additional complexity, which calls for additional testing, additional defect correction, and so on. The overall effect is to slow down the project.

Experts agree that the best way to prepare for future requirements is not to write speculative code; it's to make the currently required code as clear and straightforward as possible so that future programmers will know what it does and does not do and will make their changes accordingly (Fowler 1999, Beck 2000).

Reasons Not to Refactor

In common parlance, "refactoring" is used loosely to refer to fixing defects, adding functionality, modifying the design-essentially as a synonym for making any change to the code whatsoever. This common dilution of the term's meaning is unfortunate. Change in itself is not a virtue, but purposeful change, applied with a teaspoonful of discipline, can be the key strategy that supports steady improvement in a program's quality under maintenance and prevents the all-too-familiar software-entropy death spiral.

24.3 Specific Refactorings

In this section, I present a catalog of refactorings, many of which I describe by summarizing the more detailed descriptions presented in Refactoring (Fowler 1999). I have not, however, attempted to make this catalog exhaustive. In a sense, every case in this book that shows a "bad code" example and a "good code" example is a candidate for becoming a refactoring. In the interest of space, I've focused on the refactorings I personally have found most useful.

Data-Level Refactorings

Here are refactorings that improve the use of variables and other kinds of data.

Replace a magic number with a named constant

If you're using a numeric or string literal like 3.14 , replace that literal with a named constant like $P I$ .

Rename a variable with a clearer or more informative name

If a variable's name isn't clear, change it to a better name. The same advice applies to renaming constants, classes, and routines, of course.

Move an expression inline

Replace an intermediate variable that was assigned the result of an expression with the expression itself.

Replace an expression with a routine

Replace an expression with a routine (usually so that the expression isn't duplicated in the code).

Introduce an intermediate variable

Assign an expression to an intermediate variable whose name summarizes the purpose of the expression.

Convert a multiuse variable to multiple single-use variables

If a variable is used for more than one purpose-common culprits are $i, j$ , temp , and $x$ -create separate variables for each usage, each of which has a more specific name.

Use a local variable for local purposes rather than a parameter

If an input-only routine parameter is being used as a local variable, create a local variable and use that instead.

Convert a data primitive to a class

If a data primitive needs additional behavior (including stricter type checking) or additional data, convert the data to an object and add the behavior you need. This can apply to simple numeric types like Money and Temperature. It can also apply to enumerated types like Color, Shape, Country, or OutputType.

Convert a set of type codes to a class or an enumeration

In older programs, it's common to see associations like

const int SCREEN = 0;
const int PRINTER = 1;
const int FILE = 2;

Rather than defining standalone constants, create a class so that you can receive the benefits of stricter type checking and set yourself up to provide richer semantics for OutputType if you ever need to. Creating an enumeration is sometimes a good alternative to creating a class.

Convert a set of type codes to a class with subclasses

If the different elements associated with different types might have different behavior, consider creating a base class for the type with subclasses for each type code. For the OutputType base class, you might create subclasses like Screen, Printer, and File.

Change an array to an object

If you're using an array in which different elements are different types, create an object that has a field for each former element of the array.

Encapsulate a collection

If a class returns a collection, having multiple instances of the collection floating around can create synchronization difficulties. Consider having the class return a read-only collection, and provide routines to add and remove elements from the collection.

Replace a traditional record with a data class

Create a class that contains the members of the record. Creating a class allows you to centralize error checking, persistence, and other operations that concern the record.

Statement-Level Refactorings

Here are refactorings that improve the use of individual statements.

Decompose a boolean expression

Simplify a boolean expression by introducing wellnamed intermediate variables that help document the meaning of the expression.

Move a complex boolean expression into a well-named boolean function

If the expression is complicated enough, this refactoring can improve readability. If the expression is used more than once, it eliminates the need for parallel modifications and reduces the chance of error in using the expression.

Consolidate fragments that are duplicated within different parts of a conditional

If you have the same lines of code repeated at the end of an else block that you have at the end of the if block, move those lines of code so that they occur after the entire if-then-else block.

Use break or return instead of a loop control variable

If you have a variable within a loop like done that's used to control the loop, use break or return to exit the loop instead.

Return as soon as you know the answer instead of assigning a return value within nested if-then-else statements

Code is often easiest to read and least error-prone if you exit a routine as soon as you know the return value. The alternative of setting a return value and then unwinding your way through a lot of logic can be harder to follow.

Replace conditionals (especially repeated case statements) with polymorphism

Much of the logic that used to be contained in case statements in structured programs can instead be baked into the inheritance hierarchy and accomplished through polymorphic routine calls.

Create and use null objects instead of testing for null values

Sometimes a null object will have generic behavior or data associated with it, such as referring to a resident whose name is not known as "occupant." In this case, consider moving the responsibility for handling null values out of the client code and into the class-that is, have the Customer class define the unknown resident as "occupant" instead of having Customer's client code repeatedly test for whether the customer's name is known and substitute "occupant" if not.

Routine-Level Refactorings

Here are refactorings that improve code at the individual-routine level.

Extract routine/extract method

Remove inline code from one routine, and turn it into its own routine.

Move a routine's code inline

Take code from a routine whose body is simple and self-explanatory, and move that routine's code inline where it is used.

Convert a long routine to a class

If a routine is too long, sometimes turning it into a class and then further factoring the former routine into multiple routines will improve readability.

Substitute a simple algorithm for a complex algorithm

Replace a complicated algorithm with a simpler algorithm.

Add a parameter

If a routine needs more information from its caller, add a parameter so that that information can be provided.

Remove a parameter

If a routine no longer uses a parameter, remove it.

Separate query operations from modification operations

Normally, query operations don't change an object's state. If an operation like GetTotals() changes an object's state, separate the query functionality from the state-changing functionality and provide two separate routines.

Combine similar routines by parameterizing them

Two similar routines might differ only with respect to a constant value that's used within the routine. Combine the routines into one routine, and pass in the value to be used as a parameter.

Separate routines whose behavior depends on parameters passed in

If a routine executes different code depending on the value of an input parameter, consider breaking the routine into separate routines that can be called separately, without passing in that particular input parameter.

Pass a whole object rather than specific fields

If you find yourself passing several values from the same object into a routine, consider changing the routine's interface so that it takes the whole object instead.

Pass specific fields rather than a whole object

If you find yourself creating an object just so that you can pass it to a routine, consider modifying the routine so that it takes specific fields rather than a whole object.

Encapsulate downcasting

If a routine returns an object, it normally should return the most specific type of object it knows about. This is particularly applicable to routines that return iterators, collections, elements of collections, and so on.

Class Implementation Refactorings

Here are refactorings that improve at the class level.

Change value objects to reference objects

If you find yourself creating and maintaining numerous copies of large or complex objects, change your usage of those objects so that only one master copy exists (the value object) and the rest of the code uses references to that object (reference objects).

Change reference objects to value objects

If you find yourself performing a lot of reference housekeeping for small or simple objects, change your usage of those objects so that all objects are value objects.

Replace virtual routines with data initialization

If you have a set of subclasses that vary only according to constant values they return, rather than overriding member routines in the derived classes, have the derived classes initialize the class with appropriate constant values, and then have generic code in the base class that works with those values.

Change member routine or data placement

Consider making several general changes in an inheritance hierarchy. These changes are normally performed to eliminate duplication in derived classes:

Pull a routine up into its superclass.
Pull a field up into its superclass.
Pull a constructor body up into its superclass. Several other changes are normally made to support specialization in derived classes:
Push a routine down into its derived classes.
Push a field down into its derived classes.
Push a constructor body down into its derived classes.

Extract specialized code into a subclass

If a class has code that's used by only a subset of its instances, move that specialized code into its own subclass.

Combine similar code into a superclass

If two subclasses have similar code, combine that code and move it into the superclass.

Class Interface Refactorings

Here are refactorings that make for better class interfaces.

Move a routine to another class

Create a new routine in the target class, and move the body of the routine from the source class into the target class. You can then call the new routine from the old routine.

Convert one class to two

If a class has two or more distinct areas of responsibility, break the class into multiple classes, each of which has a clearly defined responsibility.

Eliminate a class

If a class isn't doing much, move its code into other classes that are more cohesive and eliminate the class.

Hide a delegate

Sometimes Class $A$ calls Class $B$ and Class $C$ , when really Class $A$ should call only Class B and Class B should call Class C. Ask yourself what the right abstraction is for A's interaction with $B$ . If $B$ should be responsible for calling $C$ , have B call C.

Remove a middleman

If Class A calls Class B and Class B calls Class C, sometimes it works better to have Class A call Class C directly. The question of whether you should delegate to Class B depends on what will best maintain the integrity of Class B's interface.

Replace inheritance with delegation

If a class needs to use another class but wants more control over its interface, make the superclass a field of the former subclass and then expose a set of routines that will provide a cohesive abstraction.

Replace delegation with inheritance

If a class exposes every public routine of a delegate class (member class), inherit from the delegate class instead of just using the class.

Introduce a foreign routine

If a class needs an additional routine and you can't modify the class to provide it, you can create a new routine within the client class that provides that functionality.

Introduce an extension class

If a class needs several additional routines and you can't modify the class, you can create a new class that combines the unmodifiable class's functionality with the additional functionality. You can do that either by subclassing the original class and adding new routines or by wrapping the class and exposing the routines you need.

Encapsulate an exposed member variable

If member data is public, change the member data to private and expose the member data's value through a routine instead.

Remove Set() routines for fields that cannot be changed

If a field is supposed to be set at object creation time and not changed afterward, initialize that field in the object's constructor rather than providing a misleading Set() routine.

Hide routines that are not intended to be used outside the class

If the class interface would be more coherent without a routine, hide the routine.

Encapsulate unused routines

If you find yourself routinely using only a portion of a class's interface, create a new interface to the class that exposes only those necessary routines. Be sure that the new interface provides a coherent abstraction.

Collapse a superclass and subclass if their implementations are very similar

If the subclass doesn't provide much specialization, combine it into its superclass.

System-Level Refactorings

Here are refactorings that improve code at the whole-system level.

Create a definitive reference source for data you can't control

Sometimes you have data maintained by the system that you can't conveniently or consistently access from other objects that need to know about that data. A common example is data maintained in a GUI control. In such a case, you can create a class that mirrors the data in the GUI control, and then have both the GUI control and the other code treat that class as the definitive source of that data.

Change unidirectional class association to bidirectional class association

If you have two classes that need to use each other's features but only one class can know about the other class, change the classes so that they both know about each other.

Change bidirectional class association to unidirectional class association

If you have two classes that know about each other's features but only one class that really needs to know about the other, change the classes so that one knows about the other but not vice versa.

Provide a factory method rather than a simple constructor

Use a factory method (routine) when you need to create objects based on a type code or when you want to work with reference objects rather than value objects.

Replace error codes with exceptions or vice versa

Depending on your error-handling strategy, make sure the code is using the standard approach.

24.4 Refactoring Safely

Refactoring is a powerful technique for improving code quality. Like all powerful tools, refactoring can cause more harm than good if misused. A few simple guidelines can prevent refactoring missteps.

Save the code you start with

Before you begin refactoring, make sure you can get back to the code you started with. Save a version in your revision control system, or copy the correct files to a backup directory.

Keep refactorings small

Some refactorings are larger than others, and exactly what constitutes "one refactoring" can be a little fuzzy. Keep the refactorings small so that you fully understand all the impacts of the changes you make. The detailed refactorings described in Refactoring (Fowler 1999) provide many good examples of how to do this.

Do refactorings one at a time

Some refactorings are more complicated than others. For all but the simplest refactorings, do the refactorings one at a time, recompiling and retesting after a refactoring before doing the next one.

Make a list of steps you intend to take

A natural extension of the Pseudocode Programming Process is to make a list of the refactorings that will get you from Point A to Point B. Making a list helps you keep each change in context.

Make a parking lot

When you're midway through one refactoring, you'll sometimes find that you need another refactoring. Midway through that refactoring, you find a third refactoring that would be beneficial. For changes that aren't needed immediately, make a "parking lot," a list of the changes that you'd like to make at some point but that don't need to be made right now.

Make frequent checkpoints

It's easy to find the code suddenly going sideways while you're refactoring. In addition to saving the code you started with, save checkpoints at various steps in a refactoring session so that you can get back to a working program if you code yourself into a dead end.

Use your compiler warnings

It's easy to make small errors that slip past the compiler. Setting your compiler to the pickiest warning level possible will help catch many errors almost as soon as you type them.

Retest

Reviews of changed code should be complemented by retests. Of course, this is dependent on having a good set of test cases in the first place. Regression testing and other test topics are described in more detail in Chapter 22, "Developer Testing."

Add test cases

In addition to retesting with your old tests, add new unit tests to exercise the new code. Remove any test cases that have been made obsolete by the refactorings.

Review the changes

If reviews are important the first time through, they are even more important during subsequent modifications. Ed Yourdon reports that programmers typically have more than a 50 percent chance of making an error on their first attempt to make a change (Yourdon 1986b). Interestingly, if programmers work with a substantial portion of the code, rather than just a few lines, the chance of making a correct modification improves, as shown in Figure 24-1. Specifically, as the number of lines changed increases from one to five lines, the chance of making a bad change increases. After that, the chance of making a bad change decreases.

Figure 24-1 Small changes tend to be more error-prone than larger changes (Weinberg 1983).

Programmers treat small changes casually. They don't desk-check them, they don't have others review them, and they sometimes don't even run the code to verify that the fix works properly.

The moral is simple: treat simple changes as if they were complicated. One organization that introduced reviews for one-line changes found that its error rate went from 55 percent before reviews to 2 percent afterward (Freedman and Weinberg 1982). A telecommunications organization went from 86 percent correct before reviewing code changes to 99.6 percent afterward (Perrott 2004).

Adjust your approach depending on the risk level of the refactoring

Some refactorings are riskier than others. A refactoring like "Replace a magic number with a named constant" is relatively risk-free. Refactorings that involve class or routine interface changes, database schema changes, or changes to boolean tests, among others, tend to be more risky. For easier refactorings, you might strearhline your refactoring process to do more than one refactoring at a time and to simply retest, without going through an official review.

For riskier refactorings, err on the side of caution. Do the refactorings one at a time. Have someone else review the refactoring or use pair programming for that refactoring, in addition to the normal compiler checking and unit tests.

Bad Times to Refactor

Refactoring is a powerful technique, but it isn't a panacea and it's subject to a few specific kinds of abuse.

Don't use refactoring as a cover for code and fix

The worst problem with refactoring is how it's misused. Programmers will sometimes say they're refactoring, when all they're really doing is tweaking the code, hoping to find a way to make it work. Refactoring refers to changes in working code that do not affect the program's behavior. Programmers who are tweaking broken code aren't refactoring; they're hacking.

Avoid refactoring instead of rewriting

Sometimes code doesn't need small changes-it needs to be tossed out so that you can start over. If you find yourself in a major refactoring session, ask yourself whether instead you should be redesigning and reimplementing that section of code from the ground up.

24.5 Refactoring Strategies

The number of refactorings that would be beneficial to any specific program is essentially infinite. Refactoring is subject to the same law of diminishing returns as other programming activities, and the $80 / 20$ rule applies. Spend your time on the 20 percent of the refactorings that provide 80 percent of the benefit. Consider the following guidelines when deciding which refactorings are most important:

Refactor when you add a routine

When you add a routine, check whether related routines are well organized. If not, refactor them.

Refactor when you add a class

Adding a class often brings issues with existing code to the fore. Use this time as an opportunity to refactor other classes that are closely related to the class you're adding.

Refactor when you fix a defect

Use the understanding you gain from fixing a bug to improve other code that might be prone to similar defects.

Target error-prone modules

Some modules are more error-prone and brittle than others. Is there a section of code that you and everyone else on your team is afraid of? That's probably an error-prone module. Although most people's natural tendency is to avoid these challenging sections of code, targeting these sections for refactoring can be one of the more effective strategies (Jones 2000).

Target high-complexity modules

Another approach is to focus on modules that have the highest complexity ratings. (See "How to Measure Complexity" in Section 19.6 for details on these metrics.) One classic study found that program quality improved dramatically when maintenance programmers focused their improvement efforts on the modules that had the highest complexity (Henry and Kafura 1984).

In a maintenance environment, improve the parts you touch

Code that is never modified doesn't need to be refactored. But when you do touch a section of code, be sure you leave it better than you found it.

Define an interface between clean code and ugly code, and then move code across the interface

The "real world" is often messier than you'd like. The messiness might come from complicated business rules, hardware interfaces, or software interfaces. A common problem with geriatric systems is poorly written production code that must remain operational at all times.

An effective strategy for rejuvenating geriatric production systems is to designate some code as being in the messy real world, some code as being in an idealized new world, and some code as being the interface between the two. Figure 24-2 illustrates this idea.

Figure 24-2 Your code doesn't have to be messy just because the real world is messy. Conceive your system as a combination of ideal code, interfaces from the ideal code to the messy real world, and the messy real world.

As you work with the system, you can begin moving code across the "real world interface" into a more organized ideal world. When you begin working with a legacy system, the poorly written legacy code might make up nearly all the system. One policy that works well is that anytime you touch a section of messy code, you are required to bring it up to current coding standards, give it clear variable names, and so on-effectively moving it into the ideal world. Over time this can provide for a rapid improvement in a code base, as shown in Figure 24-3.

Figure 24-3 One strategy for improving production code is to refactor poorly written legacy code as you touch it, so as to move it to the other side of the "interface to the messy real world."

Key Points

Program changes are a fact of life both during initial development and after initial release.
Software can either improve or degrade as it's changed. The Cardinal Rule of Software Evolution is that internal quality should improve with code evolution.
One key to success in refactoring is learning to pay attention to the numerous warning signs or smells that indicate a need to refactor.
Another key to refactoring success is learning numerous specific refactorings.
A final key to success is having a strategy for refactoring safely. Some refactoring approaches are better than others.
Refactoring during development is the best chance you'll get to improve your program, to make all the changes you'll wish you'd made the first time. Take advantage of these opportunities during development!