A Greedy Way to Get the Bugs OutLinley Erin HallSoftware bugs cost the United States economy $59.5 billion annually. Charles Colburn is developing new ways to find bugs based on the Greedy algorithm.bugsout.htmlEngineering and Technology: Computer Science

Related ASU Research Stories
The Human Factor (sidebar)

Related ASU Web Sites
Department of Computer Science and Engineering

Related Internet Sites
Greedy Algorithm (definition and explanation)

Publication Date: Spring 2005

A Greedy Way to Get the Bugs Out

Bugs. The ones that fly and crawl may be annoying, but the ones in computer software are real doozies. According to a 2002 study by the National Institute of Standards and Technology, software bugs cost the United States economy $59.5 billion annually. Software users pay more than half of those costs.

Programmers are unlikely to begin writing perfect code the first time, every time. They are only human, after all. As a result, efforts to reduce software bugs focus on improving the testing of software. Catching more bugs before a product goes on sale could save more than $20 billion.

Software engineers, however, don’t like testing because it doesn’t seem creative. They often only do enough tests to ensure that they have met the requirements outlined in their contract. And when a project is behind schedule and the marketing department is screaming for a product, the amount of testing may be cut even further.

“You test for the problems that you expect to find. It’s almost an oxymoron to say that you’re going to test for something that you didn’t anticipate. But, in a sense, that’s what you need to do,” says Charles Colbourn, professor of computer science and engineering at Arizona State University.

Colbourn is developing methods to find software bugs. “We want to develop methods with the speed and accuracy of mathematical techniques. But we also want generality and flexibility so that engineers can actually use them,” he says.

Most bugs result from different components in a system not working together properly. For example, consider an Internet-based piece of software with which people can use several different Web browsers, operating systems, types of network connections, and printer configurations. Each of these four components has three different possibilities. For example, the operating system can be Windows, Macintosh, or Linux.

Completely testing this system would require individually checking each combination of settings—81 in all. This may be reasonable, but most computer software contains many more components. Completely testing a system with 10 components, each with four potential settings, would require examining more than a million configurations. That’s generally not feasible in terms of cost or time.

However, researchers have found that software bugs usually depend on the interaction between two or possibly three components, not all of them. Colbourn says that if an engineer can put together a set of tests that contains each pair of components in every possible combination of states at least once—regardless of the state of the other components in the particular test—then most bugs will be discovered.


The figure above is called a finite geometry diagram. Colbourn converts numerical data into shapes like this to look for patterns. In this case, each of five nodes can take on four different values. Five points are connected in every case and every possible set of five is represented.

Of course, the number of bugs caught with this method depends on the particular software. A study of 109 software-controlled medical devices recalled by the Food and Drug Administration showed that pair wise interaction testing would have caught the problems in 106 of the devices. However, another study on fault reports for three non-medical software systems showed that pair wise interaction testing would catch 70 percent of the bugs. Three-way interaction testing would catch 90 percent.

Only looking at pair wise interactions speeds the software testing process considerably. For the case of 10 components, each with four potential settings, only testing pair wise interactions reduces the tests needed from more than a million to just 29. That’s quite helpful, but the engineer still needs to find those 29 tests that cover all the pairs.

Colbourn is developing methods to do this based on the Greedy algorithm. In the Greedy method, the engineer chooses one test to perform. The next test selected is the one that includes the largest number of pairs not included in the previous test. The process of greedily selecting tests that provide the most additional information continues until the test set contains all possible pair wise interactions.

Obviously, the first test in the sequence is important. So are other factors in order to determine how many tests the Greedy method requires to cover all possible pair wise interactions. The ASU researcher’s current work involves optimizing the method to yield the best results in the least amount of time. He also wants to make the method flexible enough to fit many different testing scenarios.

One way is to break down the problem into smaller chunks. In a more complicated system than the one with 10 components with four settings each, or a system that requires more than pair wise testing, an engineer may need to generate hundreds or even thousands of test cases. By partitioning the set into chunks of 25 to 30 test cases, the search problem simplifies considerably.

“We’re trying to get the best of both worlds,” Colbourn explains. “We’re trying to take some of the mathematical framework and make it usable by someone who wants to design a set of test cases but doesn’t necessarily have the restrictive kind of problem that the mathematics can handle well. We’re also trying to make it that you can take a lot of the real world constraints that come from things like requirement specifications and add that in to the specification of the mathematical problem.”

Colbourn and his team also look at statistical methods to find the tests needed to include all pair wise interactions, known collectively as a covering array. In statistics, people measure how things interact. In software testing, however, the engineer does not want to measure the things that together make a bug. He wants to locate them.

Colbourn and his colleagues examine how the software engineer’s covering arrays relate to the descriptions statisticians use for measuring. How good are the measuring descriptions at detection and vice versa?

The ASU researcher “sits firmly on the fence” between the areas of mathematics and software engineering. This can make it difficult for either group to recognize the contributions of his work.

“The software engineers say, ‘This is math, but it’s something I can use.’ Whereas the culture in mathematics values beauty and elegance over applications,” Colbourn says. “It’s easy to do stuff in either domain, but it’s difficult to do something that both sides find interesting.”

Colbourn’s work is slowly being recognized as mathematicians and software engineers find that their domains aren’t as dissimilar as they thought. —Linley Erin HallSoftware testing research at ASU is supported by the Army Research Office and the Consortium for Embedded and Inter-Networking Technologies. For more information, contact Charles Colbourn, Ph.D., Department of Computer Science and Engineering. Send e-mail to Charles.Colbourn@asu.eduEngineering and TechnologyComputer Science