Tuesday, November 17, 2009

An Economic Analysis of Software Defect Removal Methods


Synopsis

This paper, based on my book Managing the “Black Hole”: The Executive’s Guide to Software Project Risk, explores the economic consequences of alternative strategies for software defect detection and correction during the software development life cycle. Most published analyses have relied on “cost per defect” as an index for comparing alternatives. That metric has been shown to be fundamentally flawed because it fails to differentiate between fixed and variable costs and because it effectively penalizes high-quality software: cost per defect necessarily increases as incoming quality improves, regardless of the defect detection strategies employed.
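To make the fixed-versus-variable-cost point concrete, here is a minimal sketch; the cost figures are my own illustrative assumptions, not values from the book. A test stage carries a fixed cost (test preparation and execution) that is paid no matter how many defects it finds, plus a variable cost for each defect found and fixed. As incoming quality improves, the fixed cost is spread over fewer defects, so cost per defect rises even though total cost falls.

```python
# Illustrative only: why "cost per defect" penalizes high-quality software.
# The cost figures are assumptions for this sketch, not values from the book.

FIXED_COST = 10_000    # preparing and running the test stage, paid regardless of defects found
VARIABLE_COST = 200    # finding and fixing one defect

for defects_found in (500, 100, 10):
    total_cost = FIXED_COST + VARIABLE_COST * defects_found
    cost_per_defect = total_cost / defects_found
    print(defects_found, total_cost, round(cost_per_defect))

# 500 defects found -> 110,000 total, 220 per defect
# 100 defects found ->  30,000 total, 300 per defect
#  10 defects found ->  12,000 total, 1,200 per defect (lowest total cost, worst-looking metric)
```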

Most published papers on this topic include many unstated assumptions. This paper employs a set of parametric models that explicitly state the assumptions incorporated in them. I invite readers to substitute alternative parameter values to explore the consequences of different assumptions or experience. A hypothetical “case study” is presented to illustrate the impact of different assumptions. These models are available from the author on request, subject only to a good-faith agreement to share assumptions and conclusions; if interested, email me (ggack@process-fusion.net). A more complete and detailed version of this post can be accessed at http://process-fusion.net/default.asp?id=6

Objectives

This paper and the models it uses are intended to help software organizations determine an optimal combination of defect containment methods and resource allocation. The models provide a mechanism to forecast the consequences of alternative assumptions and strategies in terms of both cost (in person-months) and delivered quality, measured by “Total Containment Effectiveness” (TCE), i.e., the estimated percentage of defects removed before the software is released to customers.
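As a concrete reading of the TCE definition, here is a minimal sketch; the defect counts are placeholders chosen only to illustrate the arithmetic, not output from the models.

```python
# Total Containment Effectiveness (TCE): the share of inserted defects
# removed before release. Counts below are illustrative placeholders.
defects_inserted = 5000            # e.g., 1000 function points x 5 defects per function point
removed_before_release = 4625      # found by all pre-release appraisal and test activities
tce = removed_before_release / defects_inserted
print(f"TCE = {tce:.1%}")          # TCE = 92.5%
```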

Caution

As George Box said, “All models are wrong, but some are useful.” This paper uses model parameters taken from a variety of public sources, but makes no claim that these parameter values are valid or correct in any particular situation. The hope is that the reader will take away a thought process and perhaps apply this or a similar model using parameter values that are appropriate and realistic in the intended context. It is widely recognized that all benchmark values are subject to wide (and typically unstated) variation; many parameter values will change significantly as a function of project size, application domain, and other factors.

Overview

The complete model includes five tables. The first four contain user-supplied parameters and calculate certain fields from those parameters; the fifth summarizes the results of the other four. We will look at the summary here; the details on which it is based are included in the complete version accessible as indicated above.

The summary charts, displayed graphically in the complete version and summarized in text below, include five scenarios, all based on an assumed size of 1,000 function points. The first three scenarios assume defects are “inserted” at US average rates according to Capers Jones’ Software Engineering Best Practices (2009, p. 69): a total of 5.00 defects per function point, including bad fixes and documentation errors. Scenarios 4 and 5 reflect results reported by higher-maturity groups, in which defects inserted are reduced to 4 per function point in scenario 4 and 3 per function point in scenario 5.
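Restating those assumptions as the defect totals the model starts from (a trivial sketch; the rates are exactly those listed above):

```python
# Total defects inserted per scenario: 1000 function points x assumed defects per FP.
SIZE_FP = 1000
defects_per_fp = {1: 5.0, 2: 5.0, 3: 5.0, 4: 4.0, 5: 3.0}
total_inserted = {s: SIZE_FP * rate for s, rate in defects_per_fp.items()}
# {1: 5000.0, 2: 5000.0, 3: 5000.0, 4: 4000.0, 5: 3000.0}
```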

Scenario 1 represents a “test only” approach in which no “pre-test” appraisals, such as formal inspections, are used. Scenarios 2 and 3 introduce certain pre-test appraisals, including inspections and static analysis, and discontinue some test activities, as shown in the table below. Other model parameters, discussed later, remain constant across all scenarios.

Alternative Appraisal Strategies

Appraisal Type            Scenario 1   Scenario 2   Scenario 3   Scenario 4   Scenario 5
Requirements Inspection       -            30%          75%          75%          75%
Design Inspection             -            20%          50%          50%          50%
Code Inspection               -            10%          20%          20%          20%
Static Analysis               -            -            x            x            x
Unit Test                     x            x            -            -            -
Function Test                 x            x            -            -            -
Integration Test              x            x            x            x            x
System Test                   x            x            x            x            x
Acceptance Test               x            x            x            x            x

Note that static analysis can be used only for certain languages, such as C and Java.
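One way to encode the appraisal strategies above as input to a parametric model is sketched below. This is my own representation, not the layout of the author’s spreadsheet; the percentages are carried over exactly as shown in the table, and scenarios 4 and 5 reuse scenario 3’s mix, differing only in the assumed defect insertion rate.

```python
# Appraisal activities per scenario, transcribed from the table above.
# Percentages are kept as given; True marks an activity used without a percentage.
scenarios = {
    1: {"Unit Test": True, "Function Test": True, "Integration Test": True,
        "System Test": True, "Acceptance Test": True},
    2: {"Requirements Inspection": 0.30, "Design Inspection": 0.20, "Code Inspection": 0.10,
        "Unit Test": True, "Function Test": True, "Integration Test": True,
        "System Test": True, "Acceptance Test": True},
    3: {"Requirements Inspection": 0.75, "Design Inspection": 0.50, "Code Inspection": 0.20,
        "Static Analysis": True, "Integration Test": True,
        "System Test": True, "Acceptance Test": True},
}
scenarios[4] = dict(scenarios[3])   # same appraisal mix, 4 defects inserted per FP
scenarios[5] = dict(scenarios[3])   # same appraisal mix, 3 defects inserted per FP
```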

Two summaries are developed. The first shows the impact of alternative appraisal strategies on delivered quality as measured by “Total Containment Effectiveness”, i.e., the percentage of defects removed prior to delivery of the software. In this illustration, a “best” mix of appraisal activities (scenarios 3-5) reduces delivered defects by about 75% compared to the test-only approach (scenario 1) typically used by average groups.
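Mechanically, each appraisal stage removes a fraction of the defects still present, so delivered defects equal the inserted total multiplied by the product of each stage’s escape rate. The sketch below shows that structure; the per-stage effectiveness values are placeholders I have chosen for illustration, not the model’s calibrated parameters, so the resulting numbers will not match the summary exactly.

```python
# Sketch of how per-stage containment compounds into delivered defects.
# Per-stage effectiveness values are illustrative placeholders, not model parameters.

def delivered_defects(inserted, stage_effectiveness):
    remaining = inserted
    for eff in stage_effectiveness:
        remaining *= (1.0 - eff)    # each stage removes a fraction of what remains
    return remaining

inserted = 1000 * 5.0               # 1000 function points x 5 defects per function point

test_only = [0.30, 0.35, 0.35, 0.35, 0.35]                   # unit, function, integration, system, acceptance test
with_pretest = [0.60, 0.50, 0.45, 0.50, 0.35, 0.35, 0.35]    # reqs/design/code inspection, static analysis, remaining tests

d1 = delivered_defects(inserted, test_only)
d3 = delivered_defects(inserted, with_pretest)
print(round(d1), round(d3), round(1 - d3 / d1, 2))
# With the model's actual parameters, the summary reports roughly a 75% reduction.
```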

The second summary shows the impact of alternative appraisal strategies on “non-value-added” (NVA) effort as defined in the Cost of Quality framework, in which all appraisal and rework effort is by definition non-value-added. Although such effort is certainly a “necessary evil,” our goal will always be to minimize these costs.

In this illustration the “best” mix of appraisal activities (scenario 3) reduces total NVA effort (including both pre- and post-release effort) by 44% compared to the test-only approach typically used by average groups: 66.6 person-months in scenario 3 vs. 119.9 in scenario 1. More mature organizations, as a result of lower defect insertion, can reduce NVA by an additional 30% (to 46.3 person-months in scenario 5 vs. 66.6 in scenario 3).
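For reference, the percentage reductions quoted above follow directly from the summary’s person-month figures:

```python
# Total NVA effort (person-months) from the summary, and the implied reductions.
nva = {1: 119.9, 3: 66.6, 5: 46.3}
print(f"{1 - nva[3] / nva[1]:.0%}")   # 44% less NVA in scenario 3 than in test-only scenario 1
print(f"{1 - nva[5] / nva[3]:.0%}")   # 30% further reduction in scenario 5 vs. scenario 3
```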