Tuesday, November 17, 2009

An Economic Analysis of Software Defect Removal Methods


Synopsis

This paper, based on my book Managing the “Black Hole”: The Executive’s Guide to Software Project Risk, explores the economic consequences of alternative strategies for software defect detection and correction during the software development life cycle. Most published analyses have relied on “cost per defect” as an index for comparison of alternatives. This metric has been shown to be fundamentally flawed as it fails to differentiate between fixed and variable costs and because it effectively penalizes high quality software. Cost per defect necessarily increases as incoming quality improves, regardless of defect detection strategies employed.
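
To see why cost per defect penalizes quality, consider a minimal sketch with purely hypothetical numbers (the fixed and variable costs below are assumptions chosen for illustration, not values from the models). A test stage carries a fixed cost for writing and running the tests, whatever the defect count, plus a variable cost for each defect found and fixed.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
FIXED_TEST_COST = 100.0  # person-hours to write and run the tests (assumed)
COST_PER_FIX = 5.0       # person-hours to repair each defect found (assumed)


def total_cost(defects_found: int) -> float:
    """Fixed cost of the test stage plus the variable repair cost."""
    return FIXED_TEST_COST + COST_PER_FIX * defects_found


def cost_per_defect(defects_found: int) -> float:
    """Total stage cost divided by the number of defects found."""
    return total_cost(defects_found) / defects_found


# Fewer incoming defects means lower total cost but higher cost per defect.
for found in (50, 10, 2):
    print(f"{found:>3} defects: total {total_cost(found):6.1f} h, "
          f"per defect {cost_per_defect(found):5.1f} h")
```

With these assumed figures, total cost falls from 350 to 110 hours as incoming quality improves, while cost per defect rises from 7 to 55 hours, because the fixed cost is spread over fewer defects.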

Most published papers on this topic include many unstated assumptions. This paper employs a set of parametric models that explicitly state the assumptions incorporated in the models. I invite readers to substitute alternative parameter values to explore consequences of different assumptions or experience. A hypothetical “case study” is presented to illustrate the impact of different assumptions. These models are available from the author on request, subject only to a good faith agreement to share assumptions and conclusions - if interested, email me (ggack@process-fusion.net). A more complete and detailed version of this post can be accessed at http://process-fusion.net/default.asp?id=6

Objectives

This paper and the models it uses are intended to help software organizations determine an optimal combination of defect containment methods and resource allocation. These models provide a mechanism to forecast the consequences of alternative assumptions and strategies in terms of both cost (in person months) and delivered quality (measured by “Total Containment Effectiveness” – TCE – i.e., the estimated percentage of defects removed before software release to customers).

Caution

As George Box said, “All models are wrong – some are useful.” This paper uses model parameters taken from a variety of public sources, but makes no claim that these parameter values are valid or correct in any particular situation. It is hoped the reader will take away a thought process and perhaps make use of this or similar models using parameter values appropriate and realistic in the intended context. It is widely recognized that all benchmark values are subject to wide (and typically unstated) variation. Many parameter values will change significantly as a function of project size, application domain and other factors.

Overview

The complete model includes 5 tables – the first 4 include user-supplied parameters and calculate certain fields based on those parameters. The fifth table summarizes the results of the other four. We will look at the summary here - the details upon which the summary is based are included in the complete version accessible as indicated above.

The summary charts, displayed graphically in the complete version and summarized in text below, include five scenarios, all based on an assumed size of 1000 function points. The first three scenarios assume defects are “inserted” at US average rates according to Capers Jones, Software Engineering Best Practices (2009, p.69) – a total of 5.00 defects per function point, including bad fixes and documentation errors. Scenarios four and five reflect results reported by higher maturity groups, in which defects inserted are reduced to 4 per function point in scenario 4 and 3 per function point in scenario 5.
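
As a quick sketch of what these insertion rates imply, the total defects inserted in each scenario is simply size times rate. Only the size and the per-function-point rates come from the text; the variable names are mine.

```python
SIZE_FP = 1000  # assumed system size in function points, from the text

# Defects inserted per function point in each scenario, as stated above.
DEFECTS_PER_FP = {1: 5.0, 2: 5.0, 3: 5.0, 4: 4.0, 5: 3.0}

# Total defects inserted = size * insertion rate.
defects_inserted = {s: SIZE_FP * rate for s, rate in DEFECTS_PER_FP.items()}

for scenario, total in defects_inserted.items():
    print(f"Scenario {scenario}: {total:,.0f} defects inserted")
```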

Scenario 1 represents a “test only” scenario in which no “pre-test” appraisals, such as formal inspections, are used. Scenarios two and three introduce certain pre-test appraisals, including inspections and static analysis, and discontinue some test activities. Other model parameters, discussed later, remain constant across all scenarios.

Alternative Appraisal Strategies

Appraisal Type             Scenario
                           1      2      3      4      5
Requirements Inspection    -      30%    75%    75%    75%
Design Inspection          -      20%    50%    50%    50%
Code Inspection            -      10%    20%    20%    20%
Static Analysis            -      -      x      x      x
Unit Test                  x      x      -      -      -
Function Test              x      x      -      -      -
Integration Test           x      x      x      x      x
System Test                x      x      x      x      x
Acceptance Test            x      x      x      x      x

(x = appraisal used; - = not used)


Note that static analysis can only be used for certain procedural languages such as C and Java.

Two summaries are developed – the first shows the impact of alternative appraisal strategies on delivered quality as measured by “Total Containment Effectiveness”, i.e., the percentage of defects removed prior to delivery of the software. In this illustration a “best” mix of appraisal activities (scenarios 3-5) reduces delivered defects by about 75% compared to a test-only approach (scenario 1) typically used by average groups.

The second summary shows the impact of alternative appraisal strategies on “non-value-added” effort as defined in the Cost of Quality framework – i.e., all appraisal and rework effort is by definition “non-value-added” (NVA). Although these costs are certainly a “necessary evil”, our goal will always be to minimize them.

In this illustration a “best” (scenario 3) mix of appraisal activities reduces total NVA effort (including both pre- and post-release effort) by 44% compared to a test-only approach typically used by average groups (66.6 person months in scenario 3 vs. 119.9 in scenario 1). More mature organizations, as a result of lower defect insertion, can reduce NVA by an additional 30% (to 46.3 in scenario 5 vs. 66.6 in scenario 3).
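
The percentages quoted above follow directly from the person-month totals. A quick check, in which only the three NVA figures come from the text:

```python
# NVA effort in person months for scenarios 1, 3 and 5, as quoted above.
NVA = {1: 119.9, 3: 66.6, 5: 46.3}


def percent_reduction(before: float, after: float) -> float:
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


test_only_to_best = percent_reduction(NVA[1], NVA[3])
best_to_mature = percent_reduction(NVA[3], NVA[5])
test_only_to_mature = percent_reduction(NVA[1], NVA[5])

print(f"Scenario 1 -> 3: {test_only_to_best:.1f}% less NVA effort")
print(f"Scenario 3 -> 5: {best_to_mature:.1f}% less NVA effort")
print(f"Scenario 1 -> 5: {test_only_to_mature:.1f}% less NVA effort")
```

The last line combines the two steps: going from test-only (scenario 1) all the way to scenario 5 cuts NVA effort by roughly 61%.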

Friday, June 5, 2009

A Blast from the Past - Scoping and Estimating with Context Diagrams



A technique developed "in the old days", Context Diagrams, can often help us develop a plausible first approximation of project scope before requirements are really defined. It's easy to learn, quick to apply, and works with any flavor of development, including both traditional and Agile approaches.

No rocket science here, just practical common sense. A Context Diagram, illustrated here in the general case, is simply a list of "objects" or elements that are typically included in a software system. This particular format originated back in the days of stone tools, perhaps it was the Mesozoic era, sometimes referred to as the epoch of "structured analysis". Truly amazing what those cave people were able to accomplish! But I digress ...

This was, and is, a useful quick way to scope out a new project. Let's take a simple example. Suppose for some reason our fearless leader decides we're going to build a new accounts receivable system. Le Grand Fromage wants to know by Tuesday how long it's going to take and what it will cost.

Step 1: Assemble a small group of individuals who have at least a fuzzy idea what an AR system entails - e.g., the billing supervisor, the head of collections, a rep from the CFO's office, and the BA who deals with the order-to-cash process.

Step 2: A facilitator (perhaps the BA) draws a big box on the white board and asks the group "What's inside the AR box, what's not?" Billing speaks up "We're using quill pen and abacus now - we really think we ought to be inside." Collections pipes in "If we're going to include billing it will greatly increase the scope - is that really what M. le Brie intended?" Already the tribes are forming - who's going to get voted off the island first? We have a choice to make here - do we go back to the big cheese right now, or do we continue and develop several alternative scope estimates - e.g., with billing and without?

Step 3: Whatever we decide is (at least provisionally) inside the big box, our next step is to start describing the various elements that define "it". For most business and administrative type systems there are five broad categories of "objects" that will account for the vast majority of the effort and cost associated with building a system - specifically, Inputs, Outputs, data sets (tables, files), Interfaces to other systems (both one-time and on-going), and Queries. We now make a list of each item in each category. For our hypothetical AR system inputs might include New Customer, Merge Customers, Delete Customer, Update Customer, Invoice, Credit Memo, Credit Report, etc. A list of several hundred items can easily be identified in one day or less.

Notice we're not trying to define the content or format of any of these items, and we're not exploring business rules, workflow aspects, or any of the other myriad details that will need to be uncovered before we can actually build this. Of course we'll need to do that later, but just this list, which in a great many cases can be pretty well completed in a day or two, really tells us a lot about what we're getting into.

Step 4: Convert our list to an estimate of Size (in "function points"). Here we can use a simplified approach I call "function points lite", also known as "count on average". This approach will give you perhaps 80% of the accuracy of full-blown function points at a small fraction of the cost.

For those who may not be familiar with Function Points (FP), they are the subject of an ANSI and ISO standard developed by the International Function Point Users Group (IFPUG). This is a method to quantify the "size" of a software system, and size has been shown to be a primary driver of cost and schedule. Many estimating methods and tools use Function Points as a primary input.

Each of the five entity types mentioned above has been assigned a low, average, and high function point value based primarily on the number of data elements making up the entity. (The 'official' counting method is a good bit more involved than we'll go into here - references on this abound.) You can get a very good approximation of the total count very quickly by simply assuming every element is "average" for its type. Taking our example a step further might lead us to ...

7 Inputs * 4 = 28 FP
9 Data sets * 10 = 90 FP
6 Queries * 4 = 24 FP
12 Outputs * 5 = 60 FP
8 Interfaces * 7 = 56 FP
=================
Total = 258 Function Points
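
The "count on average" arithmetic is easy to automate, so the size can be recomputed whenever an item is added or dropped. A minimal sketch, using the average weights and item counts from the example above (the function and variable names are my own):

```python
# Average function point weight per object type, as used in the example.
AVERAGE_WEIGHTS = {
    "Inputs": 4,
    "Data sets": 10,
    "Queries": 4,
    "Outputs": 5,
    "Interfaces": 7,
}

# Number of items listed in each category for the hypothetical AR system.
ar_counts = {
    "Inputs": 7,
    "Data sets": 9,
    "Queries": 6,
    "Outputs": 12,
    "Interfaces": 8,
}


def count_on_average(counts: dict) -> int:
    """Size in function points, treating every item as 'average' for its type."""
    return sum(n * AVERAGE_WEIGHTS[kind] for kind, n in counts.items())


total_fp = count_on_average(ar_counts)
print(f"Total = {total_fp} Function Points")
```

Removing, say, two Interfaces from the list lowers the count by 14 FP, which is the kind of scope adjustment a customer can follow without any training in function point counting.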

This is then the "size" of our baseline scope - obviously as items are added or removed as we go along the scope can easily be adjusted accordingly - AND we're adjusting in a way that is very easy for the customer to understand.

Step 5: Schedule and Cost Estimates - We can convert this size value into estimates of schedule and cost by many different means - e.g., all of the best-selling estimating tools will take our FP count (and other factors) as inputs and give us the results. Alternatively we can use simpler 'rule of thumb' methods such as those suggested by Capers Jones (note: Capers would readily agree these are much less reliable than tool-based approaches that account for many more factors) - e.g.,

project duration (months) = FP raised to the .4 power = 258**.4 = 9.2 months

project staffing = FP / 150 = 258 / 150 = 1.7 staff (full time equivalent)

project effort (person months) = schedule months * staff = 9.2 * 1.7 = 15.6 person months
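
These three rules of thumb can be wrapped in a small function. This is only a sketch of the arithmetic shown above; the function name is mine, and intermediate values are rounded to one decimal place to match the figures in the text.

```python
def rule_of_thumb_estimate(function_points: float) -> tuple:
    """Return (duration in months, staff in FTE, effort in person months)."""
    duration = round(function_points ** 0.4, 1)  # FP raised to the .4 power
    staff = round(function_points / 150, 1)      # one FTE per 150 FP
    effort = round(duration * staff, 1)          # schedule months * staff
    return duration, staff, effort


duration, staff, effort = rule_of_thumb_estimate(258)
print(f"{duration} months, {staff} staff, {effort} person months")
```

Note that carrying full precision through the calculation instead of rounding each step gives an effort closer to 15.9 person months, a reminder that these rules deserve no more than one or two significant digits.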

Step 6: Evaluate Risk - unlikely we would do so for a small project such as our illustration, but if it's a big and nasty 'bet your job' estimate we can go a step further and apply a technique such as Monte Carlo simulation to evaluate a range of outcomes and probabilities. It's beyond our scope to get into that here, but if you've got a risky situation I'll be most pleased to consult with you on how to manage it.
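
For the curious, here is a rough sketch of the Monte Carlo idea, not a recommended model. It assumes the only uncertain input is the size itself, drawn from a triangular distribution between a low, most likely, and high function point count, and it reuses the rule-of-thumb formulas from Step 5. The 200 and 400 FP bounds are invented for illustration.

```python
import random


def simulate_effort(low_fp, likely_fp, high_fp, trials=10_000, seed=1):
    """Return sorted simulated efforts (person months) for an uncertain size."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    efforts = []
    for _ in range(trials):
        fp = rng.triangular(low_fp, high_fp, likely_fp)  # (low, high, mode)
        duration = fp ** 0.4   # months
        staff = fp / 150       # full time equivalents
        efforts.append(duration * staff)
    efforts.sort()
    return efforts


# Hypothetical range around the 258 FP baseline (bounds are assumptions).
efforts = simulate_effort(200, 258, 400)
p10, p50, p90 = (efforts[int(len(efforts) * q)] for q in (0.10, 0.50, 0.90))
print(f"10th percentile: {p10:.1f} person months")
print(f"median:          {p50:.1f} person months")
print(f"90th percentile: {p90:.1f} person months")
```

A real risk analysis would vary many more factors than size, but even this toy version turns a single-point estimate into a range with probabilities attached.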

Once we have this list we can get back to his or her Cheesiness to provide the requested estimates and get some initial decisions on what will and won't be included.

Friday, May 29, 2009

Who's First - The CMMI Chicken or the Six Sigma Egg?

This post is abstracted from “Connecting Software Industry Standards & Best Practices: Lean Six Sigma and CMMI®” by Gary Gack and Karl Williams, CrossTalk, February 2007.

Software professionals, especially those working in the Department of Defense environment, face a somewhat bewildering array of relevant standards and best practices. As awareness and penetration of Lean Six Sigma in this environment have increased significantly over the last several years, we find many organizations struggling to understand and leverage the relationships between Lean Six Sigma and several other approaches to software process improvement, including CMMI®. In our view, the cases described below and other industry experience answer in the affirmative four questions we hear quite frequently:

• We are already doing Six Sigma; does it make sense to do CMMI® as well? Clearly each of the cases below felt there was additional value to be gained from engaging CMMI® in addition to Six Sigma.
• We are probably CMMI® (staged) level 2 or 3; does it make sense to do Six Sigma before we get to level 4? None of these organizations had reached level 4, but all had realized benefits from Six Sigma at lower levels.
• Management wants us to get to level 5 as soon as possible; can Six Sigma help us get there quicker, or will it slow us down? Case 3 below, as well as other results reported in the literature, clearly demonstrates Six Sigma can reduce the time needed to move to higher maturity levels.
• We are already doing CMMI®; does it make sense to do Six Sigma as well? In her article “Using Six Sigma in Software Development” in News@SEI, Lauren Heinz wrote the following:
“Each year at Lockheed Martin, corporate management challenges its Integrated Systems & Solutions business unit (IS&S) to reduce total costs. Each year, IS&S uses Six Sigma tools to make it happen. Six Sigma has resulted in significant cost savings, said Lynn Penn, director of quality systems and process management at IS&S. It’s a structured approach that provides more than a checklist—it shows you what’s coming next, lets you look at data from different views, and gives you a big picture of your practices for making decisions.
Lockheed Martin is part of a growing number of organizations using Six Sigma to improve software quality and cycle time, reduce defects in products and services, and increase customer satisfaction. As Six Sigma evolves from an improvement framework for the manufacturing sector to one that can be applied across all levels of an enterprise, the SEI is looking at ways that Six Sigma has benefited software and systems development.”


Answering a fifth common question necessarily moves into the realm of opinion – the following is ours:
• We are just getting started on process improvement; should we do Six Sigma, CMMI®, or both? At the same time, or one or the other first? – Our experience convinces us that most organizations will get measurable business results more quickly with Six Sigma than with CMMI®, but ultimately will need the additional insights available from CMMI® to realize maximum benefit. Government contracting organizations will in most cases need to do both in parallel - CMMI® to satisfy government bid requirements and Six Sigma to ensure tight and near-term linkage to measurable business results. Even when CMMI® comes first, levels 4 and 5 will invariably lead organizations to something that is very like Six Sigma, even if by another name.

Lean Six Sigma places primary emphasis on understanding and managing performance (outcomes) while CMMI® (often in practice if not in principle) places more emphasis on maturity/capability level. Many within the SEI have expressed concern about over-emphasis on the rating alone. Certainly level rating is important for government contracting organizations, but is not sufficient by itself to quantitatively demonstrate improved outcomes in terms of cost, quality, or cycle time.

CASE STUDIES

Williams provided CMMI® training and performed Class C Appraisals in three different organizations that were well into Six Sigma deployment. Black Belts, Green Belts, and Champions had been trained and all staff had some form of Six Sigma orientation. The results across the board were significantly different than the average initial appraisal. A significant number of processes were already documented and used, as opposed to the usual blank stares when procedures / templates are requested. All three had results more typical of a second or third round of appraisals than the normal results.

Especially noticeable was the difference in the quantitatively based Process Areas (PA), e.g. Measurement and Analysis (MA) at Staged Level 2 and the Level 4 and 5 Process Areas covering Quantitative Process Management (QPM) and Organizational Process Performance (OPP). In most initial Class C Appraisals, we do not even bother to look at Levels 4 – 5, sometimes not even Level 3. In all three cases, plans called for reviewing only Levels 2 and 3. As the appraisals progressed, the results were so startling that the Levels 4 – 5 were also reviewed.

Case 1 Results
• Measurement and Analysis Process Area – All goals and practices compliant with no improvement opportunities.
• Level 4 Process Areas (2) – All but two goals satisfied. All practices largely compliant with three improvement opportunities.
• Level 5 Process Areas (2) – All but two goals satisfied. All practices largely / partially compliant with two improvement opportunities.
• Follow-up accomplishments – The Group completed a Level 2 Appraisal as planned for calendar-year objectives. The Level 3 Appraisal was successful nine months later. Plans were halted when the organization was acquired by a large multi-national organization.

Case 2 Results
• Measurement and Analysis Process Area – All goals and practices compliant with one improvement opportunity.
• Level 4 Process Areas (2) – All but two goals satisfied. All practices largely / partially compliant with five improvement opportunities.
• Level 5 Process Areas (2) – All but one goal satisfied. All practices largely compliant with two improvement opportunities.
• Follow-up accomplishments – The Group was appraised at Level 3 three months later and is planning for a Level 5 Appraisal this quarter.

Case 3 Results

• Measurement and Analysis Process Area – All goals satisfied and practices compliant with one minor improvement opportunity.
• Level 4 Process Areas (2) – All but one goal satisfied. All practices largely compliant with three improvement opportunities.
• Level 5 Process Areas (2) – All goals and practices satisfied.
• Follow-up accomplishments – The Group was appraised at Level 3 one month later and at Level 5 six months following that event.

All three case study organizations shared other attributes as well, including the following:
• All processes and procedures contained measurements, root cause analysis, and continuous improvement follow-up.
• All organizational results exceeded the data published by the Software Engineering Institute benchmarking activities.
• All levels of management and staff used the measurements on a daily basis.

The above results are phenomenal for first time Class C Appraisals. Upon further investigation, all stakeholders agreed that the Six Sigma tools and Six Sigma mindset had contributed greatly to such unparalleled results.

Wednesday, April 15, 2009

Improving Requirements with Lean Six Sigma Tools

Lean Six Sigma (LSS), more than anything else, is about Managing by Fact. Every organization can select elements of the LSS approach without necessarily taking on a full implementation. This post considers one common scenario in software implementations and describes how selected elements from LSS can be adapted to improve outcomes.


Scenario:


“We do an enhancement prioritization process with our customers during our annual planning cycle, but somehow it just doesn’t seem to work very well. We end up with a bunch of stuff that doesn't seem to have any sort of overall theme – almost features without a rationale. We need another way to work this!”

Anchoring Requirements in Business Outcomes


One of the common misconceptions concerning Lean Six Sigma (LSS) is that it’s all about statistics – in reality it’s much more than that. One of the tools (which can be applied independent of the method in total) involves disciplined use of language data as well as numbers. Requirements, after all, mostly get described in language, and typically not very precise language at that. We often, as in this case example, end up with a laundry list of features and functions whose coherence and central themes are often very unclear, even to those who may have given us the requirements. We've all had the experience of getting halfway through a project and realizing that both the development team and the customer are wondering "now why is it we're doing this particular feature???"

When a financially oriented, fact-based thought process is applied to requirements, the focus is often quite different from the typical “what are your requirements?” approach. Instead the focus is on understanding the customer's "Critical to Quality" (CTQ) business objectives – developing a rich understanding of what the customer is trying to accomplish and how value will actually be generated. At first glance this may seem a fine distinction, but in practice it leads to a very different mindset that creates very different outcomes. Implications of this different mindset include:


Desired business outcomes precede features and functions.
A fact-based approach will focus first on the business results, described in financial terms, that are the reason a system is being developed or enhanced. Certainly most projects begin with some sort of high-level statement of business objectives that justify initiation of the project, but that focus is often lost when the team starts to “find out what they want” – by the time the project is a couple of months old few remember the initial rationale. Most projects quickly lose sight of the “why are we doing this?” and "how does this feature/function contribute to realization of the expected business value?" point of view – losing the connection between “what they asked for” and how satisfying those wishes will produce business value.


An accounts receivable system, to offer a simple example, fundamentally has only one reason to exist – i.e., to facilitate collection of money owed the organization. Even in such a simple example systems are very often built that have dozens of different ways to enter transactions or view amounts outstanding, reflecting the individual preferences of collections and accounting personnel in the various divisions and regions of the organization. Perhaps many of these units were acquired over a period of time. Perhaps they all used different systems and different business processes – some used Oracle, some SAP, some had QuickBooks. They all want to have their reports and screens exactly the way they are accustomed to seeing them – and as a consequence the implementation team builds far more software than is fundamentally necessary, creates many versions of the training, provides help desk support for all the variants. A very large part of this extra effort has essentially no actual business value.

Impact-based selection of functionality to be delivered.
Instead of a "popularity contest" that relies on some sort of voting scheme and/or on the political or financial clout of certain stakeholders, a more fact-based approach will produce better results. A scorecard based on an adaptation of the Pugh method appropriate to the circumstances can provide a formal mechanism that facilitates objective evaluation. Proposed features and functions can be rated against an agreed set of CTQ attributes that reflect not only the business outcomes but also important non-functional attributes of a solution that meets all "well-founded" customer requirements (as distinguished from wishes and matters of taste or style). Attributes that may be rated for each proposed feature/function might include some of the following:
  • The contribution it makes to financially measured outcomes – if we add this feature will our collections improve?
  • The contribution it makes to the cost of operating the system – if we add this feature, will it reduce our operating cost or cost of ownership?
  • The contribution it makes to the efficiency of the personnel using the system – if we add this feature, will it reduce the time it takes to enter transactions? The time it takes to perform a collection activity?
  • The contribution it makes to deployment of the system – if we add this feature, will it reduce training time? Reduce development of training materials?
  • The contribution it makes to system reliability – does it make the system more foolproof? Is the cost of the feature consistent with the associated failure risk?
  • The contribution it makes to security – does the feature make the system less vulnerable? Is the cost of the feature consistent with the associated risk?
  • What portion of system users will use the feature – is the user base impacted consistent with the development costs?

Questions such as these (and certainly there could be many others relevant to a particular situation) can be an effective screen that prevents gumming up the works with a lot of low-value stuff. When ratings have been assigned to proposed features and costs have been estimated for each it is a relatively simple matter to use the resulting scores as the basis for decisions on what to include in light of the available budget. A software firm I know applied this approach to a major new release – when they presented their approach and results they received a standing ovation from the user group members who participated in the identification of potential features/functions and in the ratings process. They saw a significant increase in upgrade revenues, and for the first time in years the internal friction between development and marketing was reduced to a low boil.
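
To make the mechanics concrete, here is a minimal sketch of a weighted scorecard. It is not the scorecard that firm used. The attributes, weights, ratings (0 to 5), costs, feature names, and budget are all invented for illustration, and the selection rule (take features in order of score per unit cost while they fit the budget) is just one simple possibility.

```python
from dataclasses import dataclass

# Hypothetical CTQ attributes and their relative weights (assumed values).
WEIGHTS = {"collections": 5, "operating_cost": 3, "user_efficiency": 3, "reliability": 2}


@dataclass
class Feature:
    name: str
    cost: float    # estimated cost in person months (assumed)
    ratings: dict  # attribute name -> rating from 0 (no impact) to 5

    def score(self) -> int:
        """Weighted sum of the ratings across all CTQ attributes."""
        return sum(WEIGHTS[attr] * rating for attr, rating in self.ratings.items())


def select_features(features, budget):
    """Pick features by descending score per unit cost until the budget runs out."""
    chosen, remaining = [], budget
    for f in sorted(features, key=lambda f: f.score() / f.cost, reverse=True):
        if f.cost <= remaining:
            chosen.append(f)
            remaining -= f.cost
    return chosen


candidates = [
    Feature("Automated dunning letters", 4,
            {"collections": 5, "operating_cost": 3, "user_efficiency": 4, "reliability": 1}),
    Feature("Custom regional report layouts", 6,
            {"collections": 1, "operating_cost": 0, "user_efficiency": 2, "reliability": 0}),
    Feature("Bulk payment import", 3,
            {"collections": 3, "operating_cost": 4, "user_efficiency": 5, "reliability": 2}),
    Feature("Duplicate invoice check", 2,
            {"collections": 2, "operating_cost": 2, "user_efficiency": 1, "reliability": 5}),
]

selected = select_features(candidates, budget=10)
for f in selected:
    print(f"{f.name}: score {f.score()}, cost {f.cost}")
```

With these made-up numbers the three selected features use 9 of the 10 person months, and the regional report layouts, which score low against every attribute, are left out. The value lies less in the arithmetic than in having the ratings and weights visible for everyone to argue about.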

Is this rocket science? Of course not! Did we need advanced statistics? No way! What was needed, and what some of the LSS tools supplied, was a disciplined, fact-based process that was visible, understandable, and defensible. Certainly there was room for argument on the ratings, and many arguments occurred, but in the end, everyone involved understood how and why the decisions were reached. Internal and external alignment was better than it had been in years.

Tuesday, April 7, 2009

Is Agile "Fragile"?

While I'm not intending to be unduly controversial (well, maybe a little), I have noticed more and more commentary recently expressing various concerns about a current "hot topic" - Agile methods. One example is a recent article by James Shore, "The Decline and Fall of Agile".

In that article he remarks "It's odd to talk about the decline and fall of the agile movement, especially now that it's so popular, but I actually think the agile movement has been in decline for several years now. I've seen a shift in my business over the last few years. In the beginning, people would call me to help them introduce Agile, and I would sell them a complete package that included agile planning, cross-functional teams, and agile engineering practices. Now many people who call me already have Agile in place (they say), but they're struggling. They're having trouble meeting their iteration commitments, they're experiencing a lot of technical debt, and testing takes too long."

Personally I don't doubt there are many potential benefits of Agile methods, provided they are actually used as intended and are appropriate to the context in which they are applied. Sadly, like many other good ideas, Agile is often more "talk the talk" than "walk the talk". Some of Agile's more rabid advocates seem to think it's a "universal solvent", which even alchemists and sorcerers don't believe any more - nothing turns lead into gold.

On the other hand, I do have some fundamental concerns about the evident lack of hard facts and data - there seems to be a lot of heat, but not much light. Are Agile methods actually more productive in aggregate across a series of iterations compared to alternative methods? As Shore points out, "technical debt" can easily become a major problem. To some extent short iterations are necessarily risky to architectural soundness. Of course Agilists advocate "refactoring" to remedy that risk, but how often is refactoring actually done? What does it actually cost? After 10 or so iterations is Agile really, in total, more productive than another alternative?

And what about test-driven design/development? What does it cost compared to Fagan-style inspections? What are the actual defect containment rates? Capers Jones' data and other sources clearly show Fagan inspections find 60-80% of the defects present while testing finds 30-50% (per test type). The facts we do have call into question some of the claims made for TDD.
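
To show how such per-stage rates combine, here is a simple sketch. It assumes each appraisal step removes a fixed fraction of whatever defects are still present, independently of the other steps, which is a simplification. The 70% and 40% figures are just midpoints of the ranges quoted above, and the choice of three test types is arbitrary.

```python
def cumulative_containment(stage_rates):
    """Fraction of defects removed after applying each stage in turn."""
    remaining = 1.0
    for rate in stage_rates:
        remaining *= (1.0 - rate)  # each stage removes `rate` of what is left
    return 1.0 - remaining


INSPECTION = 0.70  # midpoint of the 60-80% range (assumption)
TEST = 0.40        # midpoint of the 30-50% range (assumption)

tests_only = cumulative_containment([TEST, TEST, TEST])
inspection_plus_tests = cumulative_containment([INSPECTION, TEST, TEST, TEST])

print(f"Three test types alone:          {tests_only:.1%}")
print(f"One inspection plus three tests: {inspection_plus_tests:.1%}")
```

Under these assumptions three test types together contain about 78% of defects, while adding a single inspection in front of them lifts that to about 94%. Real projects will differ, which is exactly why measured data matters.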

In fairness, my comments about lack of facts and data are by no means restricted to Agile - they apply to a great many fads du jour. Let's hope one day soon we'll begin to do rigorous data-based assessments. Actually, a few of us are working behind the scenes to bring that to fruition - more about that as it develops!

If you have any facts and data (vs. anecdotes) that may shed light on this topic, please share!