Software reliability has been listed as a key quality measure for as long as software engineering has been a defined discipline. Figure 1 shows the quality factor topology presented by McCall, Richards, and Walters in their 1977 work. These factors are discussed throughout this blog. Product revision and product transition factors matter most to the software development and maintenance teams. Product operation factors are customer-facing and cause the most customer pain when they are deficient. Reliability is a product operation factor, and it receives particular attention here because of its high visibility to the end user.

Figure 1: Software Quality Factors

Using the IEEE Standard Glossary of Software Engineering Terminology, software reliability is defined as the ability of a system or component to perform its required functions under stated conditions for a specified period of time. Software reliability is also an important factor affecting system reliability. It differs from hardware reliability in that it reflects the design perfection of the product, rather than manufacturing perfection. The high complexity of software is the major contributing factor to software reliability problems. Software reliability is not a function of time in the way that industrial and manufacturing reliability modelers depict hardware reliability with the traditional bathtub curve in Figure 2. Measurement of software reliability is still in its infancy, and without adequate measurement the data needed to run the extensive statistical models of real reliability analysis does not exist. No good quantitative methods have been developed to represent software reliability without excessive limitations. Various approaches can be used to improve the reliability of software; however, it is hard to balance development time and budget against the perceived high cost of software reliability.

Figure 2: Hardware Reliability Bathtub Curve
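As a rough illustration of the shape a hardware bathtub curve takes, the sketch below models a failure rate as the sum of three terms: a decreasing early-failure term, a constant term, and an increasing wear-out term. The Weibull form, the function names, and every parameter value are assumptions chosen only to reproduce the shape; none of them comes from the figure or from measured data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull failure rate at time t > 0.

    shape < 1 gives a falling rate, shape > 1 a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative hardware failure rate at time t (hours, t > 0).

    All constants below are made-up values for illustration only.
    """
    early_failures = weibull_hazard(t, shape=0.5, scale=100.0)   # falls over time
    random_failures = 0.002                                      # flat useful-life rate
    wear_out = weibull_hazard(t, shape=5.0, scale=1000.0)        # rises over time
    return early_failures + random_failures + wear_out


if __name__ == "__main__":
    for hours in (1, 10, 100, 300, 1000, 2000):
        print(f"{hours:>5} h  failure rate = {bathtub_hazard(hours):.5f}")
```

The printed rates start high, drop through the middle of the product's life, and rise again at the end, which is the hardware pattern that software does not follow.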

Unlike hardware, software does not "wear out"; it is delivered "broken". Determining and planning for the "bugs" that will inevitably be released in the software product are discussed in "Validation and Verification", "Continuous Process Improvement" and "Software Quality Assurance". Figure 3 shows the notional software bathtub curve. The defects never spike as much as for hardware, but they never go to zero. The project manager must understand how much to invest in reliability. The methods for determining how much is enough were discussed in "Determining Project Risks". Risk and reliability must be addressed together. Cost-effective application of the principles of software reliability engineering is directed by the risks that reliability engineering mitigates. If reliability engineering does not mitigate an identified risk, there is no reasonable purpose for incurring the additional project cost.

Figure 3: Software Reliability Bathtub Curve

Most software reliability problems are not life-threatening, as the Therac-25 and Patriot Missile problems were. Among the reliability problems that affect the greatest number of people are those in the scoring of standardized tests. "The testing industry is coming off its three most problem-plagued years. Its missteps have affected millions of students who took standardized proficiency tests in at least 20 states". A hard-to-find software error can have lasting effects on students, teachers, and administrators when standardized tests are involved. It is not only the test scoring that is at risk, but also the more subjective calculations of equating - the process that allows test scores to be compared year after year.

As it turned out, CTB - despite its assurances to Indiana and others - had done an incomplete job of reviewing test data. When a much larger sample was reviewed, a programming error surfaced.

The error had - erroneously - made the current test appear easier than the previous year's. To make the tests equal in difficulty, the computer had then compensated by making it harder for some students to do as well as they had last time. The error did not change students' right and wrong answers, but it did affect their comparative percentile scores.

This lack of software reliability led to administrators being fired, 40,000 students being sent to summer school, hundreds of thousands of dollars earmarked for education improvement being misspent, and hundreds of students not being promoted to the next grade level because of perceived low reading skills.

This section will focus on four approaches to achieving highly reliable software:

1. Fault forecasting - reliability models, historic data analysis, failure data collection, operational environment profiling
2. Fault prevention - formal methods, software reuse, construction tools
3. Fault removal - formal inspections, verification, and validation
4. Fault tolerance - monitoring techniques, decision verification, redundancy, exception handling
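To make the fourth approach more concrete, here is a minimal sketch of how redundancy, decision verification, exception handling, and monitoring can work together. It is one possible arrangement under stated assumptions, not a prescribed design: `run_with_fallback`, `primary_average`, `backup_average`, and `plausible` are hypothetical names invented for this illustration.

```python
import math


def run_with_fallback(variants, acceptance_test, *args):
    """Try redundant implementations in order and return the first accepted result.

    Returns (result, failures), where failures records what went wrong with
    any variant that was tried and rejected (a simple form of monitoring).
    """
    failures = []
    for variant in variants:
        try:
            result = variant(*args)
        except Exception as exc:                 # exception handling
            failures.append((variant.__name__, repr(exc)))
            continue
        if acceptance_test(result, *args):       # decision verification
            return result, failures
        failures.append((variant.__name__, "rejected by acceptance test"))
    raise RuntimeError(f"all variants failed: {failures}")


def primary_average(values):
    # Hypothetical primary routine: raises ZeroDivisionError on an empty list.
    return sum(values) / len(values)


def backup_average(values):
    # Hypothetical redundant routine that also handles the empty list.
    return sum(values) / len(values) if values else 0.0


def plausible(result, values):
    # Hypothetical acceptance test: the answer must be a finite float.
    return isinstance(result, float) and math.isfinite(result)


if __name__ == "__main__":
    print(run_with_fallback([primary_average, backup_average], plausible, [2, 4]))
    print(run_with_fallback([primary_average, backup_average], plausible, []))
```

In the second call the primary routine fails, the failure is recorded, and the backup routine supplies an accepted answer, so the caller keeps operating.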

Where We Are in the Product Development Life Cycle

Software reliability must be planned for in the initial project phases, when the project environment is established and the project management activities are planned. Determining the reliability of the software under construction requires large amounts of data to be gathered from the project metrics system. "Software Metrics" describes the activities of metrics collection and analysis. Metrics are generated throughout the life cycle by all of the engineering activities. The four software reliability approaches are executed across the life cycle phases as shown by the heavy lines in Figure 4. Fault forecasting occurs through system exploration and requirements. Fault prevention occurs through requirements, design, and implementation. Fault removal occurs through design, implementation, and installation. Fault tolerance begins at implementation and extends through final product retirement.

Figure 4: Software Reliability in the Product Development Life Cycle

Table 1 is a further mapping of the four reliability approaches to the life cycle phase activities. These activities will be discussed in detail in each approach subsection of this section.

Table 1: Software Reliability Approaches Mapped to Life Cycle Phase Activities

Reliability Relation to the 34 Competencies

The following three of the 34 competencies described in this blog apply especially to software reliability:

Product Development Techniques

1. Assessing processes - Specifically assesses the reliability goals of the risk management plan and the processes within the product development life cycle. This adds to the overall organization's approach to continuous process improvement.

Project Management Skills

16. Managing risk - The only reason to invest in the high cost of software reliability processes is to mitigate risk.

19. Metrics selection - The use of the reliability tools described in this section requires a well-designed and reliable set of project, process, and product metrics.

Learning Objectives for Reliability

Upon completion of this section, the reader should be able to:

●  Apply basic software reliability concepts;
●  Define the benefits of software reliability engineering for specific projects;
●  Define the specific data needed to use valid statistical techniques in software reliability engineering;
●  Incorporate reliability into an overall organization and project metrics program;
●  Calculate expected failures given historic error information;
●  Select tools for statistical analysis of reliability data;
●  Use the operational profile in calculating reliability.
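Two of these objectives, calculating expected failures from historic information and using the operational profile, can be previewed with simple arithmetic. The sketch below rests on loud simplifying assumptions: the failure intensity observed in the past is treated as constant, and each operation has a known per-run failure probability. The function names and all numbers are hypothetical and are not taken from any project data.

```python
def failure_intensity(failures_observed, exposure_hours):
    """Historic failures per hour of operation (assumed constant)."""
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    return failures_observed / exposure_hours


def expected_failures(intensity, planned_hours):
    """Expected number of failures over a planned period of operation."""
    return intensity * planned_hours


def profile_failure_probability(profile):
    """Per-run failure probability weighted by the operational profile.

    profile maps an operation name to a pair:
    (probability the operation is invoked, probability a run of it fails).
    """
    total_usage = sum(usage for usage, _ in profile.values())
    if abs(total_usage - 1.0) > 1e-9:
        raise ValueError("usage probabilities must sum to 1")
    return sum(usage * fail for usage, fail in profile.values())


if __name__ == "__main__":
    # Made-up history: 12 failures in 4,000 hours of operation.
    intensity = failure_intensity(12, 4000)
    print("expected failures in 1,000 hours:", expected_failures(intensity, 1000))

    # Made-up operational profile for three operations.
    profile = {
        "login": (0.6, 0.001),
        "search": (0.3, 0.002),
        "checkout": (0.1, 0.010),
    }
    p_fail = profile_failure_probability(profile)
    print("per-run reliability:", 1.0 - p_fail)
```

Weighting by usage shows why the profile matters: a rarely used operation with a high failure probability can contribute as much to observed unreliability as a heavily used operation with a low one.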

Software Reliability Terminology

Software engineering terminology has become very much like the exchange between Alice and Humpty Dumpty in Lewis Carroll's Through the Looking Glass:

"When I use a word", Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean - neither more nor less".

"The question is", said Alice, "whether you can make words mean so many different things".

Software engineers and project managers have used and misused the terms for fault, error, problem, and others. With respect to software reliability, the following terms with definitions will be used. They are standard and accepted throughout the industry.

Defect is a problem found in a later phase or process than when it was introduced.

Error is a problem found in the current phase or process.

Fail-safe is the property of avoiding damage during a failure.

Fault-tolerant is the property of being able to recover from certain errors and keep operating.

Problem is a deviation from specifications or expected results.

Process error is an incorrect output of a process and is, therefore, a resulting incorrect state or condition.

Process failure is an event whereby a faulty resource used by the process produces an error in its output, which is eventually observed.

Process fault resides in the resources used in a process and is viewed as an input to a process. It represents an incorrect state or condition of the system to which the process belongs.

Robustness is the property of being tolerant of bad inputs.
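The difference between robustness, fault tolerance, and fail-safe behavior is easier to see side by side. The following sketch uses a hypothetical heater controller; the class, the function, the temperature limits, and the status strings are all invented for illustration and are assumptions rather than part of any standard.

```python
import math


def parse_setpoint(text, default=20.0):
    """Robustness: tolerate bad input instead of crashing."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return default
    if not math.isfinite(value):
        return default
    return min(max(value, 5.0), 30.0)   # clamp to an assumed allowed range


class Heater:
    """Hypothetical controller showing fault-tolerant and fail-safe behavior."""

    def __init__(self):
        self.on = False
        self.last_good = None   # last temperature successfully read

    def step(self, read_sensor, setpoint):
        status = "ok"
        try:
            temperature = read_sensor()
            self.last_good = temperature
        except OSError:
            if self.last_good is None:
                # Fail-safe: no trustworthy data, so avoid damage by switching off.
                self.on = False
                return "safe-shutdown"
            # Fault-tolerant: recover with the last good reading and keep operating.
            temperature = self.last_good
            status = "degraded"
        self.on = temperature < setpoint
        return status


if __name__ == "__main__":
    def broken_sensor():
        raise OSError("sensor offline")

    heater = Heater()
    print(parse_setpoint("not a number"))
    print(heater.step(broken_sensor, 20.0))
    print(heater.step(lambda: 18.0, 20.0))
    print(heater.step(broken_sensor, 20.0))
```

The same sensor fault produces two different responses: a shutdown when there is nothing safe to fall back on, and continued operation when a recent good reading exists.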

Software failures originate with design defects/errors (system and software), coding defects/errors, clerical mistakes, debugging inadequacies, and testing mistakes.
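The distinction drawn above between an error and a defect depends only on when a problem is found relative to when it was introduced. A minimal sketch of that rule follows; the phase list is an assumption based on the life cycle phases named earlier in this section, and `classify_problem` is a hypothetical helper.

```python
# Assumed phase order, for illustration only.
PHASES = ["requirements", "design", "implementation", "installation"]


def classify_problem(introduced_in, found_in):
    """Return "error" if found in the phase that introduced it, else "defect"."""
    introduced = PHASES.index(introduced_in)
    found = PHASES.index(found_in)
    if found < introduced:
        raise ValueError("a problem cannot be found before it is introduced")
    return "error" if found == introduced else "defect"


if __name__ == "__main__":
    print(classify_problem("design", "design"))
    print(classify_problem("design", "installation"))
```

Recording both phases for every problem is what allows a metrics program to report how many problems escape the phase that created them.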


Copyright 2018 SPMInfoBlog.