Fault Forecasting

Fault forecasting is the predictive approach to software reliability engineering. It is a front-end exercise in the product development life cycle, performed during system exploration and requirements definition. Mature development organizations use fault forecasting as part of their front-end project/product evaluation process. Predictive models reach even modest accuracy only when appropriate historic data is available. Reliability models, historic data analysis, failure data collection, and operational environment profiling are the key activities in this approach. Table 1 identifies the steps in fault forecasting.

Table 1: Fault Forecasting Life Cycle Activities

The first step in fault forecasting is to determine the functional profile. Programs make transitions from module to module as they execute, and these transitions can be observed. By tracking the transitions from module to module and function to function, we can learn exactly where a system is fragile: transitions into fault-laden modules increase the probability of failure. Coupled with the functional profile, this information tells us how reliable the system will be when it is used as specified. We can model the transitions as a stochastic process. By developing a mathematical description of how the software moves from one module to another, driven by the functionality it is performing, we can describe the reliability of each functionality. The software system is the sum of its functionalities. If we know the reliability of each functionality and how the system apportions its time among them, we can determine the reliability of the system.
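One way to make this concrete is to treat module transitions as a discrete-time Markov chain. The sketch below is an illustration under assumptions that the text does not state: each functionality has a fixed transition matrix over modules, each module has a constant per-visit failure probability, and the functional profile is a set of usage fractions that sum to 1. All function names, matrices, and numbers are hypothetical.

```python
def stationary_distribution(P, iterations=200):
    """Approximate the long-run share of visits to each module by power iteration.

    P[i][j] is the assumed probability of moving from module i to module j.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def functionality_reliability(P, module_failure_prob):
    """Per-transition reliability of one functionality.

    Weights each module's assumed per-visit failure probability by how
    often the functionality visits that module in the long run.
    """
    pi = stationary_distribution(P)
    failure = sum(share * f for share, f in zip(pi, module_failure_prob))
    return 1.0 - failure


def system_reliability(profile, reliabilities):
    """Profile-weighted reliability: sum of usage fraction times reliability."""
    return sum(profile[name] * reliabilities[name] for name in profile)


# Hypothetical two-module system exercised by a "query" functionality.
query_transitions = [[0.9, 0.1],
                     [0.5, 0.5]]
module_failure_prob = [0.001, 0.01]  # module 1 is the fault-laden one

r_query = functionality_reliability(query_transitions, module_failure_prob)

# Hypothetical functional profile: 70% queries, 30% updates.
profile = {"query": 0.7, "update": 0.3}
reliabilities = {"query": r_query, "update": 0.99}
r_system = system_reliability(profile, reliabilities)
```

Power iteration converges here because the example chain can reach every module from every other module; a real model would need to check that property or handle absorbing states separately.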

The next step is to define and classify failures. As defined previously, software failures originate with design defects/errors (system and software), coding defects/errors, clerical mistakes, debugging inadequacies, and testing mistakes. Defining a failure identifies its source; classifying it assigns a severity level. A classification developed by Boris Beizer has an appropriate granularity:

1. Mild - Symptoms offend us aesthetically.
2. Moderate - Outputs are misleading or redundant.
3. Annoying - It causes dehumanizing system behavior (e.g., money machine refuses to cash your paycheck).
4. Very serious - Instead of losing your paycheck, the system credits it to another account.
5. Extreme - The problem is not limited to a few users.
6. Intolerable - Long-term, unrecoverable corruption of database occurs.
7. Catastrophic - The decision to shut down is taken out of our hands; the system fails.
8. Infectious - It corrupts other systems even though it does not fail itself.

Once the matrix of failure sources and severity classes is completed, failure data should be tracked as in Table 2. Each cell of the table holds the total number of failures, drawn from historic information for similar projects and products, for that combination of source and class.

Table 2: Failure Sources and Class
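The tally behind such a matrix can be sketched in a few lines. The source and severity labels below are taken from the lists above; the record format, the function name, and the sample records are hypothetical and stand in for whatever historic failure data an organization actually keeps.

```python
from collections import Counter

SOURCES = ["design", "coding", "clerical", "debugging", "testing"]
SEVERITIES = ["mild", "moderate", "annoying", "very serious",
              "extreme", "intolerable", "catastrophic", "infectious"]


def tally_failures(records):
    """Build a source-by-severity matrix of failure counts.

    records is an iterable of (source, severity) pairs from historic data.
    Rows follow SOURCES and columns follow SEVERITIES.
    """
    counts = Counter()
    for source, severity in records:
        if source not in SOURCES or severity not in SEVERITIES:
            raise ValueError(f"unknown cell: {source!r}, {severity!r}")
        counts[(source, severity)] += 1
    return [[counts[(s, v)] for v in SEVERITIES] for s in SOURCES]


# Hypothetical failure records from similar past projects.
history = [
    ("coding", "moderate"),
    ("coding", "moderate"),
    ("design", "very serious"),
    ("testing", "mild"),
]
matrix = tally_failures(history)

# Print one row per failure source.
for source, row in zip(SOURCES, matrix):
    print(f"{source:<10}", row)
```

Rejecting unknown labels keeps misspelled sources or classes from silently creating extra cells.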

Identifying customer reliability needs is the next step. These needs would have been previously identified and documented in the SRS, as described in Eliciting Requirements. The customer's reliability needs must be stated in measurable terms. Although all requirements should be measurable and testable, reliability requirements absolutely, positively, must have numbers attached. The following are examples of adequate reliability requirement statements:

 1.  The launch system mission reliability shall be at least 0.999 at 95 percent confidence from launch commit to payload separation.

 2.  The satellite system mission reliability shall be at least 0.999 at 95 percent confidence for a period of 15 years from payload separation.

 3.  The avionics system will have 500 mean flight hours between critical failures.

 4.  A built-in self-test will detect an inoperable missile at a 60 percent probability.

 5.  A built-in self-test will mistake operable for inoperable at a less than 1 percent probability.

 6.  Mean-time to repair operational software is 30 minutes or less.

 7.  Mean-maximum-corrective-time is 60 minutes at the 90th percentile.

 8.  Built-in self-test maximum completion time is 20 seconds.

 9.  Mean-time to load full software and execute full internal self-test is 10 minutes.

10. Median-time to update one online page of documentation is 30 minutes.
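Numbers like "0.999 at 95 percent confidence" have direct consequences for how much evidence is needed. As a rough illustration, the sketch below uses the standard zero-failure (success-run) relation, which assumes independent pass/fail trials with a constant success probability. That assumption and the function name are not from the text; a real program would choose a demonstration method suited to its own data.

```python
import math


def failure_free_trials_needed(reliability, confidence):
    """Smallest n with 1 - reliability**n >= confidence.

    Assumes independent trials and zero observed failures
    (the success-run relation n = ln(1 - C) / ln(R)).
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


# Requirement 1 above: reliability 0.999 at 95 percent confidence.
trials = failure_free_trials_needed(0.999, 0.95)
print(trials)
```

Under these assumptions the answer is close to three thousand failure-free trials, which shows why stating the number in the requirement matters: it determines whether the goal can be demonstrated by test at all or must be argued from models and historic data.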

Step 4 in fault forecasting is to conduct trade-off studies. Using the client functional profile and the failure classification information from previous systems, the specified requirements are analyzed to determine whether the historic data supports the goals. Trade-offs are analyzed to determine the probability of reaching the reliability goals specified in the requirements. If there is no historic data for like products and no analogous systems to analyze, the probability of reaching an artificially set reliability level is very low. In that case the project manager needs to develop extensive system models to determine the probable reliability ranges for the new product. Tools such as formal methods would be applied to the reliability requirements to produce a mathematical proof for the most critical subsystems. This is a very expensive process and should be used only when there is no source of historic reliability data.
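A first, very coarse check of whether historic data supports a goal can be sketched as follows. It assumes each analogous project is summarized by total operating hours and a count of critical failures, and it compares the observed mean time between failures with the 500-hour goal from the example requirements. The data, names, and the use of a simple fraction are illustrative assumptions, not a substitute for a proper reliability model.

```python
import math


def observed_mtbf(operating_hours, failures):
    """Mean operating hours between failures; infinite if none were observed."""
    if failures == 0:
        return math.inf
    return operating_hours / failures


def fraction_meeting_goal(history, goal_hours):
    """Share of analogous projects whose observed MTBF met the goal.

    history is a list of (operating_hours, critical_failures) pairs.
    """
    if not history:
        raise ValueError("no historic data: the goal cannot be checked this way")
    met = sum(1 for hours, failures in history
              if observed_mtbf(hours, failures) >= goal_hours)
    return met / len(history)


# Hypothetical history: (operating hours, critical failures) per past project.
past_projects = [(12000, 20), (9000, 30), (15000, 25)]
share = fraction_meeting_goal(past_projects, goal_hours=500)
```

The explicit error for an empty history mirrors the point made above: without data from like products, this kind of check has nothing to stand on and the more expensive modeling route is needed.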

Step 5 is to set reliability objectives based on the results of the trade-off studies. This final set of reliability objectives is fed back into the requirements specification process, so that the information gathered and the analyses made can modify the requirements. The resulting reliability objectives/requirements are then used to validate the system reliability and gain user acceptance.


Copyright 2018 SPMInfoBlog.