Wiley.com

Design for Reliability: Information and Computer-Based Systems

ISBN: 978-0-470-60465-6
Hardcover
325 pages
October 2010, Wiley-IEEE Press
List Price: US $127.00
Government Price: US $87.64
This is a Print-on-Demand title. It will be printed specifically to fill your order. Please allow an additional 10-15 days delivery time. The book is not returnable.

Figures.

Tables.

Preface.

Acknowledgements.

PART ONE RELIABILITY BASICS.

1 Reliability and Availability Concepts.

1.1 Reliability and Availability.

1.2 Faults, Errors and Failures.

1.3 Error Severity.

1.4 Failure Recovery.

1.5 Highly Available Systems.

1.6 Quantifying Availability.

1.7 Outage Attributability.

1.8 Hardware Reliability.

1.9 Software Reliability.

1.10 Problems.

1.11 For Further Study.

2 System Basics.

2.1 Hardware and Software.

2.2 External Entities.

2.3 System Management.

2.4 System Outages.

2.5 Service Quality.

2.6 Total Cost of Ownership.

2.7 Problems.

3 What Can Go Wrong.

3.1 Failures in the Real World.

3.2 Eight-Ingredient Framework.

3.3 Mapping Ingredients to Error Categories.

3.4 Applying Error Categories.

3.5 Error Category: Field Replaceable Unit (FRU) Hardware.

3.6 Error Category: Programming Errors.

3.7 Error Category: Data Error.

3.8 Error Category: Redundancy.

3.9 Error Category: System Power.

3.10 Error Category: Network.

3.11 Error Category: Application Protocol.

3.12 Error Category: Procedures.

3.13 Summary.

3.14 Problems.

3.15 For Further Study.

PART TWO RELIABILITY CONCEPTS.

4 Failure Containment and Redundancy.

4.1 Units of Design.

4.2 Failure Recovery Groups.

4.3 Redundancy.

4.4 Summary.

4.5 Problems.

4.6 For Further Study.

5 Robust Design Principles.

5.1 Robust Design Principles.

5.2 Robust Protocols.

5.3 Robust Concurrency Controls.

5.4 Overload Control.

5.5 Process, Resource and Throughput Monitoring.

5.6 Data Auditing.

5.7 Fault Correlation.

5.8 Failed Error Detection, Isolation or Recovery.

5.9 Geographic Redundancy.

5.10 Security, Availability and System Robustness.

5.11 Procedural Considerations.

5.12 Problems.

5.13 For Further Study.

6 Error Detection.

6.1 Detecting Field Replaceable Unit (FRU) Hardware Faults.

6.2 Detecting Programming and Data Faults.

6.3 Detecting Redundancy Failures.

6.4 Detecting Power Failures.

6.5 Detecting Networking Failures.

6.6 Detecting Application Protocol Failures.

6.7 Detecting Procedural Failures.

6.8 Problems.

6.9 For Further Study.

7 Analyzing and Modeling Reliability and Robustness.

7.1 Reliability Block Diagrams.

7.2 Qualitative Model of Redundancy.

7.3 Failure Mode and Effects Analysis.

7.4 Availability Modeling.

7.5 Planned Downtime.

7.6 Problems.

7.7 For Further Study.

PART THREE DESIGN FOR RELIABILITY.

8 Reliability Requirements.

8.1 Background.

8.2 Defining Service Outages.

8.3 Service Availability Requirements.

8.4 Detailed Service Availability Requirements.

8.5 Service Reliability Requirements.

8.6 Triangulating Reliability Requirements.

8.7 Problems.

9 Reliability Analysis.

9.1 Step 1: Enumerate Recoverable Modules.

9.2 Step 2: Construct Reliability Block Diagrams.

9.3 Step 3: Characterize Impact of Recovery.

9.4 Step 4: Characterize Impact of Procedures.

9.5 Step 5: Audit Adequacy of Automatic Failure Detection and Recovery.

9.6 Step 6: Consider Failures of Robustness Mechanisms.

9.7 Step 7: Prioritizing Gaps.

9.8 Reliability of Sourced Modules and Components.

9.9 Problems.

10 Reliability Budgeting and Modeling.

10.1 Downtime Categories.

10.2 Service Downtime Budget.

10.3 Availability Modeling.

10.4 Update Downtime Budget.

10.5 Robustness Latency Budgets.

10.6 Problems.

11 Robustness and Stability Testing.

11.1 Robustness Testing.

11.2 Context of Robustness Testing.

11.3 Factoring Robustness Testing.

11.4 Robustness Testing in the Development Process.

11.5 Robustness Testing Techniques.

11.6 Selecting Robustness Test Cases.

11.7 Analyzing Robustness Test Results.

11.8 Stability Testing.

11.9 Release Criteria.

11.10 Problems.

12 Closing the Loop.

12.1 Analyzing Field Outage Events.

12.2 Reliability Roadmapping.

12.3 Problems.

13 Design for Reliability Case Study.

13.1 System Context.

13.2 System Reliability Requirements.

13.3 Reliability Analysis.

13.4 Downtime Budgeting.

13.5 Availability Modeling.

13.6 Reliability Roadmap.

13.7 Robustness Testing.

13.8 Stability Testing.

13.9 Reliability Review.

13.10 Reliability Report.

13.11 Release Criteria.

13.12 Field Data Analysis.

14 Conclusion.

14.1 Overview of Design for Reliability.

14.2 Concluding Remarks.

14.3 Problems.

15 Appendix: Assessing Design for Reliability Diligence.

15.1 Assessment Methodology.

15.2 Reliability Requirements.

15.3 Reliability Analysis.

15.4 Reliability Modeling and Budgeting.

15.5 Robustness Testing.

15.6 Stability Testing.

15.7 Release Criteria.

15.8 Field Availability.

15.9 Reliability Roadmap.

15.10 Hardware Reliability.

Abbreviations.

References.

Photo Credits.

About the Author.

Index.
