Proving the Safety Integrity

Bjelica, Milan Z.

doi:10.1007/978-3-031-15823-0_10


Abstract

Formal proof of system safety is usually required before a system can be signed off for deployment. The set of arguments (claims) used to prove system safety is called a safety case. A safety case addresses all the safety integrity requirements defined by the respective standards and provides evidence that those requirements have been fulfilled. Many requirements include measurable indicators, some of which, such as reliability and failure rates, were discussed in previous chapters. However, the standards may prescribe additional measures, such as diagnostic coverage (DC) and safe failure fraction (SFF). This chapter discusses the sets of claims required for the safety case, including a description of those additional measures. Finally, safety is contrasted with availability, one of the most important dependability requirements for the system.


Author information

Authors and Affiliations

University of California, San Diego, San Diego, CA, USA
Milan Z. Bjelica


Appendices

Exercise 10

Continue the exercise from Chap. 9, now attempting to increase reliability by analyzing the failures of system components and attempting to assess dangerous undetectable failures. The initial pseudo-FMEDA sheet is given below:

[Figure: pseudo-FMEDA sheet for the sensors, logic, and valves, listing failure mode, λ %, class of failure, and detectability.]

Additionally, try to respecify the detectability of failures for the diversified configuration of the system. Finally, calculate the safe failure fraction (SFF) and diagnostic coverage (DC), and compare all safety integrity metrics against the requirements of the functional safety standard (see the tables below), given that the required SIL for the described SRS is SIL 2:

Safe failure fraction of an element	HFT = 0	HFT = 1	HFT = 2
< 60%	SIL 1	SIL 2	SIL 3
60% – < 90%	SIL 2	SIL 3	SIL 4
90% – < 99%	SIL 3	SIL 4	SIL 4
≥ 99%	SIL 3	SIL 4	SIL 4

MTTF_du ↓ / DC →	DC < 60%	60% ≤ DC < 90%	90% ≤ DC < 99%	99% ≤ DC
3y ≤ MTTF_du < 10y	–	–	SIL 1	SIL 2
10y ≤ MTTF_du < 30y	–	SIL 1	SIL 2	SIL 3
30y ≤ MTTF_du ≤ 100y	SIL 1	SIL 2	SIL 3	SIL 4

Note: the values in both tables are for practice only and do not replicate actual standards.

Your system, at the top level, is a single-channel system (hardware fault tolerance – HFT = 0). Finally, increase the item-level redundancy first to HFT = 1 and then HFT = 2 and rediscuss all the results.

Your Tasks for the Exercise

Update the calculation Excel sheet from Chap. 9 (feel free to use your previous solution OR the solution provided for Chap. 9).
Update the sheet by adding another failure rate row for dangerous undetectable failures (calculate them according to FMEDA) and update the calculations.
Calculate SFF and DC for each component.
Instead of MEM, now see if your SRS (which is SIL 2) complies with the provisions of the standard regarding SFF, DC, and failure rates.
Introduce item-level redundancy to increase HFT from 0 to 1 and then 2 (try to make the sheet configurable).
Rediscuss the results and reassess/make claims for all outcomes.
To close the safety case, what else shall be demonstrated for the SRS?

To-Do List

Perform the exercise with your peer group. One facilitator will perform the calculation.
Compare the results with another group when finished.

Exercise 10 Solution

Note: Solution is available as digital spreadsheet at sfs10.ex.nit-institute.com.

For each of the components from the exercise in Chap. 9, we now need to split the failure rate into portions according to the FMEDA, then calculate SFF and DC and compare the values with the requirements of the standards.

Each component first needs its failure rate (lambda_all) decomposed into dangerous failures (lambda_d) and dangerous undetectable failures (lambda_du), each expressed both in h⁻¹ and as a percentage of the original failure rate. DC and SFF can then be calculated for each component according to the formulae.
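The formulae themselves are not reproduced in this excerpt. As a minimal sketch, the functions below use the definitions commonly given in functional safety standards such as IEC 61508, assuming that lambda_all consists only of safe and dangerous failures and that the detected dangerous part is lambda_d minus lambda_du. The function names and the sample rates are illustrative assumptions, not values from the exercise sheet.

```python
def diagnostic_coverage(lambda_d, lambda_du):
    """DC = lambda_dd / lambda_d, where lambda_dd = lambda_d - lambda_du."""
    if lambda_d <= 0:
        raise ValueError("lambda_d must be positive")
    return (lambda_d - lambda_du) / lambda_d


def safe_failure_fraction(lambda_all, lambda_du):
    """SFF = (lambda_s + lambda_dd) / lambda_all = 1 - lambda_du / lambda_all.

    Assumes lambda_all = lambda_s + lambda_d (no 'no effect' failures).
    """
    if lambda_all <= 0:
        raise ValueError("lambda_all must be positive")
    return 1.0 - lambda_du / lambda_all


# Hypothetical component, rates in 1/h (not taken from the exercise sheet)
lambda_all = 1.0e-5
lambda_d = 4.0e-6
lambda_du = 1.0e-6

dc = diagnostic_coverage(lambda_d, lambda_du)
sff = safe_failure_fraction(lambda_all, lambda_du)
print(f"DC = {dc:.2%}, SFF = {sff:.2%}")
```

Note that only lambda_du lowers the SFF: every dangerous failure that diagnostics can catch counts toward the safe fraction in this definition.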

For the original and the final system MTTFs, values are expressed in years.
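A failure rate in h⁻¹ can be converted into an MTTF in years as sketched below. This assumes a constant failure rate, so that MTTF = 1/λ, and a year of 8760 hours; both are assumptions of this sketch and may differ from the conventions used in the solution spreadsheet.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 h, leap years ignored (assumption)


def mttf_years(failure_rate_per_hour):
    """MTTF in years for a constant failure rate given in 1/h."""
    if failure_rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate_per_hour / HOURS_PER_YEAR


# Hypothetical rate, for illustration only
print(f"{mttf_years(1.0e-6):.1f} years")
```

Passing lambda_du instead of the total failure rate gives MTTF_du, the quantity used in the DC-based table of the exercise.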

The logic component can be claimed to have a lower individual failure rate if redundancy is applied. With a redundant logic component, additional detection software may be added to oversee the hot spare and to detect whether it has stopped working due to, e.g., bond wire detachment. We judge that a power supply problem caused by a PMIC failure can be detected likewise. Power supply problems due to other, external failures are considered to make up 40% of all power supply failure causes. This yields a final l_lambda_du% of 0.4 * 0.24 = 0.096. The final failure rate of the individual logic component in the redundant configuration is then λ_L = 8.256*10⁻⁷ h⁻¹.
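The arithmetic for the logic component can be checked with a few lines. In this sketch, 0.24 is read as the share of power supply failures in the logic component's failure rate, and the base rate of 8.6*10⁻⁶ h⁻¹ is not stated in this excerpt: it is back-calculated from the quoted result and should be treated as an assumption.

```python
external_share = 0.40      # external causes among power supply failures (from the text)
power_supply_share = 0.24  # factor from the text, read here as the power supply share
lambda_all_logic = 8.6e-6  # 1/h, ASSUMED: back-calculated from the stated result

# Remaining dangerous undetectable share of the logic failure rate
l_lambda_du_share = external_share * power_supply_share

# Failure rate of one logic component in the redundant configuration
lambda_l = l_lambda_du_share * lambda_all_logic

print(f"l_lambda_du% = {l_lambda_du_share:.3f}")
print(f"lambda_L = {lambda_l:.4e} 1/h")
```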

For the sensors, drift can be detected via majority voting: if fewer than two inputs agree on a value, the reading is treated as incorrect. This yields a final s_lambda_du% of 0.
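The voting rule can be illustrated with a small sketch. It compares readings for exact equality, which suits discrete values; analog sensors would normally need a tolerance band instead. The function name and the three-sensor inputs are assumptions for illustration, not part of the exercise.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Return (value, ok); ok is False when too few readings agree."""
    if not readings:
        raise ValueError("at least one reading is required")
    # Most frequent value and how many sensors reported it
    value, count = Counter(readings).most_common(1)[0]
    if count >= min_agreement:
        return value, True
    return None, False


print(vote([5, 5, 7]))  # two sensors agree, the third has drifted
print(vote([1, 2, 3]))  # no agreement, the reading is flagged as incorrect
```

A single drifting sensor is outvoted, and full disagreement is flagged rather than passed on, which is why the drift failure mode no longer counts as undetectable.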

After redoing all the calculations against the exercise tables (SIL determination based on SFF and on DC), we can determine that our system reaches SIL 4 based on its SFF and the selected HFT, and SIL 3 based on its DC and final MTTF. Both results exceed the SIL 2 required for the SRS.
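The two table lookups can be made configurable, as the exercise suggests. The sketch below encodes the practice tables from the exercise (not the tables of any actual standard); the function names are assumptions, and a DC of exactly 99% is placed in the top band here.

```python
# (lower SFF bound, SIL for HFT = 0, 1, 2), highest band first
SFF_TABLE = [
    (0.99, (3, 4, 4)),
    (0.90, (3, 4, 4)),
    (0.60, (2, 3, 4)),
    (0.00, (1, 2, 3)),
]

DC_BOUNDS = (0.60, 0.90, 0.99)  # lower bounds of DC columns 1..3

# (MTTF_du lower bound in years, upper bound, SIL per DC column; None = no SIL)
MTTF_ROWS = [
    (3, 10, (None, None, 1, 2)),
    (10, 30, (None, 1, 2, 3)),
    (30, 100, (1, 2, 3, 4)),
]


def sil_from_sff(sff, hft):
    """SIL claimable from the safe failure fraction and hardware fault tolerance."""
    if hft not in (0, 1, 2):
        raise ValueError("table covers HFT 0, 1 and 2 only")
    for lower, sils in SFF_TABLE:
        if sff >= lower:
            return sils[hft]
    raise ValueError("SFF must be between 0 and 1")


def sil_from_dc(dc, mttf_du_years):
    """SIL claimable from DC and MTTF_du; None when outside the table."""
    column = sum(dc >= bound for bound in DC_BOUNDS)
    for low, high, sils in MTTF_ROWS:
        # Only the last row includes its upper bound (<= 100 years)
        below_top = (mttf_du_years <= high) if high == 100 else (mttf_du_years < high)
        if low <= mttf_du_years and below_top:
            return sils[column]
    return None


print(sil_from_sff(0.95, 1), sil_from_dc(0.95, 50))
```

With lookups like these, the SIL 2 requirement can be checked for each HFT setting by confirming that both functions return at least 2.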

After the adaptations, new reliability diagrams are obtained:

[Figure: reliability versus time for the original system and its components; five declining curves.]

[Figure: reliability versus time for the improved system; three declining curves, with the voting line constant at a reliability of 1.]

Key Recap Questions

Think about using additional quantitative metrics for your system and its components:

Safe failure fraction (which failures are safe?)
Diagnostic coverages (which dangerous failures can be diagnosed?)
What about dangerous undetectable failures?
What about software?
What about availability?

Self-assessment

Now take the time to self-assess your knowledge by taking the quiz below. Each listed statement is either correct or incorrect. Please mark your answer and then check in the key at the end of the book.

1. A safety case is a written demonstration of evidence for the safety integrity of random system failures, without considering systematic faults and process-dependent faults.
2. It is not possible to close the safety case as long as there is at least one open item in the documentation artifacts that can be traced back to a hazard exhibiting unacceptable risk.
3. Evaluating the safety functions against technical safety requirements and thereby demonstrating evidence of their correct operation in all possible situations (perfect coverage) is sufficient to declare the fulfillment of the respective safety requirements/safety goals.
4. By introducing diagnosis in the SRS, it is possible to increase the reliability of the SRS.
5. The failure rate with respect to all dangerous failures (λ_d) is always considered in the final system reliability evaluation.
6. The higher the safe failure fraction, the lower the number of residual faults in the system.
7. Diagnostic coverage tells us what is the percentage of failures that we can detect out of all failures which a component can exhibit.
8. The system in the fail-safe state is actually the system in downtime.
9. Availability requirements must be proven within the final system safety case.
10. The way we write software may affect the probability of having systematic faults during the design and therefore residual faults which cannot be modeled nor detected during the system operation – therefore we must comply with the respective requirements with regard to software implementation, prescribed by the standard, according to the safety integrity level of the item we are developing.

Self-assessment Key

1. False
2. True
3. False
4. True
5. False
6. True
7. False
8. True
9. False
10. True

About this chapter

Cite this chapter

Bjelica, M.Z. (2023). Proving the Safety Integrity. In: Systems, Functions and Safety. Springer, Cham. https://doi.org/10.1007/978-3-031-15823-0_10

DOI: https://doi.org/10.1007/978-3-031-15823-0_10
Published: 01 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15822-3
Online ISBN: 978-3-031-15823-0
eBook Packages: Engineering, Engineering (R0)
