Using an IEC 61508-Certified RTOS Kernel for Safety-Critical Systems
FTF China, August 2011
Bob Monkman, Director, Business Development, QNX Software Systems
The Standards
- The Standards
  - IEC 61508
  - Accreditation and Auditing Bodies
  - Derived Standards
  - Certification
- The Plan
- The Practice
- Conclusion
2011 QNX Software Systems GmbH & Co. KG
IEC 61508
Functional safety of electrical/electronic/programmable electronic safety-related systems
- First edition (1998-2000)
- Second edition (April 2010): significant additions, especially concerning software
Summary
- Part 0: Functional Safety and IEC 61508
- Part 1: General Requirements
- Part 2: System Requirements
- Part 3: Software Requirements
- Part 4: Definitions and Abbreviations
- Part 5: Examples of Methods
- Part 6: Guidelines for the Application of Parts 2 and 3
- Part 7: Overview of Techniques and Measures
Accreditation and Auditing Bodies
A member of the International Accreditation Forum accredits a certification organization, which in turn certifies a process or a product.
Derived Standards
- EN 5012n: European railway standards
  - EN 50126: reliability, availability, maintainability and safety
  - EN 50128: communications, signalling and processing systems (software for railway control and protection systems)
  - EN 50129: communications, signalling and processing systems (safety-related electronic systems for signalling)
- IEC 62304: medical device software, software life cycle processes
- ISO 26262: functional safety for road vehicles (in development)
The Certification Challenge
- The Standards
- The Plan
  - Functional Safety System
  - Safety Claim
  - Safety Case and Supporting Evidence
- The Practice
- Conclusion
An Example of a Functional Safety System: a chainsaw
The Claims
- Context of the claims
- Probability of dangerous failure
- Level of dependability: availability and reliability
- Sufficient dependability
- Functional safety requirements
- Safety manual
The Claims: The Infamous Five-Nines
Five-nines (99.999%) availability permits roughly 5 minutes 16 seconds of downtime per year. The same total can be distributed very differently:

Failures per year | Duration of each failure
1                 | 5 minutes 16 seconds (potentially catastrophic)
10                | 32 seconds
100               | 3.2 seconds
1,000             | 316 milliseconds
10,000            | 32 milliseconds
100,000           | 3.2 milliseconds
1,000,000         | 316 microseconds (possibly benign)

Five-nines availability sounds good, but would you fly in a plane whose flight control system makes this claim with no further precision?
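The table is simple arithmetic: the permitted yearly downtime is (1 - A) multiplied by the number of seconds in a year, and each outage may last that total divided by the number of failures. Below is a minimal C sketch that assumes a 365.25-day year (which reproduces the 5 minutes 16 seconds figure) and outages of equal length. The function names are illustrative only.

```c
#include <stdio.h>

/* Seconds in a 365.25-day year; this reproduces the 5 min 16 s figure. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/* Total downtime allowed per year, in seconds, at the given availability. */
static double downtime_per_year_s(double availability)
{
    return (1.0 - availability) * SECONDS_PER_YEAR;
}

/* Longest each outage may last if the yearly downtime is split evenly
   across failures_per_year outages. */
static double max_outage_s(double availability, double failures_per_year)
{
    return downtime_per_year_s(availability) / failures_per_year;
}

/* Prints the five-nines table for 1, 10, ..., 1,000,000 failures per year. */
void print_five_nines_table(void)
{
    const double availability = 0.99999;
    for (double n = 1.0; n <= 1e6; n *= 10.0)
        printf("%9.0f failures/year: %.6f s each\n", n, max_outage_s(availability, n));
}
```

The availability figure alone fixes only the total. The distribution of outages, which decides whether the system is catastrophic or benign, has to be stated separately in the claim.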
The Evidence Pyramid
The Foundation - Quality Management System
Without these basic procedures, you can go no further.
- Quality management system: ISO 9000, ISO 15504, Capability Maturity Model Integration (CMMI)
- Source control: revision/version/source control
- Defect tracking:
  - defects found by customers as well as through verification
  - defect classification (for fault analysis)
Design Artefacts
Records from the software life cycle:
- Design documentation
- Project plan
- Quality plan
- Architectural design
- Detailed design
- Test plans
- Test results
- Plans and results for other validation methods
- Traceability matrix
Static Analysis
- Syntax checking
  - Check that coding standards are being applied
  - The compiler is a syntax checker
- Checking with semantic knowledge
  - Targeted module analysis
  - Common fault scanning
  - Assertion checking
  - Symbolic execution
  - Detection of logical inconsistencies
- Pros: helps catch design errors early
- Cons: false positives
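As a small illustration of assertion checking and common fault scanning, the C sketch below combines a compile-time assertion, which the compiler checks itself, with run-time assertions that state preconditions an analysis tool may try to prove at every call site. `SENSOR_COUNT`, `store_reading` and `sum_readings` are hypothetical names. The code assumes a C11 compiler for `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration constant, for illustration only. */
#define SENSOR_COUNT 8u

/* Compile-time assertion: the build fails if the configuration is out of
   range, at zero run-time cost. */
_Static_assert(SENSOR_COUNT > 0u && SENSOR_COUNT <= 16u,
               "SENSOR_COUNT must be between 1 and 16");

static uint16_t readings[SENSOR_COUNT];

/* The assertion states the precondition explicitly; an assertion-checking
   analyzer can attempt to prove it for each caller instead of waiting for
   it to trip at run time. */
static void store_reading(size_t index, uint16_t value)
{
    assert(index < SENSOR_COUNT);
    readings[index] = value;
}

static uint32_t sum_readings(void)
{
    uint32_t sum = 0;
    /* A common-fault scanner would flag "i <= SENSOR_COUNT" here as an
       out-of-bounds read; the correct bound is "<". */
    for (size_t i = 0; i < SENSOR_COUNT; i++)
        sum += readings[i];
    return sum;
}
```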
Proven-In-Use Data
- Particularly important for retrofitting
- In-field usage data are invaluable
- Build the gathering of this data into your business model
- The more in-use data available, the stronger the evidence
- In-use data are only meaningful when scrutinized with fault analysis
QNX used proven-in-use data to support its safety case for the QNX Neutrino RTOS Safe Kernel.
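One common way to turn in-use data into a quantitative claim relies on two assumptions: the failure rate is constant, and no dangerous failures were observed over T aggregated operating hours. Under those conditions the one-sided upper confidence bound is lambda <= -ln(alpha) / T, where 1 - alpha is the confidence level. The sketch below illustrates that statistic and is not presented as QNX's method. If one or more failures were observed, a chi-square bound is needed instead. The constants are precomputed so that no maths library is required. All names are hypothetical.

```c
#include <assert.h>

/* -ln(alpha) for common confidence levels, where alpha = 1 - confidence.
   Precomputed so the sketch does not need the maths library. */
#define NEG_LN_ALPHA_90 2.302585 /* 90 % confidence */
#define NEG_LN_ALPHA_95 2.995732 /* 95 % confidence */
#define NEG_LN_ALPHA_99 4.605170 /* 99 % confidence */

/* Upper confidence bound on a constant failure rate (per hour) after
   total_hours of aggregated operation with zero observed failures. */
static double failure_rate_upper_bound(double neg_ln_alpha, double total_hours)
{
    assert(total_hours > 0.0);
    return neg_ln_alpha / total_hours;
}

/* Failure-free operating hours needed before the bound reaches the target. */
static double hours_needed(double neg_ln_alpha, double target_rate_per_hour)
{
    assert(target_rate_per_hour > 0.0);
    return neg_ln_alpha / target_rate_per_hour;
}
```

For example, claiming a rate below 10^-7 per hour at 95 % confidence requires roughly 3 x 10^7 failure-free operating hours. That volume is why data gathering has to be part of the business model.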
Fault Tree Analysis
- Structured analysis: easier for the auditor, easier for the audited
- Example: Bayesian Belief Networks, a tool for incorporating, and providing quantitative results from:
  - hard and soft evidence
  - a priori (cause-to-effect) and a posteriori (effect-to-cause) evidence
(Figure: fault tree)
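In a classical fault tree (as distinct from a Bayesian Belief Network), gate probabilities combine simply when the basic events are assumed independent. An AND gate multiplies its input probabilities. An OR gate gives one minus the product of the complements. The chainsaw tree below is entirely hypothetical: its events, structure and probabilities are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* AND gate: the output event occurs only if all inputs occur
   (inputs assumed independent). */
static double and_gate(const double *p, size_t n)
{
    double out = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(p[i] >= 0.0 && p[i] <= 1.0);
        out *= p[i];
    }
    return out;
}

/* OR gate: the output event occurs if at least one input occurs
   (inputs assumed independent). */
static double or_gate(const double *p, size_t n)
{
    double none = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(p[i] >= 0.0 && p[i] <= 1.0);
        none *= 1.0 - p[i];
    }
    return 1.0 - none;
}

/* Hypothetical top event "chain keeps moving after kickback":
   kickback sensor fails OR (chain brake fails AND hand guard fails). */
static double chainsaw_top_event(double p_sensor, double p_brake, double p_guard)
{
    const double both[] = { p_brake, p_guard };
    const double top[] = { p_sensor, and_gate(both, 2) };
    return or_gate(top, 2);
}
```

With a sensor failure probability of 0.01 and brake and guard failure probabilities of 0.1 each, the top event comes out at about 0.0199. The single-point sensor failure therefore dominates, which is exactly the kind of finding an auditor wants to see made explicit.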
Design Verification
- Model checking
  - Can be applied before or after design
  - Powerful tool for retrofitting
  - SPIN: Simple Promela (Process Meta Language) Interpreter
  - NuSMV: New Symbolic Model Checker
- Formal analysis
  - Less effective for retrofitting, but may be needed for SIL 4
  - For example: VDM (Vienna Development Method), Z
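Model checkers such as SPIN and NuSMV explore every reachable state of a model and check a property in each one. The toy C program below does the same by hand for Peterson's two-process mutual-exclusion algorithm. It enumerates all interleavings and reports whether both processes can be in the critical section at the same time. This is an illustrative sketch, not a picture of how SPIN or NuSMV are used, since those tools take Promela or SMV models rather than C. The `turn_first` flag is a hypothetical seeded bug that swaps the two entry writes.

```c
#include <stdbool.h>

/* Protocol state packed into 7 bits: pc[0], pc[1] in 0..3,
   flag[0], flag[1] and turn in 0..1. */
typedef struct { int pc[2]; int flag[2]; int turn; } State;

static int encode(const State *s)
{
    return s->pc[0] | (s->pc[1] << 2) | (s->flag[0] << 4) | (s->flag[1] << 5) | (s->turn << 6);
}

static State decode(int code)
{
    State s;
    s.pc[0] = code & 3;
    s.pc[1] = (code >> 2) & 3;
    s.flag[0] = (code >> 4) & 1;
    s.flag[1] = (code >> 5) & 1;
    s.turn = (code >> 6) & 1;
    return s;
}

/* One atomic step of process i. pc 0 and 1 perform the two entry writes
   (order set by turn_first), pc 2 is the wait test, pc 3 is the critical
   section, whose step is the exit write flag[i] = 0. */
static State step(State s, int i, bool turn_first)
{
    int other = 1 - i;
    switch (s.pc[i]) {
    case 0:
        if (turn_first) s.turn = other; else s.flag[i] = 1;
        s.pc[i] = 1;
        break;
    case 1:
        if (turn_first) s.flag[i] = 1; else s.turn = other;
        s.pc[i] = 2;
        break;
    case 2:
        if (!(s.flag[other] && s.turn == other))
            s.pc[i] = 3;          /* otherwise busy-wait: state unchanged */
        break;
    default:
        s.flag[i] = 0;            /* leave the critical section */
        s.pc[i] = 0;
        break;
    }
    return s;
}

/* Depth-first search over all interleavings from the initial state.
   Returns true if a state with both processes at pc 3 is reachable. */
bool mutex_violation_reachable(bool turn_first)
{
    bool visited[128] = { false };
    int stack[128];               /* each state is pushed at most once */
    int top = 0;
    State init = { { 0, 0 }, { 0, 0 }, 0 };

    visited[encode(&init)] = true;
    stack[top++] = encode(&init);
    while (top > 0) {
        State s = decode(stack[--top]);
        if (s.pc[0] == 3 && s.pc[1] == 3)
            return true;          /* safety property violated */
        for (int i = 0; i < 2; i++) {
            State next = step(s, i, turn_first);
            int c = encode(&next);
            if (!visited[c]) {
                visited[c] = true;
                stack[top++] = c;
            }
        }
    }
    return false;
}
```

The correct ordering passes the check. With the seeded swap, the search finds an interleaving in which both processes enter the critical section. A bug like this can easily survive ordinary testing, because it depends on a specific schedule.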
A Closer Look at Building Functional Safety
- The Standards
- The Plan
- The Practice
  - Reason's Model
  - Preventing the introduction of faults
  - Preventing faults from causing errors
  - Preventing errors from causing failures
  - Minimizing the effect of failures
- Conclusion
Reason's Model
- Fault: a mistake in the code, which may or may not cause undesired behaviour.
- Error: undesired behaviour caused by a fault in the code.
- Failure: a system failure caused by an uncontained error.
Preventing the Introduction of Faults (cont'd)
- System engineering
- Formal languages: VDM (Vienna Development Method), Z notation
- Language choices: loose/strong typing, dynamic/static typing, exception handling
- Design techniques: test-driven design
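C's typing is static but permissive. For example, two integers with different meanings convert silently into each other. Wrapping each quantity in its own single-member struct makes the compiler reject unit mix-ups, which cheaply moves a class of faults from run time to compile time. The sketch below assumes this technique, and its type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct struct types: the compiler will not convert one into the other. */
typedef struct { int32_t value; } Millimetres;
typedef struct { int32_t value; } Milliseconds;

static Millimetres millimetres(int32_t v) { Millimetres d = { v }; return d; }
static Milliseconds milliseconds(int32_t v) { Milliseconds t = { v }; return t; }

/* Speed in mm/s. Passing the arguments in the wrong order, e.g.
   speed_mm_per_s(milliseconds(500), millimetres(1200)), is a compile error. */
static int32_t speed_mm_per_s(Millimetres d, Milliseconds t)
{
    assert(t.value > 0);
    return d.value * 1000 / t.value;
}
```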
Preventing Faults from Causing Errors
- Assertions
- Static code analysis
- Automatic code inspection
- Code inspections
- Fault injection
  - Test fault detection and recovery
  - Estimate the number of Heisenbugs (bugs that change or disappear when you try to observe them)
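Fault injection deliberately feeds faults into a component to confirm that its detection and recovery logic actually work. The sketch below uses a hypothetical sensor interface. It injects a bus error and a bit flip through a swappable read function and counts how many faults were detected. All names and the plausible-temperature range are assumptions. Holding the last good value is only one possible recovery policy, and a real safety function might move to a safe state instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver interface; tests swap in faulty implementations. */
typedef bool (*read_fn)(int32_t *out);

#define TEMP_MIN (-40)   /* assumed plausible range, degrees C */
#define TEMP_MAX 125

static int32_t last_good = 0;
static unsigned faults_detected = 0;

/* Detection: driver error or implausible value.
   Recovery: fall back to the last good reading. */
static int32_t read_temperature(read_fn read)
{
    int32_t v;
    if (!read(&v) || v < TEMP_MIN || v > TEMP_MAX) {
        faults_detected++;
        return last_good;
    }
    last_good = v;
    return v;
}

/* Nominal driver and two injected faults. */
static bool read_ok(int32_t *out)        { *out = 25; return true; }
static bool read_bus_error(int32_t *out) { (void)out; return false; }
static bool read_bit_flip(int32_t *out)  { *out = 25 ^ (1 << 20); return true; }
```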
Preventing Errors from Causing Failures
- Coherent exception handling
  - A fundamental technique
  - Throwing an exception transfers control from the point of the exception to another location where it can be handled appropriately
- Programming by contract
- Rejuvenation (or reset)
- Replication (redundancy/recovery)
  - Consistency vs. performance and availability
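Replication is often implemented as triple modular redundancy. Three channels compute the same value, and a voter masks a single faulty channel. Below is a minimal 2-out-of-3 voter in C. It assumes integer outputs that should match exactly (analog values would need a tolerance), and the `Vote` type is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of a 2-out-of-3 vote across replicated channels. */
typedef struct {
    bool valid;      /* false if no two channels agree */
    int32_t value;   /* majority value when valid */
    int faulty;      /* index of the disagreeing channel, or -1 */
} Vote;

static Vote vote_2oo3(int32_t a, int32_t b, int32_t c)
{
    Vote v = { false, 0, -1 };
    if (a == b) {
        v.valid = true;
        v.value = a;
        if (c != a)
            v.faulty = 2;
    } else if (a == c) {
        v.valid = true;
        v.value = a;
        v.faulty = 1;
    } else if (b == c) {
        v.valid = true;
        v.value = b;
        v.faulty = 0;
    }
    return v;
}
```

When no two channels agree, the voter marks the result invalid rather than guessing. That choice gives up availability in order to preserve consistency, which is the trade-off the slide names.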
Minimizing the Effects of Failures
- Architecture: microkernel, partitioning
- Fault isolation
- Fault detection and recovery
- Clean crash: crash-only software
- Rapid restart may be required
A simple elevator system with a failure: what techniques could we have used to find the fault? Is recovery possible?
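Fault detection and rapid restart are often combined in a heartbeat supervisor. The monitored process reports in periodically. In the crash-only spirit, a missed deadline is treated as a crash and answered with a restart from a known-good state. The sketch below is a simulation with hypothetical names that counts abstract ticks. A real supervisor would kill and respawn the process rather than just increment a counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical supervisor record for one monitored process. */
typedef struct {
    uint32_t last_heartbeat;  /* tick of the most recent heartbeat */
    uint32_t timeout;         /* ticks allowed between heartbeats */
    uint32_t restarts;        /* number of restarts performed */
} Monitored;

/* Called when the monitored process reports that it is alive. */
static void heartbeat(Monitored *m, uint32_t now)
{
    m->last_heartbeat = now;
}

/* Called periodically by the supervisor. Unsigned subtraction keeps the
   elapsed-time test correct across tick-counter wraparound. Returns true
   if a restart was triggered. */
static bool supervise(Monitored *m, uint32_t now)
{
    if (now - m->last_heartbeat > m->timeout) {
        m->restarts++;            /* real system: kill and respawn here */
        m->last_heartbeat = now;  /* restarted process gets a fresh interval */
        return true;
    }
    return false;
}
```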
Example: Adaptive Partitioning
QNX Adaptive Partitioning:
- provides minimum CPU time guarantees to partitions (sets of processes or threads)
- allows partitions to exceed their time budgets when spare processing cycles are available
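The following simplified model illustrates the two guarantees above. Each partition first receives CPU time up to its guaranteed budget, and any unused time then goes to partitions that still have work. It is an illustrative sketch only, not the QNX adaptive partitioning scheduler or its API. The model works on percentages of a single averaging window and assumes the budgets sum to 100. It hands out spare time in array order, whereas a real scheduler applies its own policy at that step. The function name is hypothetical.

```c
#include <stddef.h>

/* budget, demand and grant are percentages of CPU over one window;
   budgets are assumed to sum to 100. */
static void allocate_cpu(const double *budget, const double *demand,
                         double *grant, size_t n)
{
    double spare = 100.0;

    /* Pass 1: the guarantee. Each partition gets up to its budget. */
    for (size_t i = 0; i < n; i++) {
        grant[i] = demand[i] < budget[i] ? demand[i] : budget[i];
        spare -= grant[i];
    }

    /* Pass 2: unused budget goes to partitions that still want more. */
    for (size_t i = 0; i < n && spare > 0.0; i++) {
        double want = demand[i] - grant[i];
        if (want > 0.0) {
            double extra = want < spare ? want : spare;
            grant[i] += extra;
            spare -= extra;
        }
    }
}
```

Suppose the budgets are 70 % and 30 %, and the second partition needs only 10 %. The first partition can then grow to 90 %. Under full load from both, each partition gets exactly its budget, so a runaway partition cannot starve the other.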
How Can QNX Help?
QNX Certified Platform
- Architected for reliability and self-healing
- IEC 61508 certification statement
- Safety manual
- Device-specific assurance case report plug-in
- Neutrino RTOS safety assurance case
- Proven-in-use data
- Safe design training courses
- On-site audit (regulatory body participation possible)
- Subject matter expert consultancy time (hours)
To Summarize
- The Standards
- The Plan
- The Practice
- Conclusion
Summary
- Functional safety certification has no short cut
- Process and quality management are essential
- Use a proven OS architecture that ensures reliability and safety
- Gather in-field usage data
- Engage the auditor from the beginning and throughout the process
- Consider pre-audit services
- Design and build for safety certification: fault, error, failure, recovery
Thank you!
Bob Monkman
bmonkman@qnx.com
www.qnx.com