Computation of Importance Measures Using Bayesian Networks for the Reliability and Safety of Complex Systems

Modern engineering systems are complex because they involve uncertainties and many dependencies among the system components. Failing to include such complex features leads to an incorrect assessment of system reliability and safety and, ultimately, to incorrect engineering decisions. In this paper, the usefulness of Bayesian Networks (BNs) for improved modeling and for reliability and risk analysis is investigated. Several Importance Measures are calculated with Fault Tree Analysis as well as with BNs for a complicated railway operation problem. The BN-based safety risk model is investigated for quantitative reliability and safety analysis and for modeling multiple dependencies and uncertainties.


Introduction
Importance Measures (IMs) can help system designers identify the components that require improvement, help maintenance engineers improve the maintenance strategies for critical components, and help decision makers allocate engineering funds to safety measures. Many importance and criticality measures are effective in various reliability and safety risk problems [1,2,3,4]. For example, Risk Achievement Worth (RAW) quantifies the increase in system risk if a specific component in the system has failed. Likewise, an increase in the failure probability of a component increases its Fussell-Vesely (FUV) value.
Event Tree Analysis (ETA) and Fault Tree Analysis (FTA) are common methods for logically representing an engineering system, such as a railway system, for reliability and risk analysis [5,6,7,8]. Both FTA and ETA generally simplify the calculations by considering logically deterministic combinations of causes, which limits their ability to model complex systems [9,10]. In most cases the size of the FTA structure grows exponentially as common causes of failure and multistate events are added, which makes the tree difficult to understand and to compute [7]. These shortcomings make it difficult to apply the traditional methods to complex engineering systems such as the railway system, which is characterized by many dependencies and uncertainties. This motivates an investigation of BNs for modeling and analyzing risk and reliability in the current railway system. BNs can handle complex features of risk and reliability problems such as common cause failures, disjoint events, functional uncertainty, multistate components, failure dependency, time dependence, and expert and factual knowledge. In recent years, BNs have received considerable attention and are being used for engineering reliability and risk problems [12,13,14,15,16,17]. BNs are probabilistic graphical models that represent the joint distribution of many random variables efficiently through a directed acyclic graph [18,19]. Applications of BNs in the railway industry, however, are few.
For example, in [20] BNs are used to represent a parameterized FTA for Signal Passing At Danger (SPAD); in [21] a BN model is developed to identify and classify faults in a rail system based on sensor data; and in [22] a BN approach is proposed to model the causal relationships among risk factors for subway systems. The accuracy and certainty problems can be resolved with combinatorial methods based on static fault tree analysis [32]. Several studies suggest that Bayesian Network analysis is one of the most efficient and appropriate methods for evaluating system reliability [33]. However, no research is currently available on modeling risk and reliability in complex railway systems, which are characterized by the many modern features described in Section 5 of this paper. Moreover, the calculation of IMs for complex systems such as railways using BNs has not yet been explored. This article therefore presents new work on calculating IMs and on modeling and analyzing such complex features of railway systems with BNs.

Definitions of the IMs
This paper discusses risk- and reliability-related IMs that have already appeared in the literature [1,2,24,25,26,27,28,29,30,31]. In the definitions in Table 1, the failure probability of the system is denoted $P(S)$ and is calculated as a function of the probabilities $P(e_i)$ of the components $e_i$.
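Table 1 itself is not reproduced in this version of the text. As a rough guide, the sketch below computes several IMs in their common textbook forms (Birnbaum, RAW, RRW, improvement potential, CIF), assuming that $P(S)$ is available as a function of the basic-event probabilities $P(e_i)$. The function names, the toy tree and its probabilities are hypothetical; they illustrate the definitions only, not the paper's Table 1 or its results.

```python
from typing import Callable, Dict, List

# System failure probability P(S) as a function of basic-event probabilities.
SystemFn = Callable[[List[float]], float]


def with_prob(p: List[float], i: int, value: float) -> List[float]:
    """Copy of p with the probability of basic event i fixed to value."""
    q = list(p)
    q[i] = value
    return q


def importance_measures(f: SystemFn, p: List[float], i: int) -> Dict[str, float]:
    """Common textbook IMs for basic event i (assumed forms, not Table 1 itself)."""
    ps = f(p)                          # P(S)
    ps_fail = f(with_prob(p, i, 1.0))  # P(S | e_i has failed)
    ps_ok = f(with_prob(p, i, 0.0))    # P(S | e_i is perfect)
    birnbaum = ps_fail - ps_ok
    return {
        "Birnbaum": birnbaum,
        "RAW": ps_fail / ps,
        "RRW": ps / ps_ok if ps_ok > 0 else float("inf"),
        "IP": ps - ps_ok,              # equals birnbaum * p[i] when P(S) is linear in p[i]
        "CIF": birnbaum * p[i] / ps,
    }


def toy_system(p: List[float]) -> float:
    """Hypothetical tree: Te = (e1 AND e2) OR e3 with independent basic events."""
    return 1.0 - (1.0 - p[0] * p[1]) * (1.0 - p[2])


if __name__ == "__main__":
    probs = [0.01, 0.02, 0.001]        # illustrative values only
    for name, value in importance_measures(toy_system, probs, 0).items():
        print(f"{name}: {value:.4g}")
```

Because $P(S)$ is linear in each $P(e_i)$ for independent basic events, the improvement potential equals the Birnbaum measure multiplied by $P(e_i)$, and the CIF equals the improvement potential divided by $P(S)$.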

Techniques for reliability and risk analysis
Fault Tree Analysis (FTA)
FTA is a top-down approach in which a tree structure is used to find the logical combinations of causes of a Top event (Te). The system is analyzed in the context of its environmental conditions and its safety and functional requirements, and all combinations of basic events that lead to the occurrence of Te are identified. The basic assumptions of standard FTA are: (1) events are random variables with two states, occurring or not occurring; and (2) basic events are statistically independent. Figure 1 presents a cause-and-consequence Safety Risk Model (SRM): the causes of train derailment are modeled with FTA and the consequences with ETA. The lower part of Figure 1, below Te, is the FTA, also discussed in [7]; it shows the possible conditions for the occurrence of Te. For instance, the intermediate event SPAD occurs when the train is moving towards a red signal and there is either (1) a simultaneous failure of the TPWS and a Driver error, or (2) a slip due to inadequate adhesion between rail and wheels. The Te of Train Derailment can occur when a SPAD is followed by either (1) a Turnout/point with a blocked route, or (2) a curve in the track alignment (CTA) together with a High Train Speed (HTRS). It is assumed that the driver is unaware of the slippery track conditions and therefore cannot compensate for slip when applying the brakes.

In FTA, the probability of Te, $P(T_e)$, is generally calculated from the minimal cut sets using the inclusion-exclusion principle:

$$P(T_e) = \sum_{i=1}^{n} P(C_i) - \sum_{i<j} P(C_i \cap C_j) + \sum_{i<j<k} P(C_i \cap C_j \cap C_k) - \cdots + (-1)^{n-1} P(C_1 \cap C_2 \cap \cdots \cap C_n) \quad (1)$$

where $P(C_i)$ is the probability of occurrence of minimal cut set $i$ and $n$ is the number of minimal cut sets. The probability of Te in Figure 1 is calculated in this way as a function of the probabilities of its minimal cut sets (equation (2)), and the IMs are then obtained by substituting equation (2) into the definitions in Table 1. Only the calculations for basic event 1, denoted $e_1$, are shown here.

Improvement Potential (IMP): the improvement potential of basic event 1 is obtained by multiplying $I^{B}_{e_1}$ by $P(e_1)$, which gives $I^{IP}_{e_1} = 2.88 \times 10^{-4}$.

Criticality Importance Factor (CIF): all the required values are available, and the CIF of basic event 1 is 1.
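Equation (1) can be evaluated mechanically. The following is a minimal sketch, assuming statistically independent basic events (the standard FTA assumption above), so that the probability of several cut sets occurring together is the product of the probabilities of all basic events in their union. The function name, the cut sets and the probabilities in the usage example are hypothetical, not those of Figure 1.

```python
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, List


def top_event_probability(cut_sets: List[FrozenSet[str]], p: Dict[str, float]) -> float:
    """Exact P(Te) from minimal cut sets via inclusion-exclusion, Eq. (1).

    Assumes independent basic events: the intersection of several cut sets
    occurs exactly when every basic event in their union occurs.
    """
    total = 0.0
    for r in range(1, len(cut_sets) + 1):
        sign = (-1) ** (r - 1)
        for combo in combinations(cut_sets, r):
            events = frozenset().union(*combo)
            total += sign * prod(p[e] for e in events)
    return total


cut_sets = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"d"})]
probs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.05}   # hypothetical values
print(top_event_probability(cut_sets, probs))
```

For these hypothetical cut sets the result is 0.0918, which matches the direct evaluation of $(a \wedge (b \vee c)) \vee d$. The loop visits $2^n - 1$ combinations of cut sets, which is one reason exact inclusion-exclusion becomes expensive for large trees.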
Fussell-Vesely Measure (FUV): the standard FUV considers the minimal cut sets that contain a particular event. The failure importance is therefore calculated from the contribution of that event to the overall failure of the system. The FUV failure importance of basic event $e_1$ is calculated to be 1.
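The cut-set view of FUV also explains how an event can reach the maximum value of 1: the ratio below equals 1 whenever the event belongs to every minimal cut set. This is a sketch under two assumptions, independent basic events and the ratio form $FV(e_i) = P\big(\bigcup_{C_k \ni e_i} C_k\big) / P(T_e)$. It evaluates each union by brute-force enumeration of basic-event states, which is practical only for small examples; the function names and the cut sets used are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List


def union_probability(cut_sets: List[FrozenSet[str]], p: Dict[str, float]) -> float:
    """P(at least one cut set occurs), enumerating all states of its basic events."""
    events = sorted(set().union(*cut_sets))
    total = 0.0
    for states in product((False, True), repeat=len(events)):
        failed = {e for e, s in zip(events, states) if s}
        if any(cs <= failed for cs in cut_sets):
            weight = 1.0
            for e, s in zip(events, states):
                weight *= p[e] if s else 1.0 - p[e]
            total += weight
    return total


def fussell_vesely(cut_sets: List[FrozenSet[str]], p: Dict[str, float], event: str) -> float:
    """Cut-set based FV: share of P(Te) due to cut sets that contain event."""
    containing = [cs for cs in cut_sets if event in cs]
    if not containing:
        return 0.0
    return union_probability(containing, p) / union_probability(cut_sets, p)


cut_sets = [frozenset({"e1", "e2"}), frozenset({"e1", "e3"})]  # e1 in every cut set
probs = {"e1": 0.1, "e2": 0.2, "e3": 0.3}                      # hypothetical values
print(fussell_vesely(cut_sets, probs, "e1"), fussell_vesely(cut_sets, probs, "e2"))
```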

Event Tree Analysis (ETA)
ETA is a bottom-up method used to develop and analyze the event scenarios that can arise from Te (the initiating event in the ETA) and lead to several potential consequences. The ETA shown here is adapted from [6], in which train derailment accidents on UK railways are analyzed. This work extends the ETA for train derailment by introducing Safety Integrity Levels (SIL), neutralizing factors and barriers for the different consequences. Accidents are classified by severity level; this classification is needed to distinguish fatal, significant and insignificant accidents (see Table 4).

Calculation of the IMs by using BNs
To answer queries about the joint and marginal probabilities of random variables in the BNs, standard Bayesian inference is performed with the variable elimination algorithm. Up to this point, the BN shown in Figure 4 gives the same probability of Te and the same IM values as the FTA, as shown in Table 3.

Quantification of the risk
The next step is to calculate the numerical value of the Individual Risk of Fatality (IRF), expressed as the annual fatality rate of a person who is exposed to a given condition at a given point in time. Its calculation uses the following quantities:
- the number of times an individual is exposed to the hazards of the system;
- the number of hazards;
- the rate of the $j$-th hazard (the top event in the FTA);
- the duration of hazard $j$;
- the exposure time of an individual to hazard $j$;
- the total number of accidents;
- the risk reduction parameters: $C_{jk}$, the risk reduction factor for the $k$-th accident due to the $j$-th hazard, and $F_k$, the probability of a fatality in the $k$-th accident.
The risk reduction factor is calculated from the consequence models, i.e., the Event Tree for the train derailment consequences.
Since fatalities are the most important aspect of railway risk, only the risk reduction factors $C_{jk}$ of the accidents corresponding to severity levels 2, 3 and 4 are considered for IRF (Table 6); together they sum to 0.108. The SIL 0 and SIL 1 categories are excluded from the calculation because they do not result in human fatalities.
Table 6. Risk reduction factors calculated from the models in Figure 1 and Figure 5 (columns: class of severity, SIL).
The additional numerical values used for IRF are: a hazard rate of $2.83 \times 10^{-4}$ from the FTA; 600 exposures per year (an individual typically uses the train twice a day on 300 days a year); one hazard, the Top event (Te); a hazard duration of 5 hours (the average maintenance or mitigation time for the hazardous situation caused by the failure); an exposure time of 0.05 hour (the time to observe and pass a red signal and an overlap length); $C_{jk} = 0.108$; and $F_k = 0.01$. The resulting value of IRF is $9.28 \times 10^{-4}$. The IRF value is the same for the risk models in Figure 4 and Figure 1, since the two models are equivalent.
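The IRF equation itself is not legible in this version of the text. One common form that uses exactly the quantities listed above is $IRF = N \sum_j \lambda_j (D_j + T_j) \sum_k C_{jk} F_k$, where $N$, $\lambda_j$, $D_j$ and $T_j$ are our own labels for the exposure count, hazard rate, hazard duration and exposure time. The sketch below evaluates this assumed form with the values given above, assuming the rate is per hour so that the durations in hours are consistent; it yields about $9.26 \times 10^{-4}$, close to but not identical with the reported $9.28 \times 10^{-4}$, so the paper's exact form or input rounding may differ.

```python
from typing import Sequence


def individual_risk_of_fatality(
    n_exposures: float,
    hazard_rates: Sequence[float],
    hazard_durations: Sequence[float],
    exposure_times: Sequence[float],
    reduction_factors: Sequence[Sequence[float]],
    fatality_probs: Sequence[float],
) -> float:
    """Assumed form: IRF = N * sum_j lambda_j * (D_j + T_j) * sum_k C_jk * F_k."""
    total = 0.0
    for j, rate in enumerate(hazard_rates):
        consequence = sum(c * f for c, f in zip(reduction_factors[j], fatality_probs))
        total += rate * (hazard_durations[j] + exposure_times[j]) * consequence
    return n_exposures * total


irf = individual_risk_of_fatality(
    n_exposures=600,              # twice a day on 300 days a year
    hazard_rates=[2.83e-4],       # one hazard: Te from the FTA
    hazard_durations=[5.0],       # hours
    exposure_times=[0.05],        # hours
    reduction_factors=[[0.108]],  # C_jk summed over severity levels 2 to 4
    fatality_probs=[0.01],
)
print(f"IRF = {irf:.3e}")
```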

Complex aspects of the engineering problem
Complex aspect #01: Common causes of failure
The FTA in Figure 1 assumes that the basic events are statistically independent, which is not true here. The occurrence of slip requires a high train speed, so HTRS is a common cause (also known as a common cause failure). Ignoring such common causes means that the risk is either (1) overestimated when the tree is dominated by series (OR gate) components, or (2) underestimated when the FTA has many parallel (AND gate) components.
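The direction of the error in points (1) and (2) can be checked numerically. The sketch below assumes two events that are conditionally independent given a single common cause (playing the role of HTRS) and compares the exact AND/OR probabilities with those obtained by treating the events as independent with the same marginal probabilities. The function name and all probabilities are hypothetical.

```python
from typing import Dict


def common_cause_effect(p_c: float, p_a: Dict[bool, float], p_b: Dict[bool, float]) -> Dict[str, float]:
    """Compare exact AND/OR probabilities of A and B with the independence assumption.

    A and B are conditionally independent given the common cause C;
    p_a[c] = P(A | C = c) and p_b[c] = P(B | C = c).
    """
    exact_and = marg_a = marg_b = 0.0
    for c, w in ((True, p_c), (False, 1.0 - p_c)):
        exact_and += w * p_a[c] * p_b[c]
        marg_a += w * p_a[c]
        marg_b += w * p_b[c]
    indep_and = marg_a * marg_b
    return {
        "exact_and": exact_and,
        "indep_and": indep_and,
        "exact_or": marg_a + marg_b - exact_and,
        "indep_or": marg_a + marg_b - indep_and,
    }


result = common_cause_effect(0.1, {True: 0.5, False: 0.01}, {True: 0.4, False: 0.02})
for key, value in result.items():
    print(f"{key}: {value:.5f}")
```

With these values the independence assumption underestimates the AND probability and overestimates the OR probability, in line with the two cases above.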

Complex aspect #02: Disjoint events
The basic event Slip and the intermediate event Driver errors and protection failures cannot occur together, because slip requires the brakes to have been applied beforehand. They are therefore mutually exclusive (disjoint) events and are not statistically independent.
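For disjoint events the joint probability is zero, whereas the independence assumption of standard FTA assigns it the value $P(A)P(B)$. A minimal sketch of the two resulting union probabilities, with hypothetical values for Slip and for Driver errors and protection failures:

```python
def union_if_disjoint(p_a: float, p_b: float) -> float:
    """P(A or B) when A and B cannot occur together, i.e. P(A and B) = 0."""
    if p_a + p_b > 1.0:
        raise ValueError("disjoint events cannot have probabilities summing above 1")
    return p_a + p_b


def union_if_independent(p_a: float, p_b: float) -> float:
    """P(A or B) under the standard FTA independence assumption."""
    return p_a + p_b - p_a * p_b


p_slip, p_de_pf = 1e-3, 2e-3                  # hypothetical values
print(union_if_disjoint(p_slip, p_de_pf))     # exact for disjoint events
print(union_if_independent(p_slip, p_de_pf))  # subtracts a spurious joint term
```

The difference is small for an OR gate over rare events, but any AND gate combining the two events would receive the non-zero probability $P(A)P(B)$ instead of 0.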

Complex aspect #03: Multistate system and components
Events in standard FTA correspond to random variables with binary states, i.e., fail/success. Mutually exclusive system states or multistate components cannot be modeled directly with FTA. For example, for train derailment due to SPAD, two different system states, or situations, must be distinguished.
Situation 1: SPAD occurs because of slip caused by poor adhesion. This implies that the brakes are applied while passing the red signal. In this situation, the Top event of train derailment occurs only if the distance between the signal and a turnout/point within the overlap length is sufficiently small; otherwise, the train stops before the turnout/point. Derailment due to a curve in the track is unlikely, because the brake application has already reduced the train speed.
Situation 2: SPAD occurs because the brakes are not applied, i.e., the intermediate event Driver errors and protection failures occurs. In this situation, Te occurs regardless of the overlap length because of (1) a turnout in the subsequent section with a blocked route, or (2) a curve in the subsequent section. Modeling this multistate system therefore requires two additional basic events (see Table 7).

Complex aspect #05: Time dependencies
The FTA contains a time-dependent event. The probability that the driver commits an error increases over time, particularly when the driver has to work longer than the normal duty hours. The probability of $e_4$ therefore changes with time, which also makes the probability of Te time-dependent.
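How such a time-dependent probability grows can be sketched with the transition probabilities given for the temporal node DE in the BN implementation, $P(DE_t \mid DE_{t-1}) = 1$ and $P(DE_t \mid \overline{DE}_{t-1}) = 0.005$. This sketch uses a first-order transition unrolled over ten slices, which is our reading of the description rather than necessarily the paper's exact construction; the initial probability $P(DE_0)$ and the function name are hypothetical.

```python
from typing import List


def driver_error_trajectory(
    p0: float, p_stay: float = 1.0, p_onset: float = 0.005, steps: int = 10
) -> List[float]:
    """Marginal P(DE_t) for t = 0..steps in a two-state transition model.

    p_stay = P(DE_t | DE_{t-1}) and p_onset = P(DE_t | not DE_{t-1}).
    """
    probs = [p0]
    for _ in range(steps):
        prev = probs[-1]
        probs.append(prev * p_stay + (1.0 - prev) * p_onset)
    return probs


trajectory = driver_error_trajectory(p0=0.001)  # hypothetical initial probability
print([round(p, 4) for p in trajectory])
```

With $P(DE_t \mid DE_{t-1}) = 1$ the error state is absorbing, so $1 - P(DE_t) = (1 - P(DE_0))\,0.995^t$ and the probability of the driver error, and hence of Te, increases monotonically with time.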

Complex aspect #06: Functional uncertainty and factual knowledge
Functional uncertainty arises for a track section in which neither a turnout/point nor a curve in track alignment is set: the tendency for derailment still increases if the train enters such a section after a SPAD has occurred. Furthermore, records of past replacement, repair and maintenance actions show that no overlap length contains both a turnout/point and a curve in track alignment. Hence, the OR failure logic for the Derailment conditions must be replaced by XOR (see Table 8).
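A deterministic conditional probability table expresses this change of gate directly. The sketch below builds $P(\text{Derailment conditions} = \text{true} \mid \text{Turnout/point}, \text{CTA})$ for OR and XOR logic; the two-parent structure is an assumption about how Table 8 is organized, and the names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

GATES = {
    "OR": lambda turnout, curve: turnout or curve,
    "XOR": lambda turnout, curve: turnout != curve,
}


def deterministic_cpt(logic: str) -> Dict[Tuple[bool, bool], float]:
    """P(Derailment conditions = true | Turnout/point, CTA) for a logic gate."""
    gate = GATES[logic]
    return {(t, c): 1.0 if gate(t, c) else 0.0 for t, c in product((False, True), repeat=2)}


for logic in ("OR", "XOR"):
    print(logic, deterministic_cpt(logic))
```

The two tables differ only in the row where both a turnout/point and a curve are present, which the maintenance records indicate does not occur within an overlap length.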

Complex aspect #07: Uncertainty in the expert knowledge
When there are not enough historical data to quantify the risks, the occurrence probabilities of some events are estimated by consulting experts in the field.
Experts sometimes disagree on the probability of occurrence of an event. For instance, two experts have different opinions on the probability of CTA, which contributes to Te.
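Such disagreement can be combined with the law of total probability once each expert is given a weight. A minimal sketch using the values stated for the Expert knowledge node in the BN implementation ($P(\text{CTA} \mid \text{Expert 1}) = 0.1$, $P(\text{CTA} \mid \text{Expert 2}) = 0.2$, weights 0.55 and 0.45); the function name is hypothetical.

```python
from typing import Dict


def mixed_probability(p_given_expert: Dict[str, float], expert_weights: Dict[str, float]) -> float:
    """P(event) = sum over experts of P(event | expert) * P(expert)."""
    if abs(sum(expert_weights.values()) - 1.0) > 1e-9:
        raise ValueError("expert weights must sum to 1")
    return sum(p_given_expert[e] * w for e, w in expert_weights.items())


p_cta = mixed_probability(
    {"Expert 1": 0.1, "Expert 2": 0.2},
    {"Expert 1": 0.55, "Expert 2": 0.45},
)
print(round(p_cta, 4))  # 0.145
```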
Complex aspect #08: Dependencies and simplifications in the ETA
The ETA shown in Figure 1 simplifies the event scenarios that lead to the consequences, so it cannot include several barriers and neutralizing factors. Depending on the system characteristics, some barriers and neutralizing factors in the ETA may be equally valid for the FTA. In addition, dependencies between the barriers and neutralizing factors are not considered here.
Some of the complex aspects above can be modeled with advanced FTA techniques [10]. However, when three complex aspects (disjoint events, a multistate system and common causes) are incorporated into the model, the FTA structure explodes and becomes non-intuitive, as shown in Figure 5. Moreover, introducing a new common cause may change the structure of the FTA. Quantitative analysis of such an FTA becomes computationally challenging and requires computerized techniques.
For instance, because of the repeated gates in Figure 5, the number of common causes increases to six. Thus, $2^6 = 64$ common cause event spaces are required to compute the probability of Te, which is then obtained with the total probability theorem:

$$P(T_e) = \sum_{k=1}^{2^6} P(T_e \mid S_k)\, P(S_k)$$

where $S_k$ denotes the $k$-th common cause event space. These and other constraints can be avoided by using BNs.
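The size of this sum is what makes the exploded FTA costly. The sketch below enumerates all $2^n$ common cause event spaces and applies the total probability theorem, assuming the common causes are mutually independent and that $P(T_e \mid S_k)$ is supplied as a function of the joint state; both assumptions, the function name and the example numbers are hypothetical.

```python
from itertools import product
from math import prod
from typing import Callable, Sequence, Tuple


def total_probability(
    p_common: Sequence[float],
    p_te_given: Callable[[Tuple[bool, ...]], float],
) -> float:
    """P(Te) = sum_k P(Te | S_k) P(S_k) over all 2**n common cause event spaces.

    Assumes the n common causes are mutually independent.
    """
    total = 0.0
    for states in product((False, True), repeat=len(p_common)):
        p_space = prod(p if s else 1.0 - p for p, s in zip(p_common, states))
        total += p_te_given(states) * p_space
    return total


# Hypothetical example: six common causes, each active with probability 0.1, and a
# conditional top-event probability that rises with the number of active causes.
p_te = total_probability([0.1] * 6, lambda states: 1e-4 * (1 + sum(states)))
print(p_te)
```

The loop runs over 64 event spaces for six common causes and doubles with every further common cause, whereas a BN represents each common cause once as a node with outgoing links.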

Implementation of the complex aspects of the train derailment model using BNs
In BNs, common causes can be introduced directly by adding the relevant links, without duplicating nodes. The common cause HTRS is taken into account by adding links from HTRS to SA and to Slip. Disjoint events are modeled directly by adding a link between the relevant random variables and then setting the values in the conditional probability table of the child node accordingly. For instance, the events Driver errors and TPWS failure and Slip are mutually exclusive, as explained in Section 5. A link is therefore added from the node Driver errors and TPWS failure to Slip, and the probability of Slip given Driver errors and TPWS failure is set to zero (compare columns 2 and 3 of Table 9). The multistate system can be represented directly by introducing the relevant random variable Situations in the BN; Table 10 shows its Conditional Probability Table (CPT) for the two situations. The CPTs of child nodes can also be used to model failure dependency between system components, i.e., between random variables in the BN. For instance, column 3 of Table 11 shows the increased tendency for Driver errors and TPWS failure to occur when the TPWS failure occurs before the driver errors; this event was previously modeled as an AND gate in the FT model of Figure 1. The temporal node Driver error towards the brake application (DE) is defined by a CPT that evolves over time and models the transition probability. The transition period is assumed to be about 10 minutes, which corresponds to a 10th-order Markov chain in the BN of Figure 6. Here, $P(DE_t \mid DE_{t-1}) = 1$ and $P(DE_t \mid \overline{DE}_{t-1}) = 0.005$. To model the functional uncertainty, a conditional probability of 0.1 is assigned in the table of the node Train derailment to derailment given a SPAD on a section where neither a turnout/point nor a curve in track alignment is set.
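The common cause link and the zero entry for disjoint events can be illustrated on a tiny network. The sketch below assumes three nodes, HTRS, DET (Driver errors and TPWS failure) and Slip with parents HTRS and DET, plus SPAD defined as DET OR Slip; the node SA is omitted because its definition is not given in this section, and all CPT values are hypothetical. The paper uses variable elimination; this sketch instead sums the full joint distribution by brute-force enumeration, which gives the same answers on a network this small.

```python
from itertools import product
from typing import Callable

# Hypothetical CPTs (illustrative numbers, not the paper's).
P_HTRS = 0.1    # P(HTRS = true)
P_DET = 0.002   # P(Driver errors and TPWS failure = true)
# P(Slip = true | HTRS, DET): slip needs prior braking, so it is 0 when DET is true.
P_SLIP = {(True, False): 0.01, (False, False): 0.001, (True, True): 0.0, (False, True): 0.0}


def joint(htrs: bool, det: bool, slip: bool) -> float:
    """Joint probability from the chain rule of the network."""
    p = P_HTRS if htrs else 1.0 - P_HTRS
    p *= P_DET if det else 1.0 - P_DET
    ps = P_SLIP[(htrs, det)]
    return p * (ps if slip else 1.0 - ps)


def query(predicate: Callable[[bool, bool, bool], bool]) -> float:
    """Sum the joint distribution over all states satisfying predicate."""
    return sum(
        joint(h, d, s) for h, d, s in product((False, True), repeat=3) if predicate(h, d, s)
    )


p_spad = query(lambda h, d, s: d or s)   # SPAD = DET OR Slip
p_both = query(lambda h, d, s: d and s)  # zero: the events are disjoint by construction
p_slip_given_htrs = query(lambda h, d, s: h and s) / query(lambda h, d, s: h)
print(p_spad, p_both, p_slip_given_htrs)
```

Because the CPT entry for Slip given DET is zero, the joint probability of the two events is exactly zero and $P(\text{SPAD})$ is simply their sum, without the duplicated nodes an equivalent fault tree would need.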
Factual information is modeled unambiguously in BNs by attaching a CPT that uses XOR logic to the node Derailment conditions, as shown in Table 8. A node named Expert knowledge is introduced in the BN, and the probabilities of CTA conditional on the states of this node are specified: given Expert 1 and Expert 2, the probabilities of CTA are 0.1 and 0.2, respectively. The reliability of each expert's judgement can also be included. For instance, if Expert 1 is considered more reliable than Expert 2, the probabilities 0.55 and 0.45 may be assigned to them, respectively. The resulting BN, with all the complex aspects included, is shown in Figure 6, and the reliability and risk IM values of several events in this BN are updated accordingly. Diagnostic analysis (backward updating), which provides important information about the most probable cause of a specific Te, e.g., $P(e_1 \mid T_e)$, cannot be performed with FTA. Besides enabling the calculation of IMs for a complex system model, BNs offer the further advantage of compactly modeling the joint distribution of random variables, which gives a concise visualization of the reliability and risk problem. BNs update the probabilities (beliefs) of all random variables by propagating evidence bidirectionally (backward and forward) through the whole network. This bidirectional updating allows BNs to handle several Te in the same model.
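Backward updating is Bayes' rule applied to the network. As a minimal sketch, the function below computes the posterior of a parent node once its child is observed, and applies it to the Expert knowledge node with the values above, i.e., it shows how the weight given to each expert changes once CTA is observed to be present. The function name is hypothetical; a full BN tool performs the same kind of update by propagating evidence through all nodes.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """P(state | evidence) = P(evidence | state) * P(state) / P(evidence)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(joint.values())
    return {s: v / evidence for s, v in joint.items()}


weights = {"Expert 1": 0.55, "Expert 2": 0.45}     # prior reliability of each expert
p_cta_given = {"Expert 1": 0.1, "Expert 2": 0.2}   # P(CTA | expert)
print(posterior(weights, p_cta_given))
```

Observing CTA shifts the weight from Expert 1 (about 0.38) towards Expert 2 (about 0.62), because Expert 2 assigned the observed event the higher probability.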

Conclusions
Several complex aspects of an engineering system from the railway field, and their consequences for quantitative reliability and safety analysis, were considered. Taking the complex aspects of railway operations into account reduced the probability of Te and the IRF per year. The BN with complex aspects gave lower values of the Importance Measures and of the fatality risk. Applying the complex aspects with BNs was feasible and gave an improved calculation of the importance measures for a complicated system. It is concluded that safety models without the complex aspects, which are difficult to model with Fault Tree and Event Tree based risk models, overestimated the system risks.