Could the Buncefield Catastrophe Have Been Prevented Using an Intelligent Decision-Making System?

This paper investigated whether a new intelligent decision-making system could have analysed data sets and predicted the Buncefield UK catastrophe before it occurred. The new intelligent decision-making system is presented. It incorporates reliability engineering tools with multicriteria decision-making methods and artificial intelligence techniques. The intelligent system recognises increasing level(s), draws attention to the possibility of further increases before unsafe levels are reached, and is used to analyse and make critical decisions. The aim was to ensure that the causal factors of failure in the Buncefield UK incident were predicted and ranked, and that solutions were proffered one at a time, so that failures with high priority and a high probability of re-occurrence were addressed.

Introduction
A defined system that could be used to avoid disaster, or used during a disaster, in the process industry was needed. Dynamic environments such as the process industry are full of uncertainties, complexities, and ambiguities, and demand faster and more confident decisions [1]. Complex problems require a specific type of decision-making process. A new intelligent decision-making system that could help alleviate risks in process operations and safety engineering was created.
The Buncefield incident [2] was selected to demonstrate how failures in systems could be identified before they occurred and how timely corrective decisions could be implemented. The aim was to determine whether the new decision-making system would predict catastrophes using sets of past data.
Decisions about day-to-day operations are continually made in a process environment. Decision making can be considered as a process in which alternatives are assessed to select a choice or a course of action that fulfils desired objectives and goals [3]. Intelligent methods have been used in a variety of process systems [4,5]. They have addressed issues ranging from more abstract analyses, such as forecasting natural gas production in the United States [6,7] and decision making at a management level while dealing with incomplete evidence [8], to more mundane technical issues concerning geoscientists and engineers, such as drilling [9], reservoir characterisation [10,11], and production engineering [12,13]. Intelligent systems have been used to address many types of problems encountered in process industries.
Contact e-mail: david.sanders@port.ac.uk
The paper evaluates a new intelligent decision-making system that incorporates reliability engineering tools with multicriteria decision-making (MCDM) methods and artificial intelligence techniques. The aim was to see whether the new decision-making system could have predicted the Buncefield UK incident.

Background
The new intelligent decision-making system is shown in Figure 1. A Fishbone diagram (FD) identified possible fault root causes, and fault tree analysis (FTA) developed a fault propagation pathway to provide a quantitative probability importance ranking of fault causes. A risk matrix was used to rank and prioritise the risk of events and to decide whether the risks could be tolerated based on historical statistical data [14]. The risk matrix provided risk levels that showed how crucial each basic event was to the predictions from the analytic hierarchy process (AHP), which pointed to solutions for the crucial causal factors so that catastrophe could be averted. Risk assessment aided in identifying threats and hazards, recognising cause-and-effect relationships, including exposure and weaknesses to risks, and describing potential risk. A new hybrid model was created by combining two decision methods: AHP and PROMETHEE [15].
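As an illustration of how a risk matrix can rank basic events, the sketch below multiplies a probability rating by an impact rating and bins the product into a level. The 1-to-5 scales, the bin edges, the ratings and the function name `risk_level` are assumptions made for this example only; the matrix actually used is the one described in [14].

```python
def risk_level(probability: int, impact: int) -> str:
    """Classify an event on a hypothetical 5x5 risk matrix."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("ratings must be integers from 1 to 5")
    score = probability * impact  # ranges from 1 to 25
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Hypothetical (probability, impact) ratings for two basic events
events = {"X4": (4, 5), "X10": (2, 2)}

# Rank events from highest to lowest risk score
ranked = sorted(events, key=lambda e: events[e][0] * events[e][1], reverse=True)
```

Sorting by the product is only one possible prioritisation; a matrix with asymmetric cells would need an explicit lookup table instead.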
The steps taken to implement the system were:
Step One: Use FD to classify possible fault root causes and FTA to develop a fault propagation pathway and importance ranking of fault causes as described in Ikwan et al. [16].
Step Two: Using the information on root causes from the FD, create a risk matrix. Determine the risk level for basic events using historical probability and impact values. Obtain real-time data (RTD) for basic events and evaluate it together with the risk level using a weighting factor as described in Ikwan et al. [14].
Step Three: Calculate the criteria weights of intermediate events in AHP using a hierarchical model from the FTA and their probability values. Sequence results in PROMETHEE and evaluate each alternative using stored data of intermediate events (criteria) to predict risk. Feed results into the automated decision-making system as described in Ikwan et al. [15].
Step Four: Combine priority numbers from the AHP with forecasted stored data for the intermediate event (criteria), predicted with time series, using an average weighting function. Feed results into the automated decision-making system.
Step Five: Create a rule-based matrix using conditional statements to make decisions about which events to prioritise within the same cell. The inputs are the results from Steps Three and Four, and the output could be 'No Action Needed', 'Low Priority', 'Medium Priority', 'High Priority', or 'Urgent Action Needed'.
Step Six: Enact control measures based on the output of Step Five and the risk levels of basic events. Once control measures have been implemented, the RTD would update, feed into the Excel algorithm, and display the system updates on the human-computer interface (HCI).
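Step Three relies on AHP criteria weights. As a rough sketch of how AHP turns judgements into weights, the example below uses the classic pairwise-comparison form with the row geometric-mean approximation. This is an assumption for illustration: the system described here derived its weights from the FTA hierarchy and probability values [15], and the comparison matrix and the name `ahp_weights` are hypothetical.

```python
from math import prod


def ahp_weights(pairwise):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgements for three criteria on Saaty's 1-9 scale:
# matrix[i][j] states how much more important criterion i is than criterion j
matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(matrix)  # normalised so that the weights sum to 1
```

For a perfectly consistent matrix such as this one, the geometric-mean result coincides with the principal-eigenvector weights; for inconsistent judgements the two can differ slightly.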

Leak in a storage tank
The leakage of hazardous substances in process industries has always posed a threat to employees and to residents living near these industries; it has also resulted in significant environmental damage [17]. An industrial accident could result in substantial economic losses, several days of downtime, legal complaints, and stock devaluation. The problem "leak in a storage tank" was treated as the undesirable event. The aim was to provide a systemic intelligent decision-making system that could analyse a complex decision problem, such as recognising a potential leak in a storage tank and deciding what to do about it. Systems that could lead to a leak in a storage tank were identified. A Fishbone diagram was used as a system identification method as described in Ikwan et al. [16]. The FTA showed the relationship between the basic events that led to overall failure. At the top of the FTA was the unwanted event "leak in a storage tank", with several failures connected beneath it until the basic events were reached. Basic events are the root causes that led to the overall failure under investigation. Figure 2 shows the FTA for the leak in a storage tank incident. The probabilities of basic events or undesired events were determined to calculate the risk of hazards. Twenty intermediate events (B), twenty-two basic events (X) and three secondary events (S) were defined (Table 1), and fault probabilities that led to the top event (T) were determined using data from research on fuel storage [18,19,20]. Table 1 describes the events that could lead to a leak in a storage tank.
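To show how basic-event probabilities propagate up a fault tree, the following minimal sketch evaluates OR and AND gates under the assumption that events are statistically independent. The three probabilities and the two-gate structure are hypothetical and are not the values or the tree from Table 1 and Figure 2.

```python
from math import prod


def or_gate(probs):
    """P(at least one input event occurs), assuming independent events."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """P(all input events occur), assuming independent events."""
    return prod(probs)


# Hypothetical basic-event probabilities
x_a, x_b, x_c = 0.02, 0.05, 0.10

# Hypothetical structure: intermediate event B = x_a OR x_b; top event T = B AND x_c
b = or_gate([x_a, x_b])
t = and_gate([b, x_c])
```

With these inputs the intermediate event has probability 0.069 and the top event 0.0069; dependent events or common-cause failures would need a different treatment.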

Case Study: Buncefield Catastrophe
On Saturday, 10 December 2005, Tank 912 at the Hertfordshire Oil Storage Limited site (part of the Buncefield oil storage depot) was being filled with petrol. The tank had two forms of level control: an automatic tank gauge (ATG) that enabled employees to monitor the filling operation, and an independent high-level switch (IHLS), which was meant to shut down operations automatically if the tank was overfilled. The ATG stuck, and the IHLS was inoperable. There was no means to alert the control room staff that the tank was filling to dangerous levels. Eventually, large quantities of petrol overflowed from the top of the tank. A vapour cloud formed, ignited, and caused a massive explosion and a fire that lasted five days [2].
The Buncefield incident was selected to demonstrate how failures in systems could be identified before they occurred and how timely corrective decisions could be implemented. The aim was to determine whether the new intelligent decision-making system would predict catastrophes. Real-time data values were entered into the system to emulate the information entered under normal conditions. The Buncefield incident was tested using the methods evaluated and described in Ikwan et al. [14,15,16].

A. Representation of identified traits in FTA
The new system was used to test whether it would identify distinct traits. Five root causes identified were:
i. Maintenance Error (X4)
ii. Failure to respond to Automatic Tank Gauge Alarm (X6)
iv. Inspection of flow rates (X10)
This paper scored traits from 'one = Good' to 'nine = Extremely Dangerous' [14,21]. Numbers greater than nine were also assumed to be "extremely dangerous". The Red, Amber, and Green (RAG) colours and the scores shown in brackets equated to Red = Fail (7-9), Amber = Warning (4-6), Green = Good (0-3).
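The RAG bands above can be expressed as a small mapping function. The sketch below is illustrative: the band edges come from the text, but the treatment of fractional scores that fall between bands (for example 6.5) and the name `rag_state` are assumptions.

```python
def rag_state(score: float) -> str:
    """Map a trait score to a Red/Amber/Green state.

    Bands follow the text: Green 0-3, Amber 4-6, Red 7-9 and above.
    Assumption: a fractional score is compared against the lower edge
    of each band, so 6.5 is still Amber and 3.5 is still Green.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score >= 7:
        return "Red (Fail)"
    if score >= 4:
        return "Amber (Warning)"
    return "Green (Good)"


# Scores of 7, 6 and 7 give fail, warning and fail states respectively
states = [rag_state(s) for s in (7, 6, 7)]
```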
The six root causes that could have been triggered in the system are represented on the FTA model shown in Figure 3. Risk levels of basic events were calculated and evaluated as described in Ikwan et al. [14]. Figure 3 showed a steady increase in level(s) when the RTD was updated. A colour change illustrated the pattern showing the "States". The model indicated components with higher failure ratings that required immediate attention. This was demonstrated by the top event showing the highest-rated event. Intermediate events B5, B7, and B11 gave the numbers "7", "6", and "7", sending "fail", "warning", and "fail" state signals to the operator. The top event (T) showed the value "7", which meant that a "fail" state on the system was imminent if corrective actions were not taken. To avoid reaching dangerous levels, the operator would see which basic events were leading to the failure. Figure 3 only shows the operator the current state of the Buncefield tank, meaning that the operator had to judge whether levels might keep increasing. Fail "states" of intermediate events could not be predicted from the FTA; therefore, intelligent methods needed to be applied to predict fail states. Predicting fail states could aid evaluation of causes, safety risk, and timely control, and eliminate hidden dangers where necessary.

B. AHP AND PROMETHEE
Alternatives were sequenced in PROMETHEE using criteria weights calculated in AHP. Tank 3 was assumed to be Tank 912, the Buncefield tank. Visual PROMETHEE was used for analysis. The data of the model with criteria and alternatives are shown in Figure 4. Stored data from FTA was applied to PROMETHEE. The preference was set to "min" as "min" risk was required. For the analysis, a Usual function type was used as the preference function.
PROMETHEE II performed a full ranking combining positive and negative priority values. The resulting ranking of the negative, positive, and net flow (priority) values of the alternatives (Tank 1, Tank 2, Tank 3 (Tank 912), Tank 4) is ordered in Figure 5. "Tank 1" was ranked first (least likely to develop a leak), "Tank 2" was second, "Tank 4" was third, and "Tank 912" had the highest probability of a leak occurring. This ranking was important as it determined the alternatives with minimum and maximum risk. This meant that the new system would have flagged Tank 912 on the HCI, ensuring attention was given to the tank, which could have averted catastrophe.
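The PROMETHEE II net-flow ranking with the Usual preference function and "min" preference can be sketched as follows. This is a simplified stand-in for the Visual PROMETHEE analysis, not a reproduction of it: the scores, the three criterion weights and the function name `promethee_ii` are hypothetical.

```python
def promethee_ii(scores, weights, minimise):
    """Return the net flow of each alternative using the Usual preference function.

    scores[a][k] : value of alternative a on criterion k
    weights[k]   : criterion weights (assumed to sum to 1)
    minimise[k]  : True when lower values are preferred
    """
    names = list(scores)
    n = len(names)

    def pi(a, b):
        # Usual function: full preference as soon as a is strictly better than b
        total = 0.0
        for k, w in enumerate(weights):
            diff = scores[a][k] - scores[b][k]
            if minimise[k]:
                diff = -diff
            if diff > 0:
                total += w
        return total

    net = {}
    for a in names:
        others = [b for b in names if b != a]
        phi_plus = sum(pi(a, b) for b in others) / (n - 1)   # positive flow
        phi_minus = sum(pi(b, a) for b in others) / (n - 1)  # negative flow
        net[a] = phi_plus - phi_minus
    return net


# Hypothetical risk scores on three criteria (lower is better)
scores = {
    "Tank 1": [2, 3, 1],
    "Tank 2": [3, 4, 2],
    "Tank 3 (Tank 912)": [7, 6, 7],
    "Tank 4": [5, 5, 4],
}
net_flows = promethee_ii(scores, weights=[0.5, 0.3, 0.2], minimise=[True] * 3)
ranking = sorted(net_flows, key=net_flows.get, reverse=True)
```

With these illustrative scores the ordering is Tank 1, Tank 2, Tank 4, Tank 3 (Tank 912), which has the same shape as the ranking reported above; the alternative with the lowest net flow is the one to flag.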
The action profile in Figure 6 shows a disaggregated view of the strengths and weaknesses or the uni-criterion net flow scores for Tank 912. PROMETHEE II complete ranking results predicted that Tank 912 was most likely to leak. Figure 6 shows the criteria: maintenance culture, level indicator, and safety management systems on the negative axis, meaning these criteria should be the area of focus.
A geometrical GAIA plane showed the quality value for the analysis for the Buncefield catastrophe. It showed the dispersion of criteria depending on the values of the alternatives (tanks). In the GAIA plane, vectors represented the criteria, and squares represented the alternatives. The length of a vector of a specific criterion gave the effect of that criterion on the alternative. The quality value was calculated as 84%.

i. Buncefield - Maintenance Culture
The maintenance culture forecast (Figure 7) predicted a value of "7.3" for the next quarter, rising to "7.7" and then "10" if no action was taken, because the system identified that maintenance reports were not being logged. As the value of the trait increased, the system would automatically escalate the warning.
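The forecasting method is not named in this section, so the sketch below substitutes a simple least-squares linear trend purely to illustrate how a rising trait could be extrapolated. The history values, the cap at 10 and the name `linear_forecast` are assumptions.

```python
def linear_forecast(history, steps):
    """Extrapolate a least-squares straight line fitted to equally spaced readings."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history)) / sum(
        (t - mean_t) ** 2 for t in range(n)
    )
    intercept = mean_y - slope * mean_t
    # Assumption: forecasts are capped at 10, the largest value quoted in the text
    return [min(10.0, intercept + slope * (n + k)) for k in range(steps)]


# Hypothetical quarterly trait scores
history = [5.0, 5.5, 6.0, 6.5]
forecast = linear_forecast(history, steps=3)
```

A production system would more likely use a dedicated time-series model; the point here is only that a steadily rising trait crosses the Red band before the scale is exhausted, which is when a warning should escalate.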

ii. Buncefield - Tank Safety Management System
The safety management system forecast (Figure 8) predicted a value of "6.5" for the next day, rising to "8.8" and then "10" if no action was taken. The system knew that the ATG had failed because of the "flatline" warning. The value would continue to increase because the operator did not respond by clicking on the pop-up on their screen within a set period; eventually, alarms would sound. At Buncefield, the operator ignored the warning that was activated because warnings were often triggered. In the new system, as the value of the trait increased, the system would automatically escalate the warning, and a supervisor would be alerted.
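A "flatline" check and an escalation ladder of the kind described above could look like the sketch below. The window length, the tolerance, the timeouts, the order of the escalation steps and both function names are assumptions made for illustration.

```python
def is_flatlined(levels, filling, window=5, tolerance=0.0):
    """Return True when the last `window` gauge readings are unchanged while filling."""
    if not filling or len(levels) < window:
        return False
    recent = levels[-window:]
    return max(recent) - min(recent) <= tolerance


def escalation(seconds_unacknowledged, pop_up_timeout=60, alarm_timeout=180):
    """Hypothetical escalation ladder: pop-up, then supervisor alert, then alarm."""
    if seconds_unacknowledged >= alarm_timeout:
        return "audible alarm"
    if seconds_unacknowledged >= pop_up_timeout:
        return "supervisor alerted"
    return "pop-up on operator screen"


# A gauge that stops moving while the tank is still being filled
readings = [1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0]
stuck = is_flatlined(readings, filling=True)
```

The check only makes sense while a transfer is in progress, which is why the `filling` flag gates it; a static tank legitimately shows a constant level.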

iii. Buncefield - Level Indicator Failure
The level indicator failure forecast (Figure 9) predicted an increase to "7.1" the next day and then to "8.7". The new system had other sensors that could alert operators if an overfill was imminent. The storage tank was connected to a safety instrumentation system (SIS) that linked the transmitter and the emergency shutdown valve (ESDV). This SIS enabled automatic closure of the ESDV and sounded an alarm when the fuel reached the 'High High' level.

Figure 9. Buncefield - Level Indicator Failure
A weighting function (Equation 1) was used to allocate more weight to forecasted data for Tank 912 than to the priority vectors obtained from AHP [16], because the aim was to predict future risk of failure. In Equation 1, W_i was the weighting factor and X_i were the variables (real-time data for intermediate events/traits and priority ranking). The weighting factor for real-time data was 0.6, and the weighting factor for the priority rank was 0.4. The priority ranking was evaluated and calculated as described in Ikwan et al. [15].
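One way to read the weighting function is as a normalised weighted average of the two inputs. The sketch below assumes that form; the weights 0.6 and 0.4 and the forecast value 8.7 are taken from the text, while the priority rank of 6.0 and the name `weighted_score` are hypothetical.

```python
def weighted_score(rtd: float, priority_rank: float,
                   w_rtd: float = 0.6, w_pr: float = 0.4) -> float:
    """Weighted average of forecast real-time data and the AHP priority rank.

    Assumed form: sum(W_i * X_i) / sum(W_i).
    """
    return (w_rtd * rtd + w_pr * priority_rank) / (w_rtd + w_pr)


# Forecast level-indicator value from the text combined with a hypothetical rank
combined = weighted_score(rtd=8.7, priority_rank=6.0)
```

Because the two weights sum to one, the normalising denominator has no numerical effect here; it is kept so that the function stays correct if other weights are tried.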

D. Automated Decision-Making System
The results from the MCDM prediction were fed into an automated decision-making system.
The input and output variables were:
i. Predicted alternative (tank likely to leak) obtained from AHP and PROMETHEE.
ii. Value obtained from a weighting function combining the priority ranking (PR) and the RTD of criteria, where PR < 4 meant low priority ranking, 4 < PR < 7 meant medium priority ranking, and PR > 7 meant high priority ranking, and where RTD < 4 meant low data source ranking, 4 < RTD < 7 meant medium data source ranking, and RTD > 7 meant high data source ranking.
iii. Value obtained using the RTD of basic events (RTDB), where RTDB < 2 meant very low data source reading, 2 < RTDB < 4 meant low data source reading, 4 < RTDB < 7 meant medium data source reading, 7 < RTDB < 9 meant high data source reading, and RTDB > 9 meant very high data source reading.
iv. The output variable, which was the decision to take and could be 'No Action Needed', 'Low Priority', 'Medium Priority', 'High Priority' or 'Urgent Action Needed'.
A rule-based system was used to apply rules to all data being fed into the decision-making system. The representation used for IF-THEN rules is shown in Equation 2 [23].
IF $x_1$ is $A_1$ AND $x_2$ is $A_2$ AND ... AND $x_n$ is $A_n$ THEN $y$ is $B$    (2)
where $x_i$ ($i = 1, 2, \ldots, n$) are the input variables and $y$ is the output variable. Here, $A_1, A_2, \ldots, A_n$ and $B$ are the linguistic terms used for the input and output variables, respectively. The outputs generated by the rules were aggregated into a single output.
Using Equation 2, IF-THEN rules were used to map the four given inputs to the single output. Since a single rule was not usually enough, two or more rules were needed that could play off one another [24].
In this paper, a set of 60 IF-THEN rules for each trait, and 720 rules in total, was created. They depicted all possible combinations of the four given input variables with five possible output terms (Table 3). One example rule was: IF alternative predicted to fail is 'Tank 912' AND criteria is 'level indicator failure' AND combination of priority vector ranking and RTD of criteria is 'PRT > 7' AND RTD of basic events is '7 < RTDB < 9', THEN Decision is 'High Priority'.
So, if the new system had been in place, then the operator would have seen the 'High Priority' on the HCI and known there was a fault.
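The rule base can be pictured as a lookup from banded inputs to a decision. The sketch below is a reduced, hypothetical version: the band edges and the five output terms come from the text, and the ("high", "high") entry matches the example rule quoted above, but every other table entry, the handling of values that fall exactly on a band edge, and all function names are assumptions rather than the contents of Table 3.

```python
def band_combined(value: float) -> str:
    """Band the combined priority-rank/RTD value (edges 4 and 7 from the text)."""
    if value < 4:
        return "low"
    if value < 7:
        return "medium"
    return "high"


def band_basic(value: float) -> str:
    """Band the RTD of a basic event (edges 2, 4, 7 and 9 from the text)."""
    if value < 2:
        return "very low"
    if value < 4:
        return "low"
    if value < 7:
        return "medium"
    if value < 9:
        return "high"
    return "very high"


# Hypothetical decision table; only the ("high", "high") entry is quoted in the text
RULES = {
    ("low", "very low"): "No Action Needed",
    ("low", "low"): "No Action Needed",
    ("low", "medium"): "Low Priority",
    ("low", "high"): "Medium Priority",
    ("low", "very high"): "High Priority",
    ("medium", "very low"): "No Action Needed",
    ("medium", "low"): "Low Priority",
    ("medium", "medium"): "Medium Priority",
    ("medium", "high"): "Medium Priority",
    ("medium", "very high"): "High Priority",
    ("high", "very low"): "Low Priority",
    ("high", "low"): "Medium Priority",
    ("high", "medium"): "Medium Priority",
    ("high", "high"): "High Priority",
    ("high", "very high"): "Urgent Action Needed",
}


def decide(tank: str, criterion: str, combined: float, basic_rtd: float) -> str:
    """Apply one IF-THEN rule; tank and criterion only label the message here."""
    decision = RULES[(band_combined(combined), band_basic(basic_rtd))]
    return f"{tank} / {criterion}: {decision}"


message = decide("Tank 912", "level indicator failure", 7.6, 8.0)
```

In this reduced form the tank and the criterion do not change the outcome, whereas a full rule base with separate rules per trait could assign different decisions to the same bands.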

E. Control Measures
All processes, including the risk levels, were evaluated to ensure that all basic events were tackled for Tank 912 and its contributing criteria. Prioritisation of risks allowed decision-makers to act on the most significant risk, to facilitate appropriate resource allocation, and to avoid, eliminate, reduce, or control risk [23]. Control measures were assigned to causal factors of the Buncefield incident using the ANSI/ASSP Z590.3 model described by Lyon et al. [25] and shown in Table 4. For example, IF alternative predicted to fail is 'Tank 912' AND criteria is 'Level Indicator' AND combination of priority vector ranking and RTD of criteria is 'PRT > 7' AND RTD of basic events is '7 < RTDB < 9', THEN Decision is 'High Priority'.
The control treatment strategy, in this case, was "Engineer". The corrective control measure was "operator should verify tank status as there are inconsistencies in the tank level". Table 4 showed that there were four "high priority" events. The operator would immediately see on the HCI that the ATG and level indicator failure risk levels were "very high" and act promptly. The operator would decide which to attend to first based on the RTD of the basic event ("9") and the risk level. Therefore, the operator would attend to the ATG before attending to other events.
D. Sanders, F. Ikwan, G. Tewkesbury
The new system ensured that the causal factors of failure of the Buncefield incident were predicted and ranked, and that solutions were proffered one at a time, so that failures with high priority and a high probability of re-occurrence if left hidden were considered first. This helped to ensure that no causal factors of failure were left unattended in the system, a situation that could lead to future re-occurrence.

Discussion and Conclusion
The aim was to determine whether a new intelligent decision-making system could provide analysis using data sets and predict the Buncefield UK catastrophe before it occurred. The new intelligent decision-making system incorporated reliability engineering tools with multicriteria decision-making methods and artificial intelligence techniques.
FD was used for system identification, and FTA showed how basic events interacted, leading to the overall failure. A model was also used for visual representation of relationships between hazards. It was impossible to update and integrate the total risk figures in response to changes in the actual real-world environment or subsequent improvements using Fishbone, FTA, and qualitative risk assessment methods; therefore, a dynamic risk assessment model was incorporated.
Dynamic risk assessment supported critical decision making by quantifying, aggregating, and understanding current risk when decisions were made. It aided in applying real-time data to intelligent decision making. Real-time data values were entered into a database to emulate the type of information that might be entered under normal conditions. The six root causes that could have been triggered in the system and led to the Buncefield incident were described, and their representation on the FTA model was discussed. This would have given the operator a real-time insight into the current state of Buncefield Tank 912, and warnings would have been escalated if they were ignored.
Although Dynamic risk assessment could show the current state of risk of basic events in the Buncefield incident, it did not predict future risk. It also could not provide prioritised and preventive measures for each basic event to eliminate their influences. Therefore, other methods such as AHP, PROMETHEE, time series forecasting, and rule-based methods were evaluated.
The new system ensured that the causal factors of failure of the Buncefield UK incident were predicted and ranked, and that solutions were proffered one at a time. This helped to ensure that no causal factors of failure were left unattended in the system. The new intelligent decision-making system could provide analysis using data sets and could have predicted the Buncefield UK catastrophe before it occurred.
The new system will be applied to other case studies in the process industry such as the Texas Refinery [27] and CAPECO incident [28].