Reliability Enhancement of Computer Network System with Server Replication

Reliability is a performance factor applied to multi-computer network systems consisting of devices such as active parallel or redundant hosts (clients), distributed database servers with replication, and a central server. To evaluate the reliability and performance of such a system, this study analyzes a computer network consisting of hosts (clients) connected to two distributed database servers that are replicas of each other. The system is configured as a series-parallel system with two subsystems, A and B. Subsystem A consists of three clients in active parallel, while subsystem B consists of two distributed database servers in replication with each other. The failure and repair times of both clients and servers are assumed to be exponentially distributed. The system is analyzed using first-order differential-difference equations to derive expressions for the availability, the mean time to failure (MTTF), and the probability that the repairman is busy due to partial or complete failure. Reliability characteristics such as availability, MTTF, and the profit function, together with a sensitivity analysis, are discussed, and the computed results are presented in tables and graphs. The analysis shows that availability, profit, and MTTF can be enhanced by adding more servers in replication with each other.


Introduction
Reliability is defined as the probability that a system performs its required function for a specified period of time under stated conditions, while availability is the probability that a system is functional at a given time under stated conditions. Previous researchers have presented substantial work on the reliability analysis of computer systems in terms of reliability, availability, mean time to failure, cost, and performance, and have reported improved performance of repairable systems under their operating policies.
[1] examined the performance of African textile manufacturers using a copula-linguistic approach.
* Corresponding Author Email: iyusuf.mth@buk.edu.ng
[10] presented performance measures of a repairable complex system with two subsystems connected in series. [17] analyzed the performance of a complex repairable system with two subsystems in series configuration with an imperfect switch.
[10] analyzed the cost of a complex repairable system consisting of two subsystems in series configuration using the Gumbel-Hougaard family copula.
[18] presented a probabilistic assessment of a two-unit parallel system with correlated lifetimes. [23] carried out a cost-benefit analysis of three systems with imperfect coverage and standby switching failures.
[3] analyzed the reliability of a redundant system subject to weather conditions under a first-come, first-served policy.
[16] discussed an approach for analyzing the reliability and profit of an industrial system based on the cost-free warranty policy.
[19] discussed the reliability and availability of a parallel system under a repair and replacement policy.
[22] discussed reliability and availability of standby systems with working vacations and retrial of failed components.
[25] analyzed the reliability characteristics of a linear consecutive 2-out-of-4 system connected to a 2-out-of-4 supporting device for operation. [4] dealt with the reliability analysis of an acyclic transmission network based on minimal cuts, using a copula in repair. [26] dealt with the reliability assessment of a repairable system under online and offline preventive maintenance.
Computer systems exhibit two types of failure: hardware and software. Researchers, designers, and engineers have suggested several techniques for improving the performance of computer systems. The unit-wise redundancy technique has been considered as one of these in the development of stochastic models for computer systems, and unit-wise redundancy in cold standby mode has also been used in computer systems. Interactions between client and server components play a significant role in the successful operation of a computer network. Existing literature has studied the interaction between clients and servers, or between hardware and software, in computer networks where the failure of a server triggers the failure of clients and vice versa, or where a software failure triggers a hardware failure and vice versa. Such studies have either not captured the impact of server replication on network reliability or have been limited to hardware-software failure analysis rather than to the enhancement of network reliability and availability. [6,12,14] analyzed different computer system models with unit-wise cold standby redundancy and different repair policies. It has also been shown, however, that component-wise redundancy is better than unit-wise redundancy as far as reliability is concerned.
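The claim that component-wise redundancy beats unit-wise redundancy can be made concrete with a standard textbook comparison (our illustration, not taken from the cited works). Consider a unit made of two components in series, each with reliability $r$, duplicated either as a whole unit (unit-wise) or component by component (component-wise):

```latex
R_{\text{unit}} = 1 - \bigl(1 - r^{2}\bigr)^{2}, \qquad
R_{\text{comp}} = \bigl[1 - (1 - r)^{2}\bigr]^{2}, \qquad
R_{\text{comp}} - R_{\text{unit}} = 2r^{2}(1 - r)^{2} \ge 0 .
```

For $r = 0.9$, for instance, $R_{\text{comp}} = 0.9801$ while $R_{\text{unit}} = 0.9639$.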
[15] developed a stochastic model for a computer system with hardware components in cold standby redundancy.
[2] studied a cold standby computer system by giving priority to hardware repair activities over software replacement.
[13] analyzed computer systems with cold standby redundancy under different failure and repair policies. [5,7] discussed the modeling of a computer system with priority to preventive maintenance over software replacement and with priority to hardware repair over replacement, respectively. [8] analyzed the performance of a computer system with hardware fault detection.
[11] addressed the application of computer networks in a media environment.
[24] analyzed the reliability of computer networks using an intelligent cloud computing method.
[27] presented a reliability analysis of an aero-engine compressor rotor system considering cruise characteristics. [28] presented research on the reliability analysis of computer networks based on an intelligent cloud computing method.
Although previous researchers have presented substantial work on the reliability analysis of computer systems and reported improved performance of repairable systems, further study of new models of computer systems with server replication, supported by a justified and satisfactory assessment, is still required. For this reason, the present paper considers a computer network consisting of three clients, each connected to two servers, where each server is a replica of the other. The objectives of this paper are twofold. The first is to derive the corresponding reliability models of the system. The second is to capture the effect of the failure and repair rates of clients and servers on measures of system effectiveness such as MTTF, availability, and profit. The structure of the paper is as follows. Section 2 presents the notation used in the paper. Section 3 describes the proposed computer network, its reliability block diagrams, and the states of the system. Section 4 derives the reliability models. Sensitivity analysis and numerical examples are presented in Section 5. Conclusions are given in Section 6.

Description and states of the system
Replication is the practice of sharing, storing, and accessing copies of data on multiple servers to enhance reliability, availability, accessibility, fault tolerance, and system performance. The ultimate goal of replication is to ensure that data is stored, shared, and retrievable whenever the need arises.
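As a loose illustration of the replication idea (not a model of any particular database product), the hypothetical class below copies each write to every live server and serves reads from any live copy, so a request still succeeds when server 1 is down but its replica is up. The class name, methods, and key names are invented for this sketch.

```python
class ReplicatedStore:
    """Toy key-value store: every write is copied to each live server (hypothetical sketch)."""

    def __init__(self, n_servers=2):
        self.copies = [{} for _ in range(n_servers)]  # one data copy per server
        self.alive = [True] * n_servers               # current up/down status of each server

    def write(self, key, value):
        # Store the data on every server that is currently up.
        for copy, up in zip(self.copies, self.alive):
            if up:
                copy[key] = value

    def read(self, key):
        # Any live server holding a copy can answer the request.
        for copy, up in zip(self.copies, self.alive):
            if up and key in copy:
                return copy[key]
        raise LookupError(f"no live server holds {key!r}")


store = ReplicatedStore()
store.write("exam-42", "submitted")
store.alive[0] = False          # server 1 fails
print(store.read("exam-42"))    # the request is answered by the replica on server 2
```

Re-synchronizing a server that missed writes while it was down is deliberately left out of this sketch.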
A computer network with three clients connected to two servers is considered. It is assumed that the clients are identical to each other and that the servers are identical to each other. All clients are connected to each server, as shown in Figure 1, and server 2 acts as the replica server. Each client and each server fails independently of the state of the others. The failure times of clients and servers are exponentially distributed with rates $\beta_0$ and $\beta_1$, respectively, and their repair times are exponentially distributed with rates $\alpha_0$ and $\alpha_1$, respectively.
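The configuration can be summarized by the structure function of the series-parallel system. With indicator variables $x_i = 1$ if client $i$ works and $y_j = 1$ if server $j$ works (notation introduced here for illustration), client and server failure rates $\beta_0$, $\beta_1$, and repair rates $\alpha_0$, $\alpha_1$, the relevant quantities under the exponential assumption are sketched below, where $k = 0$ refers to a client and $k = 1$ to a server:

```latex
\phi(x, y) = \Bigl[1 - \prod_{i=1}^{3} (1 - x_i)\Bigr]\Bigl[1 - \prod_{j=1}^{2} (1 - y_j)\Bigr],
\qquad
R_k(t) = e^{-\beta_k t}, \quad
\mathbb{E}[\text{time to failure}] = \frac{1}{\beta_k}, \quad
\mathbb{E}[\text{repair time}] = \frac{1}{\alpha_k}, \quad k \in \{0, 1\}.
```

The system is up ($\phi = 1$) exactly when at least one client and at least one server work, which is why the failure of all three clients, or of both servers, brings the system down.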
From Table 1, provided in the appendix, the operation of the system can be categorized into four stages: first partial failure, second partial failure, third partial failure, and complete failure (down).
The first partial failure occurs when the system experiences its first failure, due to either a client or a server; this corresponds to states $S_1$ and $S_2$. The second partial failure occurs when the system experiences a second failure, due to the failure of one client and one server or of two clients; this corresponds to states $S_3$ and $S_4$.
The third partial failure occurs when the system experiences a third failure, due to the failure of two clients and one server; this corresponds to state $S_5$.
The complete failure states are those in which the system is down as a result of the failure of three clients; of three clients and one server; of two servers; of one client and two servers; or of two clients and two servers. These are states $S_6$ to $S_{10}$.
This division shows that failures interact across the stages of system operation. The first partial failure induces the second partial failure, which in turn induces the third partial failure, and the complete failure states are induced by the first, second, and third partial failures.
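The state space and its stages can be reproduced mechanically. The Python sketch below encodes each state as a pair (failed clients, failed servers), assumes (as the list of down states suggests) that no further failure occurs once the system is down, and classifies the reachable states. The pair encoding and all function names are ours, and the exact mapping of pairs onto the labels $S_0$ to $S_{10}$ is not fixed by the text.

```python
# Hypothetical sketch: enumerate and classify the states of the 3-client / 2-server network.
N_CLIENTS, N_SERVERS = 3, 2
STAGES = ["fully operational", "first partial failure",
          "second partial failure", "third partial failure"]


def is_up(fc, fs):
    """A state (failed clients, failed servers) is up if a client and a server still work."""
    return fc < N_CLIENTS and fs < N_SERVERS


def reachable_states():
    """States reachable from (0, 0), assuming no new failure occurs once the system is down."""
    seen, stack = {(0, 0)}, [(0, 0)]
    while stack:
        fc, fs = stack.pop()
        if not is_up(fc, fs):
            continue  # a down state is left only through repair
        for nxt in ((fc + 1, fs), (fc, fs + 1)):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen, key=lambda s: (s[0] + s[1], s))


def classify(state):
    """Operating stage of a state: up states by number of failures, otherwise complete failure."""
    fc, fs = state
    return STAGES[fc + fs] if is_up(fc, fs) else "complete failure"


def predecessors(state):
    """Up states from which a single failure leads directly to `state`."""
    fc, fs = state
    cands = [(fc - 1, fs), (fc, fs - 1)]
    return [s for s in cands if min(s) >= 0 and is_up(*s)]


if __name__ == "__main__":
    for state in reachable_states():
        extra = predecessors(state) if not is_up(*state) else ""
        print(state, classify(state), extra)
```

It yields 11 states: one fully operational, two first-partial, two second-partial, one third-partial, and five complete-failure states, and every down state is entered from a partial-failure state, consistent with the description above.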

Reliability Models Formulation
To analyze the availability, profit function, and mean time to failure of the system, let $p_i(t)$ be the probability that the system is in state $S_i$ at time $t \ge 0$. Also let $P(t) = [p_0(t), p_1(t), p_2(t), \ldots, p_{10}(t)]$ be the row vector of these probabilities at time $t$. The initial condition for this problem is

$$p_i(0) = \begin{cases} 1, & i = 0, \\ 0, & i = 1, 2, 3, \ldots, 10. \end{cases}$$

The corresponding set of differential-difference equations (2) is obtained from Table 2. To compute the availability of the system, the differential-difference equations in (2) are expressed in matrix form (3), with $T$ denoting the matrix of transition-rate coefficients.

The steady-state availability $A_V(\infty)$ is the proportion of time the system is functioning, or equivalently the sum of the probabilities of the operational states $S_0$ to $S_5$:

$$A_V(\infty) = \sum_{i=0}^{5} p_i(\infty).$$

The busy periods of the repairman due to the failure of clients and of servers, denoted here $B_C(\infty)$ and $B_S(\infty)$, are the sums of the steady-state probabilities of the states in which a failed client or a failed server, respectively, is under repair.

In the steady state, the derivatives of the state probabilities are zero, so (3) reduces to a homogeneous linear system in $p_0(\infty), \ldots, p_{10}(\infty)$. Combined with the normalizing condition

$$\sum_{i=0}^{10} p_i(\infty) = 1,$$

this system is solved for the steady-state probabilities. The explicit expressions for the steady-state availability and for the busy periods due to client and server failures then follow from (4) to (6); they are lengthy rational functions of $\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$.

The clients and servers are subjected to corrective maintenance upon failure, so the repairman is busy performing maintenance on the failed items. Let $K_0$ be the revenue generated per unit time when the system is in a working state (no income is generated in a failed state), and let $K_1$ and $K_2$ be the cost of each repair due to the failure of a client and of a server, respectively. The expected total profit of the system per unit time in the steady state is

$$PF = K_0\,A_V(\infty) - K_1\,B_C(\infty) - K_2\,B_S(\infty).$$

To compute the mean time to failure, that is, the expected time to failure of the system, the procedure deletes the rows and columns of the absorbing (failed) states of matrix $T$ and takes the transpose to produce a new matrix $M$, as adopted in [20,21] (see the appendix for $M$). The expected time to reach an absorbing state is then

$$\mathrm{MTTF} = P(0)\,(-M)^{-1}\,[1, 1, 1, 1, 1, 1]^{T},$$

where $P(0)$ is restricted to the six operational states. The resulting expression is too lengthy to be shown here.
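A numerical version of this procedure can be sketched in plain Python. Because Table 2 is not reproduced in this section, the transition structure below is an assumption, not the paper's exact model: states are pairs (failed clients, failed servers); while the system is up each working client fails at rate $\beta_0$ (`b0`) and each working server at rate $\beta_1$ (`b1`); no failures occur while it is down; and one failed client and one failed server can be under repair at the same time, at rates $\alpha_0$ (`a0`) and $\alpha_1$ (`a1`). The busy periods are taken as the probabilities that a client (server) repair is in progress. All function names are hypothetical.

```python
# Sketch: availability, busy periods, profit and MTTF under ASSUMED transition rules.
N_CLIENTS, N_SERVERS = 3, 2

def is_up(fc, fs):
    """System works if at least one client and at least one server work."""
    return fc < N_CLIENTS and fs < N_SERVERS

# Up states plus down states one failure away from an up state (11 in total).
STATES = [(c, s) for c in range(N_CLIENTS + 1) for s in range(N_SERVERS + 1)
          if is_up(c, s) or (c > 0 and is_up(c - 1, s)) or (s > 0 and is_up(c, s - 1))]

def transitions(state, a0, a1, b0, b1):
    """(target, rate) pairs leaving `state`; these rules are assumptions."""
    fc, fs = state
    out = []
    if is_up(fc, fs):  # failures only occur while the system is up
        out.append(((fc + 1, fs), (N_CLIENTS - fc) * b0))
        out.append(((fc, fs + 1), (N_SERVERS - fs) * b1))
    if fc > 0:  # one client repair in progress
        out.append(((fc - 1, fs), a0))
    if fs > 0:  # one server repair in progress
        out.append(((fc, fs - 1), a1))
    return out

def generator(a0, a1, b0, b1):
    """Row-convention generator: Q[i][j] is the rate from state i to state j."""
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for s in STATES:
        for t, rate in transitions(s, a0, a1, b0, b1):
            Q[idx[s]][idx[t]] += rate
            Q[idx[s]][idx[s]] -= rate
    return Q

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def steady_state(Q):
    """Solve pi Q = 0 with one balance equation replaced by sum(pi) = 1."""
    n = len(Q)
    A = [[Q[i][j] for i in range(n)] for j in range(n)]  # transpose of Q
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])

def mttf(Q):
    """Treat down states as absorbing and solve (-M) m = 1 over the up states."""
    up = [i for i, s in enumerate(STATES) if is_up(*s)]
    A = [[-Q[i][j] for j in up] for i in up]
    return solve(A, [1.0] * len(up))[up.index(STATES.index((0, 0)))]

def measures(a0, a1, b0, b1, K0, K1, K2):
    """Return (availability, client busy period, server busy period, profit, MTTF)."""
    Q = generator(a0, a1, b0, b1)
    pi = steady_state(Q)
    avail = sum(p for p, s in zip(pi, STATES) if is_up(*s))
    busy_c = sum(p for p, s in zip(pi, STATES) if s[0] > 0)  # a client is under repair
    busy_s = sum(p for p, s in zip(pi, STATES) if s[1] > 0)  # a server is under repair
    profit = K0 * avail - K1 * busy_c - K2 * busy_s
    return avail, busy_c, busy_s, profit, mttf(Q)
```

For example, `measures(a0=1.0, a1=1.0, b0=0.1, b1=0.1, K0=1000, K1=100, K2=200)` returns the five measures for these illustrative (not the paper's) parameter values. Setting `b0=0` reduces the model to a birth-death chain on the servers alone, which gives a hand check of the solver: with `a1=1` and `b1=0.5` the availability is 0.8 and the MTTF is 5.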

Results and Discussion
In this section, numerical simulations of availability, MTTF, and the profit function for the developed models are provided, considering two system parameters. For each model, a fixed set of parameter values is used throughout the simulations for consistency. Tables 3 and 4 display the impact of the repair and failure rates of the subsystems, respectively, on availability, profit, and mean time to failure. It is evident from these tables that availability, profit, and mean time to failure increase as repair rates increase and decrease as failure rates increase. The variation in availability, profit, and mean time to failure across different failure (repair) rates indicates that an incremental increase in a failure (repair) rate decreases (increases) the availability, profit, and mean time to failure of the system. From these results, it is worth noting that higher system reliability, availability, mean time to failure, and profit can be achieved by adding more active replica servers. Thus, servers have a greater effect than clients on reliability, availability, mean time to failure, and revenue generation. This sensitivity analysis implies that preventive and major maintenance should be applied to the system to reduce failures and thereby maximize system availability, profit, and mean time to failure. Furthermore, Tables 5-8 show that availability, profit, and mean time to failure are higher (lower) for higher (lower) repair rates and for lower (higher) failure rates. This implies that major maintenance of the subsystems/units should be carried out to lower the failure rates and to improve and maximize system availability as well as production output.
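The effect of adding replica servers can be checked with a back-of-the-envelope calculation. The sketch below deliberately uses a simpler model than the paper's Markov model: it assumes every client and server has its own repairman and fails independently of the system state, so the client and server subsystems can be treated as independent active-parallel blocks in series. The function names and rate values are hypothetical.

```python
def unit_availability(repair_rate, failure_rate):
    """Steady-state availability of one unit with its own repairman: alpha / (alpha + beta)."""
    return repair_rate / (repair_rate + failure_rate)


def parallel(unit_avail, n):
    """An active-parallel block of n independent units is down only if all n are down."""
    return 1.0 - (1.0 - unit_avail) ** n


def network_availability(n_servers, a0, a1, b0, b1, n_clients=3):
    """Client block in series with server block, ASSUMING fully independent units."""
    clients = parallel(unit_availability(a0, b0), n_clients)
    servers = parallel(unit_availability(a1, b1), n_servers)
    return clients * servers


# Hypothetical rates (not the paper's values): compare 1, 2 and 3 replica servers.
for n in (1, 2, 3):
    print(n, round(network_availability(n, a0=1.0, a1=1.0, b0=0.1, b1=0.1), 6))
```

Under these assumptions each extra replica multiplies the server-block unavailability by $\beta_1/(\alpha_1+\beta_1)$, which is why the gain from replication is largest when failures are frequent relative to repairs.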

Conclusion
In this paper, we constructed a repairable computer network system consisting of three clients, all connected to two servers (one acting as a replica). Explicit expressions for system reliability characteristics such as availability, the busy period of the repairman due to corrective maintenance, the profit function, and the mean time to failure have been obtained and validated through numerical experiments. The effect of failure and repair rates on availability, profit, and mean time to failure was analyzed using tables and surface plots. By categorizing system operation into first partial failure, second partial failure, third partial failure, and complete failure, this study gives insight into reducing system downtime caused by client failures, server failures, or both, through the addition of more servers in replication. Such downtime would otherwise lead to unnecessary queuing for repair or replacement of failed clients, loss of data, and delayed responses to client requests sent to the servers. These are the main contributions of the paper. The network configuration considered in this study can be implemented in areas such as education (for example, during computer-based test (CBT) examinations), banking, communications, and the military. The study highlights that employing more servers in replication significantly enhances the reliability characteristics of the system compared with a single-server network. The effect of major links in the network, and the dependence of the entire network on such links, can be investigated in future work.
[27] Zhao, B., Xie, L., Li, H., Zhang, S., Wang, B., and Li, C. (2020). Reliability analysis of aero-engine compressor rotor system considering cruise characteristics. IEEE Transactions on Reliability, 68(4).