Abstract
This paper deals with a multi-component machine repair model with a provision of warm standby units and a repair facility consisting of two heterogeneous servers (primary and secondary) that repair the failed units. The operating and standby units may fail individually or due to some common cause. The primary server may fail partially before failing fully, whereas the secondary server faces complete failure only. The lifetimes of the servers and of the operating/standby units, as well as their repair times, follow exponential distributions. The successive over relaxation (SOR) technique has been used to obtain the steady-state queue size distribution of the number of failed units in the system. To explore the system characteristics, various performance indices such as the expected number of failed units in the queue, throughput, etc. have been obtained. Numerical results have been provided to illustrate the computational tractability of the proposed SOR technique. A sensitivity analysis is also performed to examine the effect of system descriptors on the performance indices.
Introduction
With the advancement in technology, the use of automated machines has become common in many areas of practical utility. Machines have entered every nook and corner of our lives, and everyone depends on them. An interruption due to machine failure not only affects the quality of the service provided by the machines, but also increases the operating cost of the machining system. Machine interference is one of the key problems in many industries such as manufacturing systems, communication systems, computer systems, transportation, etc. The interference in normal functioning occurs when a machine stops and cannot resume its operation until it is attended to by the repairman. When a repairman finds more machines to repair than he can handle at a time, the problem of machine interference arises. Due to cost and technical constraints, the trade-off between the repairman staffing level and the magnitude of machine interference has become an important issue and has drawn the attention of many queueing theorists, who have treated the machine interference problem as a finite source queueing model. In the recent past, the contributions of Jain (1997), Yang et al. (2005), Ke and Lin (2008), Jain et al. (2008) and many more are noteworthy in this regard. Surveys of machine interference problems were given by Haque and Armstrong (2007) and Jain et al. (2010). Wang et al. (2013) performed a comparative analysis of the machine repair problem with imperfect coverage and service pressure condition.
The loss of production while the broken-down machines are being attended to by the repairman can be reduced to some extent by providing spare part support. Based on their failure characteristics, spare units can be categorized into three types: (1) cold, (2) warm and (3) hot. While not in use, cold spare units do not fail, whereas the failure rate of warm (hot) spare units is less than (equal to) the failure rate of operating units. Several papers in the queueing and reliability literature have explored various aspects of machine repair problems with standbys in different contexts. Sivazlian and Wang (1989), Wang (1995), Wang and Kuo (2000) and Wang and Chang (2002) carried out cost and probabilistic analyses of machining systems with standby components. Yuan and Meng (2011) carried out a reliability analysis of a warm standby repairable system with priority in use. Jain (2013) suggested a numerical approach based on the Runge–Kutta method to compute the transient performance indices of machining systems with mixed standbys by incorporating the features of service interruption and priority.
For the modeling of queueing problems, many researchers developed more realistic queueing models by incorporating the concept of server vacation. However, a limited number of papers have appeared on machine repair models with spares provisioning by including the feature of vacation of the repairmen. Gupta (1997) considered machine interference problem with warm spare, server vacations and exhaustive service. Ke and Wang (2007) suggested the vacation policies for machine repair problem with two types of spares. Wang et al. (2006, 2009) suggested optimal management of the machine repair problem with working vacation and used Newton’s method for the solution purpose. Ke et al. (2011) made an algorithmic analysis of unreliable server machine repair system with spares by developing multi-server synchronous vacation model with service interruptions due to server failure. Ke and Wu (2012) and Ke et al. (2013) developed a multi-server machine repair model with standbys and synchronous multiple vacations.
To deal with more realistic scenarios of machining system, the behavior of the customers and care taker should be taken into account for the performance analysis of such systems. Machine repair problems by incorporating the concepts of balking and reneging have been investigated by many researchers (Shawky 1997, 2000; Wang and Ke 2003; Jain et al. 2003; Sharma et al. 2004). Wang et al. (2011) performed cost benefit analysis of a machining system with warm standby components and variable server by incorporating the concept of balking. In recent past, the concept of common-cause failure which can be realized in many real time machining systems has been studied extensively (Platz 1984; Mosleh 1991; Pan and Nonaka 1995). The redundancy provision in K-r-out of-N: G configuration machining system under the assumption of common-cause failure has been investigated by Reddy (1993), Jain and Ghimire (1997), Jain et al. (2002), and many more. Jain and Mishra (2006) analyzed system characteristics of multistage degraded machining system with common-cause shock failure and state dependent rates. El-Damcese (2009) investigated the performance indices of warm standby systems subject to common-cause failures with time varying failure and repair rates. The effect of common-cause failures as major issue in safety of machining systems was examined by Ilavsky et al. (2013). Mishra and Jain (2013) studied the effect of common-cause failure on the maintainability of a deteriorating system having the inspection provision.
Due to wear and tear or any other technical fault, the servers may be prone to partial or complete failures. In case of partial failure, the servers continue to operate, but their failure rate increases, which may further lead to the full failure state. The overload caused by operating with fewer components than are required for normal operation also adversely affects the performance of the machining system. In a machine repair system, the server providing repair to the failed machines may break down due to overload or long-run operation. Ke and Lin (2008) discussed sensitivity analysis of machine repair problems in manufacturing systems with service interruptions due to server failure. Yue et al. (2009) studied a heterogeneous two-server queueing system with balking and server breakdowns. In many multi-component machining systems, achieving high availability through redundancy is rather difficult and sometimes impossible. The availability and efficiency of a machining system can also be enhanced by improving its maintainability. To reduce the workload of failed units in such systems and to achieve a pre-specified availability, a better maintenance facility can be provided by adding a repairman. In real-time systems, the working capacity of two repairmen may not be the same. Jain et al. (2004) studied an (N, L) switch-over policy for a two heterogeneous repairmen machine repair model with warm standbys and vacation. Kumar and Jain (2013) considered two heterogeneous servers and vacation for the (m, M) machine repair problem with spares and switching failure.
In this investigation, we develop an (m, M) Markov model for a multi-component machining system with mixed warm standby provisioning under the care of two heterogeneous servers. To make the model more realistic, we consider that both the primary and the secondary server are unreliable and subject to breakdown individually or simultaneously due to a common cause. The primary server can also work at a slower rate in the partially failed condition. To illustrate the practical applicability of our model, we give the example of a power plant having M operating nuclear turbine generators (i.e. base units) and \( S_{1} \) and \( S_{2} \) standby gas turbine generators of type 1 and 2 (having different failure characteristics). A type 1 standby unit, i.e. a gas turbine generator, is used first in case of failure of any operating generator. When all \( S_{1} \) type 1 gas turbine generators are in use and another operating generator fails, it is replaced by a type 2 gas turbine generator if one is available. When all the standby generators of both types have been used to replace failed generators and fewer than M but at least m generators are functioning in the power plant, the failure rate of the operating generators increases due to overload. There is a provision of two dissimilar servers who repair the failed generators at different rates. The lifetimes of the nuclear/gas turbine generators and of the servers are exponentially distributed. The server responsible for maintenance of the system may also be unavailable due to illness or pre-commitment to some other job. Both the primary and the secondary server may become unavailable individually or simultaneously due to some common cause. The primary service engineer may be partially available and then repairs the failed generators at a reduced rate. From the partially available state, he either becomes fully available after some treatment or goes for complete rest and becomes completely unavailable to repair the failed generators. The secondary service engineer either provides repair if available or becomes completely inoperative due to breakdown. The unavailable service engineers regain their repair capability after random intervals of time which are exponentially distributed.
For the performance modeling and queueing analysis of the concerned machine repair problem, the rest of the paper is organized as follows. The model is described, along with the requisite assumptions and notations, in “Model description and assumptions”. In “Steady state equations”, the governing equations are constructed with the help of state-dependent failure and repair rates. Various performance measures in terms of steady-state probabilities are obtained in “Some performance indices”. A sensitivity analysis examining the effect of various parameters on the system performance is presented in “Numerical results”. Finally, conclusions are drawn in “Conclusion”.
Model description and assumptions
In this section, we describe the machining system and its components. For the mathematical modeling, the basic factors associated with the machine repair problem under consideration are stated in terms of the requisite assumptions and notations.
Consider an (m, M) machining system having two servers. The primary server can fail partially as well as fully, whereas the secondary server can only fail completely. To support the system, two types of warm standbys are provided. The lifetimes of operating units, standby units and servers are exponentially distributed. The failure rate of type 1 (2) standby units is \( \alpha_{1} (\alpha_{2} ) \), which is less than the failure rate λ of operating units. The system can also fail due to a common cause. We use the following further assumptions to formulate the model mathematically:
-
When primary server is functioning, the secondary server works as standby.
-
Both servers provide repair according to exponential distribution.
-
The primary server can work in normal and degraded mode (i.e. partially failed state) both whereas secondary server can work only in normal mode.
-
If primary server fails partially, the secondary server turns on if there is any machine to be repaired and turns off when the queue becomes empty.
-
There is a need of repair to the broken-down server to restore its operating state.
-
The repair time of broken-down server is assumed to be exponentially distributed.
-
A type 1 (warm) standby unit, if available, replaces a failed operating unit, and its failure characteristic then becomes the same as that of an operating unit. When the type 1 standby units are exhausted, a type 2 standby unit is used to replace the failed operating unit.
-
When both types of standby units have been used and further operating units fail, the system keeps functioning in degraded mode as long as at least m (< M) operating units are present in the system. As soon as there are fewer than m operating units in the machining system, it fails.
-
The switch-over time from standby to operating state of the units is assumed to be negligible.
To describe the model, the following notations are used:
- \( M \): Number of operating units in the system
- \( S_{1} (S_{2}) \): Number of warm spare units of type 1 (2) in the system
- \( m \): Minimum number of operating units required for the system to function
- \( \lambda (\lambda_{c}) \): Failure rate (common-cause failure rate) of operating units in the system
- \( \lambda_{d} \): Degraded failure rate of operating units when all warm spares are utilized
- \( b_{c} (b_{cp}) \): Common-cause failure rate of the servers when the first server is in working (partially failed) state
- \( b_{1} (b_{2}) \): Failure rate of first (second) server
- \( b^{\prime}_{1p} \): Partial failure rate of first server when second server is in breakdown state
- \( b_{1p} \): Partial failure rate of first server when second server is in working state
- \( r_{c} \): Common-cause repair rate of both servers
- \( r_{cp} \): Common-cause repair rate of partially failed first server when second server is in working state
- \( r_{1} (r_{2}) \): Repair rate of first (second) server
- \( \mu_{1} \): Repair rate of the servers when first server is in working state and second server is in breakdown state
- \( \mu_{2} \): Repair rate of the servers when first server is in breakdown state and second server is in working state
- \( P_{i,0,1} \): Probability that there are i failed units in the system and first server is in breakdown state while the second one is in working state
- \( P_{i,0,0} \): Probability that both servers are in breakdown state and there are i failed units in the system
- \( P_{i,1,1} \): Probability that both servers are in working state and there are i failed units in the system
- \( P_{i,1,0} \): Probability that there are i failed units in the system and first server is in working state while the second one is in breakdown state
- \( P_{i,p,1} \): Probability that there are i failed units in the system and first server is in partially failed state while the second one is in working state
- \( P_{i,p,0} \): Probability that there are i failed units in the system and first server is in partially failed state while the second one is in breakdown state
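The replacement rules in the assumptions suggest a state-dependent total failure rate for the units. The following Python function is a minimal sketch of one plausible form, assuming that type 1 spares are consumed before type 2 spares, that idle spares fail at rates α1 and α2, and that the degraded rate λ d applies once all spares are in use. The function name `failure_rate` and its argument names are hypothetical, the common-cause rate λ c is left out because the way it enters the rates is not specified here, and the sketch should not be read as the paper's own definition.

```python
def failure_rate(i, M, S1, S2, m, lam, alpha1, alpha2, lam_d):
    """Assumed total individual-failure rate when i units have failed."""
    S = S1 + S2
    if i < 0 or i > M + S - m:
        # Beyond M + S - m failures fewer than m units operate: system is down.
        return 0.0
    if i < S1:
        # M units operate; S1 - i type 1 and all S2 type 2 spares are idle.
        return M * lam + (S1 - i) * alpha1 + S2 * alpha2
    if i < S:
        # Type 1 spares are exhausted; S - i type 2 spares are still idle.
        return M * lam + (S - i) * alpha2
    # All spares are in use: M + S - i units operate in degraded mode.
    return (M + S - i) * lam_d
```

With illustrative values such as `failure_rate(0, 6, 2, 3, 4, 0.3, 0.1, 0.05, 0.4)`, the three branches can be checked by hand against the replacement order described in the assumptions.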
Steady state equations
In this section, the machine repair problem under consideration is formulated mathematically by constructing the Chapman–Kolmogorov equations for the system state probabilities. To construct the difference equations governing the model, we define the failure rates as follows:
The Chapman–Kolmogorov equations governing the model (see Fig. 1) are given by
The steady-state difference equations constructed above can be put in the matrix form AX = B of a system of linear equations. This system has been solved numerically using the successive over relaxation (SOR) method. This technique is an extrapolation of the Gauss–Seidel method: it accelerates convergence by taking the relaxation parameter \( w > 1 \) (more specifically \( w = 1.25 \)), whereas \( w = 1 \) recovers the Gauss–Seidel method.
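The SOR iteration for a system AX = B can be sketched in a few lines of Python. This is a generic illustration rather than the authors' MATLAB program: the function name `sor_solve`, the stopping tolerance and the iteration cap are assumptions, and only the default relaxation parameter w = 1.25 is taken from the text. Each sweep computes the Gauss–Seidel value of a component and then extrapolates it by w.

```python
def sor_solve(A, b, w=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over relaxation (hypothetical helper)."""
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i and old x[j] for j > i.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Extrapolate between the old value and the Gauss-Seidel value.
            new_xi = (1.0 - w) * x[i] + w * gauss_seidel
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("SOR did not converge within max_iter sweeps")


# Small symmetric positive definite test system (illustrative only).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = sor_solve(A, b)
```

For a steady-state balance system, one balance equation is typically replaced by the normalizing condition so that the coefficient matrix is nonsingular. Convergence for a given w depends on the matrix, so the value 1.25 may need tuning for other parameter sets.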
Some performance indices
For an efficient machining system, the designers/developers plan maintainability and redundancy on the basis of performance analysis. For the performance prediction of the machining system, it is important to provide expressions for key indices, including the queue length. The queue length in the machine repair problem refers to the total number of failed machines waiting for repair in the queue, including those being repaired by the servers. We now provide explicit results in terms of probabilities for some performance measures as follows:
-
The expected number of failed machines in the queue is
$$ E(n) = \sum\limits_{i = 0}^{M + S - m + 1} {i\left( {P_{i,1,1} + P_{i,1,0} + P_{i,p,1} + P_{i,p,0} + P_{i,0,1} } \right)} $$(19)
-
The probability that both servers are in working state is given by
$$ P(w) = \sum\limits_{i = 0}^{M + S - m + 1} {P_{i,1,1} } $$(20)
-
The probability that both servers are in breakdown state is given by
$$ P(b) = \sum\limits_{i = 0}^{M + S - m + 1} {P_{i,0,0} } $$(21)
-
The probability that the first server is in working state but secondary server is in breakdown state, is
$$ P(s_{1} ) = \sum\limits_{i = 0}^{M + S - m + 1} {P_{i,1,0} } $$(22)
-
The probability that the secondary server is in working state but primary server is in breakdown state is
$$ P(s_{2} ) = \sum\limits_{i = 0}^{M + S - m + 1} {P_{i,0,1} } $$(23)
-
Throughput is obtained as
$$ T\left( p \right) = \mu_{2} \sum\limits_{i = 1}^{M + S - m + 1} {\left[ {P_{i,0,1} + P_{i,p,1} } \right]} + \mu_{1} \sum\limits_{i = 1}^{M + S - m + 1} {\left[ {P_{i,1,0} + P_{i,1,1} + P_{i,p,0} } \right]} $$(24)
-
The expected waiting time of failed units in the system is determined using Little's formula, given by
$$ E\left( W \right) = E\left( n \right)/\lambda_{\text{eff}} , $$(25)
where
$$ \lambda_{\text{eff}} = \sum\limits_{i = 0}^{M + S - m} {\lambda_{i} \left( {P_{i,1,1} + P_{i,1,0} + P_{i,p,1} + P_{i,p,0} + P_{i,0,1} } \right)}. $$(26)
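Once the steady-state probabilities are available, the indices in Eqs. (19)–(26) reduce to simple sums. The Python sketch below shows one way to evaluate them, assuming the probabilities are stored in a dictionary keyed by hypothetical server-state labels ("11", "10", "p1", "p0", "01", "00", primary server first) and that `rates[i]` holds the state-dependent failure rate λ i. The throughput term uses P i,p,0 inside the second sum, which is an assumption about the intended index, and the numbers in the example are invented purely for illustration.

```python
def performance_indices(P, rates, mu1, mu2):
    """Evaluate Eqs. (19)-(26) from steady-state probabilities (sketch)."""
    K = len(P["11"]) - 1  # largest number of failed units, M + S - m + 1
    summed = ("11", "10", "p1", "p0", "01")  # states appearing in Eqs. (19), (26)
    e_n = sum(i * sum(P[s][i] for s in summed) for i in range(K + 1))
    throughput = (
        mu2 * sum(P["01"][i] + P["p1"][i] for i in range(1, K + 1))
        + mu1 * sum(P["10"][i] + P["11"][i] + P["p0"][i] for i in range(1, K + 1))
    )
    # The effective arrival rate sums over i = 0 .. M + S - m, i.e. 0 .. K - 1.
    lam_eff = sum(rates[i] * sum(P[s][i] for s in summed) for i in range(K))
    return {
        "E_n": e_n,
        "P_w": sum(P["11"]),
        "P_b": sum(P["00"]),
        "P_s1": sum(P["10"]),
        "P_s2": sum(P["01"]),
        "throughput": throughput,
        "lam_eff": lam_eff,
        "E_W": e_n / lam_eff,  # Little's formula, Eq. (25)
    }


# Invented distribution over i = 0, 1, 2 failed units (sums to 1).
P = {
    "11": [0.20, 0.10, 0.10], "10": [0.10, 0.05, 0.05],
    "p1": [0.05, 0.05, 0.00], "p0": [0.05, 0.00, 0.05],
    "01": [0.05, 0.05, 0.00], "00": [0.05, 0.05, 0.00],
}
result = performance_indices(P, rates=[1.0, 0.5], mu1=2.0, mu2=1.0)
```

Keeping the six server states in separate lists mirrors the notation P i,j,k and makes each index a one-line sum.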
Numerical results
Numerical results obtained by numerical simulation provide a quantitative assessment of the performance indices. The effect of different parameters on the performance indices can also be explored by numerical simulation. In this section, a sensitivity analysis is carried out to analyze the trends of the performance indices with respect to the system descriptors, as detailed below.
To compute the numerical results, we consider the power plant illustration described in the introduction. The power plant consists of M = 6 operating nuclear turbine generators and \( S_{1} = 2 \) and \( S_{2} = 3 \) standby gas turbine generators of type 1 and 2 having the same failure characteristics. The failure rate of the operating nuclear turbine generators is \( \lambda = 0.3 \), and the failure rates of the standby gas turbine generators of type 1 and 2 are \( \alpha_{1} = 0.9 \) and \( \alpha_{2} = 0.9 \), respectively. For computational purposes, the program has been coded in MATLAB with the other default parameters chosen as \( b_{1} = b_{2} = 0.5 \), \( \alpha_{c} = 0.0 \) and \( r_{1} = r_{2} = r = 1.3 \). The expected queue length E(n) is plotted against the failure rate (λ) of operating units in Fig. 2a–f for varying values of the number of operating units (M), the minimum number of operating units (m), the number of warm standbys (S), the repair rate (r), the failure rate of standbys (α) and the server’s breakdown rate (b), respectively.
-
1.
Effect of failure rates (λ, α)
Figure 2a–f reveal the effect of λ on the queue length for the variation of different parameters. It is noticed that on increasing λ, the queue length of failed units in the system increases. Figure 2e demonstrates the effect of the failure rate (α) of the standby units on the queue length; the increasing pattern of the queue length with respect to α matches our expectation.
-
2.
Effect of repair rate (r)
In Fig. 2d, the expected number of failed units in the system decreases as the repair rate (r) increases. By improving the repair facility in terms of faster repair, one may improve the system availability, as the number of failed units in the system is reduced.
-
3.
Effect of number of operating units (M) and minimum required operating units (m)
Figure 2a and b show the effect of the number of operating units (M) and the minimum required number of operating units (m) on the queue length. As expected, in both figures the queue length increases with M and m. This is because a system with more operating units has more units that can fail, so the number of failed units increases. Figure 2a also shows that the increment in the queue length with respect to M is more pronounced for higher values of λ due to the increase in traffic load.
-
4.
Effect of number of warm standbys (S)
Figure 2c displays the effect of an increase in the number of spares (S) on the queue length. The queue length increases slowly with S for low values of λ; for higher values of λ, however, a more significant increment in the queue length is found. The adverse effect of S on the queue length is attributed to the increase in the total number of units in the system.
-
5.
Effect of server’s breakdown rate (b)
The adverse effect of server breakdown rate (b) on the queue length is clear from Fig. 2f where we notice the increasing trend of queue length with the increase in b. This shows that due to the server breakdown, the repair of failed units is adversely affected.
Overall, we conclude that
-
Increasing the number of units required for normal operation, or the minimum number of units required for operation, increases the queue length.
-
Frequent breakdown of the server also results in higher queue length; however, the queue size comes down by increasing the repair rate. These patterns tally with the realistic situations.
Conclusion
In many industries, operation can be interrupted by machine failures and by breakdowns in the repair facility. In this investigation, we have developed a finite population queueing model of a multi-component machine repair system wherein individual component failure and common-cause failure may occur. For the smooth running of any machining system, it is essential to control system failure by employing a suitable repair and spare part support strategy. To cope with failures and to achieve high performance of the machining system, a repair crew having two dissimilar unreliable servers and spare part support are taken into consideration. Providing a single type of spare unit is a common way to ensure smooth running of the system, but we have considered mixed warm standbys because physical constraints such as volume, weight, cost, wait-space, etc. limit the provision of a single type of spare unit. The performance characteristics established for the concerned system with two types of spares give insights into more versatile situations of real-time systems operating in a multi-component environment and subject to component failures, common-cause failures and server failures. The numerical simulation and sensitivity analysis performed may be helpful to visualize the effect of different parameters on the performance measures. The model can be further extended by including the concepts of group failure and switching failure.
References
El-Damcese MA (2009) Analysis of warm standby systems subject to common-cause failures with time varying failure and repair rates. Appl Math Sci 3(18):853–860
Gupta SM (1997) Machine interference problem with warm spares, server vacations and exhaustive service. Perf Eval 23(3):195–211
Haque L, Armstrong MJ (2007) A survey of the machine interference problem. Euro J Oper Res 179(2):469–482
Ilavsky J, Rastocny K, Zdansky J (2013) Common-cause failures as major issue in safety of control systems. Int Safety Reliab Syst 11(2):86–93
Jain M (1997) An (m, M) machine repair problem with spares and state dependent rates: a diffusion process approach. Microelectron Reliab 37(6):929–933
Jain M (2013) Transient analysis of machining systems with service interruption, mixed standbys and priority. Int J Math Oper Res 5(5):604–625
Jain M, Ghimire RP (1997) Reliability of Kr-out-of N: G system subject to random, common cause failure. Perf Eval 29:213–218
Jain M, Mishra A (2006) Multistage degraded machining system with common cause shocks failure, state dependent rates. J Raj Acad Phys Sci 5(3):251–262
Jain M, Maheshwari S, Rakhee (2002) Study of loading policies for K-r-out of N:G system subject to common cause failure. R and D Quality Quest 4(2):15–23
Jain M, Sharma GC, Singh M (2003) M/M/R machine interference model with balking, reneging, spares, two modes of failure. OPSEARCH 40(1):24–41
Jain M, Singh M, Rakhee (2004) Bilevel control of degraded machining system with warm standbys, setup and vacation. Appl Math Model 28(3):1015–1026
Jain M, Sharma GC, Sharma R (2008) Performance modeling of state dependent system with mixed standbys, two modes of failure. Appl Math Model 32:712–724
Jain M, Sharma GC, Pundhir RS (2010) Some perspectives of machine repair problems. Int J Eng Trans B Appli 23(3 and 4):253–268
Ke JC, Lin CH (2008) Sensitivity analysis of machine repair problem in manufacturing systems with service interruption. Appl Math Model 32(10):2087–2105
Ke JC, Wang KH (2007) Vacation policies for machine repair problem with two type spares. Appl Math Model 31(5):880–894
Ke JC, Wu CH (2012) Multi-server machine repair model with standbys, synchronous multiple vacation. Comp Indust Eng 62(1):296–305
Ke JC, Lin CH, Zhang ZG (2011) An algorithmic analysis of multi-server vacation model with service interruptions. Comp Indust Eng 61(4):1302–1308
Ke JC, Hsu YL, Liu TH, Zhang ZG (2013) Computational analysis of machine repair problem with unreliable multi-repairmen. J Comp Oper Res 40(3):848–855
Kumar K, Jain M (2013) Threshold N-policy for (M, m) degraded machining system with heterogeneous servers, standby switching failure and multiple vacation. Int J Math Oper Res 5(4):423–445
Mishra A, Jain M (2013) Maintainability policy for deteriorating system with inspection, common cause failure. Int J Eng Trans C Basics 26(6):371–380
Mosleh A (1991) Common cause failure: an analysis methodology, examples. Reliab Eng 34(3):249–292
Pan Z, Nonaka Y (1995) Importance analysis for the systems with common cause failures. Reliab Eng Syst Safet 50(3):297–300
Platz O (1984) A Markov model for common cause failure. Reliab Eng 9:25–31
Reddy CR (1993) Optimization of K-out-of-n systems subject to common cause failure with repair provision. Microelectron Reliab 33(2):175–183
Sharma GC, Jain M, Baghel KPS (2004) Performance modeling of machining system with mixed standby component balking reneging. Int J Eng 17(2):169–180
Shawky AI (1997) The single server machine interference model with balking reneging an additional server for longer queues. Microelectron Reliab 37(2):355–357
Shawky AI (2000) The machine interference model M/M/C/K/N with balking, reneging, spares. OPSEARCH 37(1):25–35
Sivazlian BD, Wang KH (1989) Economic analysis of the M/M/R machine repair problem with warm standby. Microelectron Reliab 29(5):9829–9840
Wang KH (1995) An approach to cost analysis of the machine repair problem with two types of spares, service rates. Microelectron Reliab 35(11):1433–1436
Wang KH, Chang YC (2002) Cost analysis of finite M/M/R queueing system with balking, reneging, server breakdowns. Math Meth Oper Res 56:169–180
Wang KH, Ke JC (2003) Probability analysis of a repairable system with warm standbys plus balking, reneging. Appl Math Model 27(4):327–336
Wang KH, Kuo CC (2000) Cost, probabilistic analysis of series system with mixed standby components. Appl Math Model 24:957–967
Wang KH, Hsieh CH, Liou CH (2006) Cost benefit analysis of series systems with cold standby components, a repairable service station. Qual Technol Quant Manage 3(1):77–92
Wang KH, Chen WL, Yang DY (2009) Optimal management of the machine repair problem with working vacation: Newton’s method. J Comp Appl Math 233(2):449–458
Wang KH, Liou YC, Yang DY (2011) Cost optimization, sensitivity analysis of the machine repair problem with variable servers, balking. Proc Soc Behav Sci 25(1):178–188
Wang KH, Liou CD, Lin YH (2013) Comparative analysis of the machine repair problem with imperfect coverage, service pressure condition. Appl Math Model 410(1):1–37
Yang T, Lee RS, Chen MC, Chen P (2005) Queueing network model for a single-operator machine interference problem with external operations. Euro J Oper Res 67(1):163–178
Yuan L, Meng XY (2011) Reliability analysis of a warm standby repairable system with priority in use. Appl Math Model 35(9):4295–4303
Yue D, Yue W, Yu J, Tian R (2009) A heterogeneous two-server queuing system with balking, server breakdowns. Eight Int Sympo Oper Res Appl 3:230–244
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Jain, M., Mittal, R. & Kumari, R. (m, M) Machining system with two unreliable servers, mixed spares and common-cause failure. J Ind Eng Int 11, 171–178 (2015). https://doi.org/10.1007/s40092-014-0053-y
DOI: https://doi.org/10.1007/s40092-014-0053-y