Optimal Preventive Maintenance Policy for Non-Identical Components: Traditional Renewal Theory vs Modern Reinforcement Learning

Document Type: Original Research Article


School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran


This paper compares the traditional approach against reinforcement learning algorithms for finding the optimal preventive maintenance policy for equipment composed of multiple non-identical components with different time-to-failure distributions. As an application, we used data from military trucks, which consist of multiple components with very different failure behavior, such as tires, transmissions, wheel rims, couplings, motors, brakes, steering wheels, and shifting gears. The literature proposes four different strategies for preventive maintenance of these components. To find the optimal preventive maintenance policy, we applied the traditional, renewal theory-based approach and conventional reinforcement learning algorithms and compared their performance. The main advantage of the latter approach is that, unlike the traditional approach, it does not require estimating the model parameters (e.g., transition probabilities): without any explicit mathematical formulation, it converges to the optimal solution. Our results showed that the traditional approach works best when the component time-to-failure distributions are available, whereas the reinforcement learning approach outperforms it when no such information is available or the distributions are misspecified.
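To illustrate the model-free idea described above, the following is a minimal sketch of tabular Q-learning applied to a single-component age-replacement problem. All specifics here (the state space of component ages, the increasing hazard function, and the cost constants `C_PREVENTIVE` and `C_FAILURE`) are assumptions for illustration, not the paper's actual truck model; the point is only that the agent learns a replacement policy from simulated experience, without ever being given the failure distribution in closed form.

```python
import random

# Hypothetical single-component replacement problem (illustrative only):
# state = component age; failure probability increases with age.
# Actions: 0 = keep running one more period, 1 = preventive replacement.
C_PREVENTIVE, C_FAILURE = 1.0, 5.0      # assumed cost parameters
MAX_AGE, GAMMA, ALPHA, EPS = 10, 0.95, 0.1, 0.1

def hazard(age):
    # Assumed increasing hazard: the learner never sees this function directly,
    # it only observes sampled transitions and costs.
    return min(1.0, 0.05 * age)

def step(age, action, rng):
    """Simulate one period; return (next_age, reward = -cost)."""
    if action == 1:                       # preventive replacement
        return 0, -C_PREVENTIVE
    if rng.random() < hazard(age):        # failure -> corrective replacement
        return 0, -C_FAILURE
    return min(age + 1, MAX_AGE), 0.0     # survived one more period

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(MAX_AGE + 1)]
age = 0
for _ in range(200_000):
    # epsilon-greedy action selection
    a = rng.randrange(2) if rng.random() < EPS else max((0, 1), key=lambda x: Q[age][x])
    nxt, r = step(age, a, rng)
    # standard Q-learning temporal-difference update
    Q[age][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[age][a])
    age = nxt

policy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(MAX_AGE + 1)]
print(policy)  # typically: keep a young component, replace an aging one
```

Under the renewal-theoretic approach, the same threshold would instead be computed analytically from the time-to-failure distribution; here the threshold emerges purely from interaction with the simulator, which is exactly the advantage claimed when that distribution is unknown or misspecified.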


