DOI: https://doi.org/10.32515/2664-262X.2025.11(42).195-203
Modeling the Stochastic State Matrix of a Production Line to Optimize Its Operational Reliability Using Reinforcement Learning
About the Authors
Serhii Kovalov, Candidate of Pedagogical Sciences, Lecturer, Department of Higher Mathematics and Physics, Central Ukrainian National Technical University, Kropyvnytskyi, Ukraine, ORCID: https://orcid.org/0009-0002-3922-8697, e-mail: kovalyovserggr@ukr.net
Viktor Aulin, Professor, Doctor of Technical Sciences, Professor of the Department of Operation and Repair of Machines, Central Ukrainian National Technical University, Kropyvnytskyi, Ukraine, ORCID: https://orcid.org/0000-0003-2737-120X, e-mail: aulinvv@gmail.com
Andriy Grynkiv, Senior Researcher, PhD (Candidate of Technical Sciences), Senior Lecturer of the Department of Machinery Operation and Repair, Central Ukrainian National Technical University, Kropyvnytskyi, Ukraine, ORCID: https://orcid.org/0000-0002-4478-1940, e-mail: AVGrinkiv@gmail.com
Yuriy Kovalov, Candidate of Technical Sciences, Associate Professor, Associate Professor of the Department of Unmanned Technologies and Artificial Intelligence, Ukrainian State Flight Academy, Kropyvnytskyi, Ukraine, ORCID: https://orcid.org/0000-0002-1729-2033, e-mail: kovalyovserggr@ukr.net
Abstract
A model for determining the state of a production line is developed as a universal tool for evaluating and optimizing industrial systems. The proposed approach enables real-time analysis of equipment states, prediction of potential failures, and improvement of overall operational efficiency.
The use of Markov chains allows for precise modeling of the sequence of production line states and the probabilities of transitions between them. This stochastic approach improves adaptability to real-world manufacturing conditions, surpassing the capabilities of traditional deterministic methods.
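As a concrete illustration, the short Python sketch below builds a small row-stochastic transition matrix over a handful of hypothetical production line states, checks that each row sums to one, samples a state sequence, and derives the long-run (stationary) distribution of the chain. The state names, probability values and the use of NumPy are assumptions made for this example and are not taken from the article.

# A minimal Markov chain sketch for a production line; the states and
# transition probabilities below are illustrative assumptions only.
import numpy as np

states = ["operational", "degraded", "failed", "maintenance"]

# Row-stochastic matrix P: P[i, j] is the probability of moving from
# state i to state j during one observation interval.
P = np.array([
    [0.90, 0.07, 0.01, 0.02],   # operational
    [0.20, 0.60, 0.15, 0.05],   # degraded
    [0.00, 0.00, 0.30, 0.70],   # failed: usually routed to maintenance
    [0.80, 0.10, 0.00, 0.10],   # maintenance: usually restores the line
])
assert np.allclose(P.sum(axis=1), 1.0), "each row must sum to 1"

def simulate(P, start=0, steps=1000, seed=0):
    """Sample a trajectory of state indices from the chain."""
    rng = np.random.default_rng(seed)
    trajectory = [start]
    for _ in range(steps):
        trajectory.append(int(rng.choice(len(P), p=P[trajectory[-1]])))
    return trajectory

print([states[i] for i in simulate(P, steps=10)])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(dict(zip(states, np.round(pi, 3))))

The stationary probability of the "failed" state is one simple indicator of operational reliability that the optimization can aim to drive down.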
The formation of a stochastic state matrix optimizes production processes through advanced data analytics and AI integration. This enables manufacturers to minimize downtime, enhance resource allocation, and improve overall productivity while maintaining operational stability.
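A hedged sketch of how such a stochastic state matrix could be formed from data is given below: observed transitions are counted from a chronological log of line states (historical records or discretised sensor readings) and each row is normalised. The synthetic log and the add-one smoothing are illustrative assumptions, not the procedure reported in the article.

# Estimating the stochastic state matrix from a logged state sequence;
# the log below is synthetic illustrative data.
import numpy as np

states = ["operational", "degraded", "failed", "maintenance"]
index = {s: i for i, s in enumerate(states)}

log = ["operational", "operational", "degraded", "operational",
       "degraded", "failed", "maintenance", "operational",
       "operational", "degraded", "degraded", "operational"]

# Count observed transitions i -> j.
counts = np.zeros((len(states), len(states)))
for prev, curr in zip(log[:-1], log[1:]):
    counts[index[prev], index[curr]] += 1

# Add-one (Laplace) smoothing keeps unseen transitions at a small
# non-zero probability; each row is then normalised to sum to 1.
P_hat = (counts + 1.0) / (counts + 1.0).sum(axis=1, keepdims=True)
print(np.round(P_hat, 3))

In an on-line setting the counts would simply be updated as new sensor-derived states arrive, so the estimated matrix adapts to the current operating conditions of the particular equipment.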
Transition probability estimation is based on both historical databases and real-time sensor measurements, allowing the model to adapt to various equipment types and operating conditions. AI-driven optimization enhances failure prediction accuracy, ensuring the production line remains efficient under diverse scenarios. By integrating Markov chains with data-driven insights, the approach supports proactive failure prevention and strategic resource management, ultimately improving the reliability and performance of industrial systems.
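One way to connect the stochastic state matrix with reinforcement learning is sketched below: a tabular Q-learning agent chooses between "run" and "maintain" in each state, with a separate illustrative transition matrix and reward vector per action. All numerical values, the action set and the hyperparameters are assumptions for this example and do not reproduce the method or data of the article.

# Tabular Q-learning over an assumed Markov model of the line; the
# action-dependent transition matrices, rewards and hyperparameters
# are illustrative assumptions only.
import numpy as np

states = ["operational", "degraded", "failed", "maintenance"]
actions = ["run", "maintain"]
rng = np.random.default_rng(1)

# One row-stochastic matrix per action.
P = {
    "run": np.array([[0.90, 0.07, 0.03, 0.00],
                     [0.10, 0.55, 0.35, 0.00],
                     [0.00, 0.00, 1.00, 0.00],
                     [0.80, 0.15, 0.05, 0.00]]),
    "maintain": np.array([[0.10, 0.00, 0.00, 0.90],
                          [0.05, 0.05, 0.00, 0.90],
                          [0.00, 0.00, 0.10, 0.90],
                          [0.70, 0.10, 0.00, 0.20]]),
}
# Reward per (action, current state): output is valuable,
# failures and maintenance downtime are costly.
R = {"run": np.array([10.0, 4.0, -60.0, 0.0]),
     "maintain": np.array([-5.0, -5.0, -20.0, -5.0])}

Q = np.zeros((len(states), len(actions)))
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration

s = 0
for _ in range(50_000):
    a = int(rng.integers(len(actions))) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(rng.choice(len(states), p=P[actions[a]][s]))
    Q[s, a] += alpha * (R[actions[a]][s] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

for i, name in enumerate(states):
    print(f"{name:12s} -> {actions[int(Q[i].argmax())]}")

With these illustrative numbers the agent typically learns to keep the line running while it is operational and to schedule maintenance once it is degraded or failed, which is the kind of proactive failure-prevention policy the approach aims at.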
Keywords
production line, Artificial Intelligence, production automation, Markov chain theory, stochastic matrix
Full Text:
PDF
References
1. Aulin V.V. et al. (2020). Methodological foundations of design and operation of intelligent transport systems and production systems / ed. V.V. Aulin. Kropyvnytskyi. 451 p. [in Ukrainian].
2. Neves M., Vieira V., Neto P. (2021). A study on a Q-Learning algorithm application to a manufacturing assembly problem. Journal of Manufacturing Systems, Vol. 59, P. 426–440. URL: https://doi.org/10.1016/j.jmsy.2021.02.014.
3. Zhao M., Lu H., Yang S., Guo F. (2020). The Experience-Memory Q-Learning Algorithm for Robot Path Planning in Unknown Environment. IEEE Access, Vol. 8, P. 47824–47844. URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9022975.
4. Palacio J.C., Jiménez Y.M., Schietgat L., Van Doninck B., Nowé A. (2022). A Q-Learning algorithm for flexible job shop scheduling in a real-world manufacturing scenario. Procedia CIRP, Vol. 106, P. 227–232. URL: https://doi.org/10.1016/j.procir.2022.02.183.
5. Ha D. (2019). Reinforcement learning for improving agent design. Artificial Life, Vol. 25, No. 4, P. 352–365. URL: https://doi.org/10.1162/artl_a_00301.
6. Han R., Chen K., Tan C. (2020). Curiosity-driven recommendation strategy for adaptive learning via deep reinforcement learning. British Journal of Mathematical and Statistical Psychology, Vol. 73, No. 3, P. 522–540. URL: https://doi.org/10.1111/bmsp.12199.
7. Sun S. et al. (2020). Inverse reinforcement learning-based time-dependent A* planner for human-aware robot navigation with local vision. Advanced Robotics, Vol. 34, No. 13, P. 888–901. URL: https://doi.org/10.1080/01691864.2020.1753569.
8. Prashanth L.A., Fu M.C. (2022). Risk-sensitive reinforcement learning via policy gradient search. Foundations and Trends® in Machine Learning, Vol. 15, No. 5, P. 537–693. URL: https://doi.org/10.1561/2200000091.
9. He S. (2019). Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information. Neural Computing and Applications, Vol. 32, No. 18, P. 14311–14320. URL: https://doi.org/10.1007/s00521-019-04180-2.
10. Moore B.L. et al. (2011). Reinforcement learning. Anesthesia & Analgesia, Vol. 112, No. 2, P. 360–367. URL: https://doi.org/10.1213/ane.0b013e31820334a7.
11. Yan Y. et al. (2022). Reinforcement learning for logistics and supply chain management: methodologies, state of the art, and future opportunities. Transportation Research Part E: Logistics and Transportation Review, Vol. 162, 102712. URL: https://doi.org/10.1016/j.tre.2022.102712.
12. Wu Y. et al. (2021). Dynamic handoff policy for RAN slicing by exploiting deep reinforcement learning. EURASIP Journal on Wireless Communications and Networking, Vol. 2021, No. 1. URL: https://doi.org/10.1186/s13638-021-01939-x.
13. Jesus J.C. et al. (2019). Deep deterministic policy gradient for navigation of mobile robots in simulated environments. 2019 19th International Conference on Advanced Robotics (ICAR), December 2–6, 2019, Belo Horizonte, Brazil, P. 349–361. URL: https://doi.org/10.1109/icar46387.2019.8981638.
14. Hu H., Yang M., Yuan Q., You M., Shi X., Sun Y. (2024). Direct Position Determination of Non-Gaussian Sources for Multiple Nested Arrays: Discrete Fourier Transform and Taylor Compensation Algorithm. Sensors, Vol. 24, No. 12, 3801. URL: https://doi.org/10.3390/s24123801.
15. Aulin V.V., Kovalov S.G., Hrynkiv A.V., Varvarov V.V. (2024). Increasing the reliability and efficiency of production lines using artificial intelligence methods with monitoring of acoustic signals. Central Ukrainian Scientific Bulletin. Technical Sciences. Kropyvnytskyi: TsUNTU, Vol. 10(41), Part 2, P. 142–151. URL: https://doi.org/10.32515/2664-262X.2024.10(41).2.142-151 [in Ukrainian].
16. Aulin V.V., Kovalov S.G., Hrynkiv A.V., Varvarov V.V. (2024). Algorithm for optimizing the reliability and efficiency of production equipment using artificial intelligence methods. Central Ukrainian Scientific Bulletin. Technical Sciences. Kropyvnytskyi: TsUNTU, Vol. 10(41), Part 1, P. 60–67. URL: https://doi.org/10.32515/2664-262X.2024.10(41).1.60-67 [in Ukrainian].
17. Kovalov S.G. (2025). Optimization of production time using reinforcement learning as a special case of increasing the efficiency of automated production lines. Central Ukrainian Scientific Bulletin. Technical Sciences. Kropyvnytskyi: TsUNTU, Vol. 11(42), Part 1, P. 198–205. URL: https://doi.org/10.32515/2664-262X.2025.11(42).1.198-205 [in Ukrainian].
18. Kovalov S.G., Kovalov Y.G. (2024). Features of the implementation of artificial neural network models using hardware solutions. Science and Technology Today, No. 6(34), P. 1131. URL: https://doi.org/10.52058/2786-6025-2024-6(34) [in Ukrainian].
Copyright (c) 2025 Serhii Kovalov, Viktor Aulin, Andriy Grynkiv, Yuriy Kovalov