DOI: https://doi.org/10.32515/2664-262X.2025.11(42).1.198-205

Optimization of Production Time Using the Reinforcement Learning Method as a Particular Case of Improving the Efficiency of Automated Production Lines

Serhii Kovalov

About the Authors

Serhii Kovalov, PhD in Pedagogy (Candidate of Pedagogical Sciences), Central Ukrainian National Technical University, Kropyvnytskyi, Ukraine, e-mail: kovalyovserggr@ukr.net, ORCID ID: 0009-0002-3922-8697

Abstract

This article examines the application of reinforcement learning methods to optimize the production time of an automated production line modeled as a graph. In this graph representation, nodes correspond to pieces of equipment capable of performing one or more manufacturing operations. Such an approach not only creates a precise model of the agent's operating environment but also enables the implementation of computer simulations. These simulations serve as a critical foundation for assessing the potential effectiveness of reinforcement learning methods in optimizing real-world production lines. By adopting this approach, the study explores opportunities to improve efficiency, optimize resource utilization, and enhance the reliability of production systems. A key focus of the article is the detailed investigation of the stages involved in the computer simulation of production time optimization. The simulation process consists of several integral stages: preparation of input data, design and implementation of the simulation environment, construction of a Deep Q-Network (DQN) agent, execution of the learning algorithm, and evaluation of optimization efficiency. These stages are thoroughly analyzed, demonstrating the systematic approach required to integrate reinforcement learning into manufacturing processes. The research also emphasizes the advantages of modeling the production line as a graph, highlighting how it enables the simulation of dynamic and complex production environments. This graph-based framework provides the agent with a structured understanding of equipment connectivity and operational constraints, allowing it to develop effective decision-making policies. Through iterative interactions with the environment, the DQN agent identifies optimal production sequences, minimizes downtime, and enhances throughput. Furthermore, the article explores the practical implications of integrating reinforcement learning into industrial applications. Computer simulations not only validate the feasibility of these methods but also provide insights into their scalability and adaptability to diverse manufacturing scenarios. The findings underscore the potential of reinforcement learning to transform automated production lines into more intelligent, adaptive, and resilient systems. By addressing both theoretical and practical aspects, the study lays the groundwork for future research in applying artificial intelligence to industrial automation. This comprehensive approach enables stakeholders to better understand the value of advanced learning algorithms in boosting operational efficiency and ensuring sustainable growth in automated production.
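As a compact illustration of the pipeline the abstract describes (a production line modeled as a graph whose nodes are machines, and a DQN agent trained to minimize total processing time), the Python sketch below may be helpful. It is a minimal sketch, not the article's implementation: the four-machine topology, the processing times, the network size, and the omission of an experience-replay buffer and target network are all simplifying assumptions, and PyTorch is assumed as the learning framework.

# Minimal sketch (illustrative only): a production line as a directed graph
# whose nodes are machines, with a small DQN agent trained to minimize the
# total processing time of a part. Topology and times are assumed values.
import random
import torch
import torch.nn as nn

# graph[node] -> successor machines; times[node] -> processing time at node.
GRAPH = {0: [1, 2], 1: [3], 2: [3], 3: []}   # assumed 4-machine topology
TIMES = {0: 2.0, 1: 5.0, 2: 3.0, 3: 1.0}     # assumed processing times
N = len(GRAPH)

def step(node, action):
    """Route the part to the chosen successor; reward = -processing time."""
    succ = GRAPH[node]
    nxt = succ[action % len(succ)] if succ else node
    done = not GRAPH[nxt]                    # a sink machine ends the episode
    return nxt, -TIMES[nxt], done

def encode(node):
    """One-hot encoding of the current machine as the agent's state."""
    s = torch.zeros(N)
    s[node] = 1.0
    return s

MAX_BRANCH = max(len(v) for v in GRAPH.values() if v)
qnet = nn.Sequential(nn.Linear(N, 32), nn.ReLU(), nn.Linear(32, MAX_BRANCH))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma, eps = 0.99, 0.2                       # discount factor, exploration rate

for episode in range(500):
    node, done = 0, False
    while not done:
        s = encode(node)
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(MAX_BRANCH)
        else:
            a = int(qnet(s).argmax())
        nxt, r, done = step(node, a)
        # one-step temporal-difference target (no replay buffer, for brevity)
        with torch.no_grad():
            target = r if done else r + gamma * qnet(encode(nxt)).max()
        loss = (qnet(s)[a] - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        node = nxt

# Greedy rollout of the learned policy.
node, route = 0, [0]
while GRAPH[node]:
    node, _, _ = step(node, int(qnet(encode(node)).argmax()))
    route.append(node)
print("learned route:", route)

On this toy graph the trained agent learns to route parts through the faster branch (machine 2 rather than machine 1), mirroring the article's objective of minimizing production time; a production-grade version would add experience replay, a target network, and a richer state representation (queue lengths, machine availability, operation assignments).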

Keywords

production lines, optimization of utilization efficiency, artificial intelligence, production time, modeling the line as a system, state graphs


References

1. Neves, M., Vieira, M., & Neto, P. (2021). A study on a Q-Learning algorithm application to a manufacturing assembly problem. Journal of Manufacturing Systems, 59, 426–440.

2. Aulin, V. V., Hrynkiv, A. V., Lysenko, S. V., & Holub, D. V. (2019). Synergetics of improving machine reliability using Markov process models. In Proceedings of the V All-Ukrainian Scientific-Practical Conference "Perspectives and Trends in the Development of Structures and Technical Service of Agricultural Machines and Tools" (pp. 242–245). Zhytomyr Agricultural Technical College [in Ukrainian].

3. Aulin, V. V., Hrynkiv, A. V., Holovatyi, A. O., Lysenko, S. V., Holub, D. V., Kuzyk, O. V., & Tykhyi, A. A. (2020). Methodological foundations of design and operation of intelligent transportation and manufacturing systems [Monograph]. Kropyvnytskyi: Lysenko V.F. [in Ukrainian].

4. Zhao, M., Lu, H., Yang, S., & Guo, F. (2020). The experience-memory Q-learning algorithm for robot path planning in unknown environment. IEEE Access, 8, 47824–47844. https://doi.org/10.1109/ACCESS.2020.2978978.

5. Palacio, J. C., Jiménez, Y. M., Schietgat, L., Van Doninck, B., & Nowé, A. (2022). A Q-learning algorithm for flexible job shop scheduling in a real-world manufacturing scenario. Procedia CIRP, 106, 227–232.

6. Ha, D. (2019). Reinforcement learning for improving agent design. Artificial Life, 25(4), 352–365. https://doi.org/10.1162/artl_a_00301.

7. Han, R., Chen, K., & Tan, C. (2020). Curiosity‐driven recommendation strategy for adaptive learning via deep reinforcement learning. British Journal of Mathematical and Statistical Psychology, 73(3), 522–540. https://doi.org/10.1111/bmsp.12199.

8. Sun, S., et al. (2020). Inverse reinforcement learning-based time-dependent A* planner for human-aware robot navigation with local vision. Advanced Robotics, 34(13), 888–901. https://doi.org/10.1080/01691864.2020.1753569.

9. Prashanth, L. A., & Fu, M. C. (2022). Risk-sensitive reinforcement learning via policy gradient search. Foundations and Trends® in Machine Learning, 15(5), 537–693. https://doi.org/10.1561/2200000091.

10. He, S., et al. (2019). Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information. Neural Computing and Applications, 32(18), 14311–14320. https://doi.org/10.1007/s00521-019-04180-2.

11. Moore, B. L., et al. (2011). Reinforcement learning. Anesthesia & Analgesia, 112(2), 360–367. https://doi.org/10.1213/ane.0b013e31820334a7.

12. Yan, Y., et al. (2022). Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities. Transportation Research Part E: Logistics and Transportation Review, 162, 102712. https://doi.org/10.1016/j.tre.2022.102712.

13. Wu, Y., et al. (2021). Dynamic handoff policy for RAN slicing by exploiting deep reinforcement learning. EURASIP Journal on Wireless Communications and Networking, 2021(1). https://doi.org/10.1186/s13638-021-01939-x.

14. Jesus, J. C., et al. (2019). Deep deterministic policy gradient for navigation of mobile robots in simulated environments. In Proceedings of the 2019 19th International Conference on Advanced Robotics (ICAR), Belo Horizonte, Brazil. IEEE. https://doi.org/10.1109/icar46387.2019.8981638.

15. Kovalev, S. G., & Kovalev, Yu. G. (2024). Features of implementing an artificial neural network model using hardware. Nauka i Tekhnika Sohodni, (6(34)), 11–31. https://doi.org/10.52058/2786-6025-2024-6(34) [in Ukrainian].

Copyright (c) 2025 Serhii Kovalov