Achiam, J., Held, D., Tamar, A., and Abbeel, P. (2017). Constrained policy optimization. In D. Precup and Y.W. Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, 22–31. PMLR, International Convention Centre, Sydney, Australia. URL http://proceedings.mlr.press/v70/achiam17a.html.
Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.
Chow, Y., Nachum, O., Faust, A., Duenez-Guzman, E., and Ghavamzadeh, M. (2019). Lyapunov-based safe policy optimization for continuous control. arXiv preprint arXiv:1901.10031.
Deb, K. (2000). An efficient constraint handling method for genetic algorithms. Computer Methods in Applied Mechanics and Engineering, 186(2-4), 311–338.
Grüne, L. and Pannek, J. (2017). Nonlinear model predictive control. In Nonlinear Model Predictive Control, 45–69. Springer.
Joines, J.A. and Houck, C.R. (1994). On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GA's. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, 579–584. IEEE.
Kramer, O. (2010). A review of constraint-handling techniques for evolution strategies. Applied Computational Intelligence and Soft Computing, 2010.
Kreisselmeier, G. and Steinhauser, R. (1980). Systematic control design by optimizing a vector performance index. In Computer Aided Design of Control Systems, 113–117. Elsevier.
Lin, W.S. and Zheng, C.H. (2012). Constrained adaptive optimal control using a reinforcement learning agent. Automatica, 48(10), 2614–2619.
Ma, Y., Zhu, W., Benton, M.G., and Romagnoli, J. (2019). Continuous control of a polymerization system with deep reinforcement learning. Journal of Process Control, 75, 40–47.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Modares, H., Nageshrao, S.P., Lopes, G.A.D., Babuška, R., and Lewis, F.L. (2016). Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning. Automatica, 71, 334–341.
Pan, A., Xu, W., Wang, L., and Ren, H. (2020). Additional planning with multiple objectives for reinforcement learning. Knowledge-Based Systems, 193, 105392.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Zhang, D., and del Rio-Chanona, E.A. (2020a). Constrained reinforcement learning for dynamic optimization under uncertainty. arXiv preprint arXiv:2006.02750.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Zhang, D., and del Rio-Chanona, E.A. (2020b). Reinforcement learning for batch bioprocess optimization. Computers & Chemical Engineering, 133, 106649.
Poon, N.M. and Martins, J.R. (2007). An adaptive approach to constraint aggregation using adjoint sensitivity analysis. Structural and Multidisciplinary Optimization, 34(1), 61–73.
Puterman, M.L. (2014). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons.
Shin, J., Badgwell, T.A., Liu, K.H., and Lee, J.H. (2019). Reinforcement learning – Overview of recent progress and implications for process control. Computers & Chemical Engineering, 127, 282–294.
Yang, T., Zhao, L., Li, W., and Zomaya, A.Y. (2020). Reinforcement learning in sustainable energy and electric systems: A survey. Annual Reviews in Control.
Yoo, H., Kim, B., Kim, J.W., and Lee, J.H. (2020). Reinforcement learning based optimal control of batch processes using Monte Carlo deep deterministic policy gradient with phase segmentation. Computers & Chemical Engineering, 107133.
Zhang, P., Li, H., Ha, Q., Yin, Z.Y., and Chen, R.P. (2020). Reinforcement learning based optimizer for improvement of predicting tunneling-induced ground responses. Advanced Engineering Informatics, 45, 101097.