Badgwell, T.A., Liu, K.H., Subrahmanya, N.A., Kovalski, M.H., et al. (2019). Adaptive PID controller tuning via deep reinforcement learning. US Patent App. 16/218,650.
Bellman, R. (1957). A Markovian decision process. Journal of mathematics and mechanics, 679–684.
Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37. 
Bertsekas, D.P. (1995). Dynamic programming and optimal control, volume 1. Athena Scientific, Belmont, MA.
Brys, T., Harutyunyan, A., Suay, H.B., Chernova, S., Taylor, M.E., and Nowé, A. (2015). Reinforcement learning from demonstration through shaping. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290.
Heess, N., Hunt, J.J., Lillicrap, T.P., and Silver, D. (2015). Memory-based control with recurrent neural networks. arXiv preprint arXiv:1512.04455. 
Li, H., Collins, C.R., Ribelli, T.G., Matyjaszewski, K., Gordon, G.J., Kowalewski, T., and Yaron, D.J. (2018). Tuning the molecular weight distribution from atom transfer radical polymerization using deep reinforcement learning. Molecular Systems Design & Engineering, 3(3), 496–508. 
Li, K. and Malik, J. (2017). Learning to optimize neural nets. arXiv preprint arXiv:1703.00441. 
Ma, Y., Noreña-Caro, D.A., Adams, A.J., Brentzel, T.B., Romagnoli, J.A., and Benton, M.G. (2020). Machine-learning-based simulation and fed-batch control of cyanobacterial-phycocyanin production in Plectonema by artificial neural network and deep reinforcement learning. Computers & Chemical Engineering, 142, 107016.
Ma, Y., Zhu, W., Benton, M.G., and Romagnoli, J. (2019). Continuous control of a polymerization system with deep reinforcement learning. Journal of Process Control, 75, 40–47. 
Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning, 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Nie, Y., Biegler, L.T., Villa, C.M., and Wassick, J.M. (2013). Reactor modeling and recipe optimization of polyether polyol processes: Polypropylene glycol. AIChE Journal, 59(7), 2515–2529. 
Pan, X., You, Y., Wang, Z., and Lu, C. (2017). Virtual to real reinforcement learning for autonomous driving. arXiv preprint arXiv:1704.03952. 
Pathak, D., Agrawal, P., Efros, A.A., and Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 16–17. 
Peng, X.B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018). Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE international conference on robotics and automation (ICRA), 1–8. IEEE.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Zhang, D., and del Rio-Chanona, E.A. (2020). Reinforcement learning for batch bioprocess optimization. Computers & Chemical Engineering, 133, 106649. 
Russo, D. and Van Roy, B. (2014). Learning to optimize via information-directed sampling. In Advances in Neural Information Processing Systems, 1583–1591. 
Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P.D., and Bhushan Gopaluni, R. (2019). Toward self-driving processes: A deep reinforcement learning approach to control. AIChE Journal, 65(10), e16689. 
Sutton, R.S. (1985). Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis. 
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9–44.
Sutton, R.S. and Barto, A.G. (2018). Reinforcement learning: An introduction. MIT press. 
Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, 1057–1063. 
Williams, R.J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229–256. 
Zhou, Z., Li, X., and Zare, R.N. (2017). Optimizing chemical reactions with deep reinforcement learning. ACS central science, 3(12), 1337–1344.