Using the proximal policy optimisation algorithm for solving the stochastic capacitated lot sizing problem
This paper, joint work with Lotte van Hezewijk, Nico Dellaert, and Noud Gademann, is now published in the International Journal of Production Research. The published version is available here (open access).
This paper applies Deep Reinforcement Learning (DRL) to the stochastic capacitated lot-sizing problem (S-CLSP) with stationary demand, to find near-optimal replenishment and production policies. We consider a problem with multiple products, limited capacity, set-up times, and stochastic demand that is fully backordered in case of shortage. The relevant costs are set-up costs, holding costs, and backorder costs. We model the problem as a Markov Decision Process (MDP) and solve it using Proximal Policy Optimisation (PPO), a type of DRL algorithm. Solving the S-CLSP and applying this general-purpose method to practical problems comes with three main challenges, and addressing them is the main contribution of this paper.
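To give a rough feel for the MDP formulation, the sketch below simulates a single transition of a toy multi-product lot-sizing model: inventory per product is updated with production and stochastic demand, unmet demand is fully backordered (negative inventory), and set-up, holding, and backorder costs are accumulated. This is an illustrative simplification, not the paper's exact model; all parameter values are assumed, and capacity limits and set-up times are omitted for brevity.

```python
def step(inventory, production, demand,
         setup_cost=100.0, holding_cost=1.0, backorder_cost=19.0):
    """One MDP transition for a toy multi-product lot-sizing model.

    inventory[i] may be negative, representing backorders that are
    fully carried over to the next period (demand is never lost).
    """
    cost = 0.0
    next_inventory = []
    for inv, prod, dem in zip(inventory, production, demand):
        if prod > 0:                  # a set-up cost is incurred per product produced
            cost += setup_cost
        new_inv = inv + prod - dem    # backordered demand shows up as negative stock
        cost += holding_cost * max(new_inv, 0)     # cost of stock on hand
        cost += backorder_cost * max(-new_inv, 0)  # penalty for unmet demand
        next_inventory.append(new_inv)
    return next_inventory, cost

# Example: two products, one period.
# Product 1 produces a lot of 10; product 2 produces nothing and backorders grow.
inv, cost = step(inventory=[5, -2], production=[10, 0], demand=[8, 3])
```

In a DRL setting such as PPO, the negative of this one-period cost would serve as the reward signal, and the inventory vector (possibly augmented with demand information) as the state.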
Fuelled by the increased availability of data, machine learning and data analytics techniques have advanced enormously, and reinforcement learning offers many opportunities for solving complex sequential decision-making problems. However, DRL methods are often criticised as 'black box' methods because of their use of deep neural networks, which obscures insight into the resulting solution. This lack of understanding may limit their applicability in practice. To overcome this hurdle, we illustrate how the solution's outcomes can be interpreted so that the resulting policy can be explained.