61b A New Approach for the Solution of Noisy Black-Box Models Involving Integer Variables

Eddie Davis and Marianthi Ierapetritou. Chemical & Biochemical Engineering, Rutgers University, 98 Brett Road, Piscataway, NJ 08854

For cases where accurate closed-form descriptions of chemical processes are unavailable, gradient-based optimization methods can fail due to the lack of reliable derivative information for these black-box systems. Direct search techniques can be used to overcome this problem, but their convergence to an optimum can be slow. On the other hand, while convergence can be accelerated using information obtained from a model, misleading search directions may be identified and premature termination can occur, as can happen with noisy input-output data. Furthermore, the value of the information obtained must not be outweighed by the cost of model construction for the nonlinear program (NLP). This last consideration becomes particularly important if integer variables are also present, since the solution of many relaxed NLP subproblems may be required. In this work, we use Branch-and-Bound (B&B) to solve the MINLP. At each node of the B&B tree, the solution of a relaxed NLP provides an upper or lower bound (UB/LB), depending upon 0-1 variable feasibility. Using B&B, we establish a converging sequence of UB/LB to find the optimal solution. If the relaxed NLP at any node is nonconvex, local search may terminate at a local optimum; the resulting bound update is then weaker, prolonging the search or even cutting off the optimal solution.
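The bound-updating mechanics described above can be sketched on a toy problem. The code below is an illustrative B&B loop on a hypothetical two-variable MINLP with a single integer variable, not the paper's black-box formulation: each node solves a relaxation (here available in closed form), integer-feasible nodes update the incumbent UB, and nodes whose relaxed lower bound cannot beat the incumbent are pruned.

```python
# Illustrative B&B loop on a toy MINLP (hypothetical problem, not the paper's
# black-box model): minimize (x - 2.4)^2 + (y - 1.6)^2 with x in [0, 4]
# continuous and y integer in {0, ..., 3}. The relaxation is solved in
# closed form here; a real node would solve a relaxed NLP instead.
import math

def solve_relaxation(y_lo, y_hi):
    # Relaxed NLP: y treated as continuous on [y_lo, y_hi]. For this
    # separable quadratic the optimum is the clipped unconstrained minimizer.
    x = min(max(2.4, 0.0), 4.0)
    y = min(max(1.6, float(y_lo)), float(y_hi))
    return x, y, (x - 2.4) ** 2 + (y - 1.6) ** 2

def branch_and_bound():
    incumbent = (math.inf, None)       # upper bound (best integer-feasible)
    nodes = [(0, 3)]                   # each node = integer bounds on y
    while nodes:
        y_lo, y_hi = nodes.pop()
        if y_lo > y_hi:
            continue                   # infeasible node
        x, y, lb = solve_relaxation(y_lo, y_hi)
        if lb >= incumbent[0]:
            continue                   # relaxed bound cannot improve UB: prune
        if abs(y - round(y)) < 1e-9:
            incumbent = (lb, (x, int(round(y))))   # integer-feasible: update UB
        else:
            nodes.append((y_lo, math.floor(y)))    # branch: y <= floor(y)
            nodes.append((math.ceil(y), y_hi))     # branch: y >= ceil(y)
    return incumbent

obj, (x_opt, y_opt) = branch_and_bound()
```

For this toy problem the root relaxation gives y = 1.6, branching creates the nodes y <= 1 and y >= 2, and the y >= 2 node yields the incumbent that prunes the other branch.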

Since local methods guarantee a global optimal solution only under convexity conditions, which cannot be verified a priori for black-box systems, these approaches are inefficient for solving this class of problems. Let x, y, and z denote the vectors of continuous, binary, and output variables, respectively, and let s represent the system noise. We divide the general noisy black-box MINLP into subclass SP1, in which the noisy outputs depend solely upon the continuous variables, and SP2, in which the noisy outputs are functions of the binary variables as well. To further illustrate the difference between SP1 and SP2, consider the problem of maximizing the distillate purity for a benzene-toluene separation, given the choice of multiple columns. Examples of continuous variables are the total or component feed rate, reflux ratio, and heat duty, whereas possible binary variables are the number of internal/feed trays or the feed tray location. In the SP1 class of problems, the binary variable representing the choice of distillation column appears outside of the black-box part of the problem, whereas if these design variables are part of the black-box model, the problem is classified as SP2 and the input-output model must be built over the joint continuous-binary variable space. In this work, we propose a new methodology to solve the SP1 class of noisy, black-box MINLP problems more efficiently. A similar approach for SP2 is the subject of future work.

A comprehensive review of global optimization is given in Floudas et al. (2005). Examples of zero-order methods for optimizing black-box functions are DIRECT (Pertunnen et al., 1993), MCS (Huyer & Neumaier, 1999), DE (Storn & Price, 1995), and pattern search (Audet & Dennis, 2004). To avoid the possible asymptotic convergence exhibited by the above algorithms, local metamodels known as response surfaces, which capture the underlying deterministic system behavior (Myers & Montgomery, 2002), can instead be optimized. Response surfaces are polynomials, usually quadratic, whose basis functions are weighted by regression coefficients. Predictions using radial basis functions have also been studied (Gutmann, 2001; Regis & Shoemaker, 2005), as they compensate for the inability of a quadratic response surface to adequately model highly nonlinear functions. However, the advantage of quadratic approximations is that derivative information is obtained easily, allowing gradient-based methods to be applied to sequential response surfaces in order to find the optimum. Model reliability increases as more sampling points are used, although the best search directions are not needed until the neighborhood of an optimum has been found. This observation is exploited in two response surface methods (RSM) we developed to reduce the number of sampling points required (Davis et al., 2005). More recently, we proposed using these algorithms within a B&B framework to solve MINLPs with noisy, black-box functions (Davis & Ierapetritou, 2006). However, the limitation of our RSM algorithms is that we cannot guarantee the solution is a global optimum if the underlying deterministic relaxed NLP is nonconvex. If we terminate further search of the relaxed NLP at this node and a better solution exists, we will have missed the opportunity to achieve a stronger update of the corresponding LB/UB. Path-following methods, such as those applied by Lucia et al. (2004), can be used to find the additional local minima needed to ensure that the relaxed NLP solution is indeed the global optimum. However, the computational expense increases because additional sampling points are required.
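As a concrete illustration of the quadratic response surface idea, the sketch below fits z ≈ b0 + b1·x + b2·x² to noisy samples of a hypothetical one-dimensional black box by least squares and reads the stationary point from the regression coefficients; the test function, sampling plan, and noise level are assumptions for illustration only, not the paper's case study.

```python
# Sketch of a quadratic response surface fit to noisy black-box samples.
# The "black box", sampling plan, and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
black_box = lambda x: (x - 1.5) ** 2 + 2.0           # true minimizer at x = 1.5
xs = np.linspace(0.0, 3.0, 9)                        # sampled inputs
zs = black_box(xs) + rng.normal(0.0, 0.01, xs.size)  # noisy outputs

# Least-squares regression of z ~ b0 + b1*x + b2*x^2
X = np.column_stack([np.ones_like(xs), xs, xs ** 2])
b0, b1, b2 = np.linalg.lstsq(X, zs, rcond=None)[0]

# The quadratic's derivative b1 + 2*b2*x vanishes at the model's stationary
# point, so gradient-based refinement has an easy closed-form target here.
x_star = -b1 / (2.0 * b2)
```

This is why quadratic metamodels pair naturally with gradient-based search: the fitted coefficients supply derivative information directly, at the cost of poor fidelity for highly nonlinear responses.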

Given a poor initial sampling set, a large number of experiments may be required to discover even a local optimum. Furthermore, resource restrictions may preclude the additional sampling needed to obtain the global one. Since the performance of local search methods improves in the vicinity of an optimum, our proposed strategy applies global search using a kriging predictor during the initial stages of optimization, followed by local search using our previously developed response surface methods once promising regions have been identified.

In contrast to the deterministic, sequential local modeling used by response surface methods, kriging is a global modeling technique whose predictor approximates both deterministic and stochastic system components (Sasena, 2002). Sampling points are dispersed throughout the feasible region and are not required to conform to a standard experimental design, which is problematic for nonconvex feasible regions. Predictions at points other than those sampled are obtained by applying a regression function constructed from all of the sampling data, although generally only the nearest neighbors make significant contributions. The kriging model is a best linear unbiased predictor, meaning that the expected squared error between the true output and the prediction is minimized over all linear unbiased estimators. The prediction variance is available in closed form, so an uncertainty map can be superimposed on the predictor mapping, indicating where further search should be conducted. Among others, Jones et al. (1998) developed kriging algorithms that have been successfully applied to NLP optimization.
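A minimal kriging sketch, assuming a Gaussian correlation function with a fixed, hand-picked length scale and unit process variance (in practice the hyperparameters would be estimated, e.g. by maximum likelihood): the predictor interpolates the sampled outputs, and a simplified closed-form variance grows away from the samples, giving the uncertainty map described above.

```python
# Minimal one-dimensional ordinary-kriging sketch. The Gaussian correlation,
# fixed theta, tiny nugget, and unit process variance are simplifying
# assumptions; the variance formula below omits the usual trend-correction term.
import numpy as np

def kriging_fit(X, z, theta=2.0, nugget=1e-10):
    # Correlation matrix among the sampled inputs (nugget for conditioning)
    R = np.exp(-theta * (X[:, None] - X[None, :]) ** 2) + nugget * np.eye(len(X))
    Rinv = np.linalg.inv(R)
    ones = np.ones(len(X))
    mu = ones @ Rinv @ z / (ones @ Rinv @ ones)   # generalized-least-squares mean
    return X, z, Rinv, mu, theta

def kriging_predict(model, x):
    X, z, Rinv, mu, theta = model
    r = np.exp(-theta * (x - X) ** 2)             # correlations to the samples
    pred = mu + r @ Rinv @ (z - mu)               # BLUP prediction
    var = max(0.0, 1.0 - r @ Rinv @ r)            # simplified kriging variance
    return pred, var

X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
z = np.sin(X)                                     # stand-in sampled outputs
model = kriging_fit(X, z)
pred, var = kriging_predict(model, 0.75)
```

At a sampled point the prediction reproduces the data and the variance collapses to nearly zero; between samples the variance rises, which is exactly the signal used to decide where further search should be conducted.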

The main advantage of applying kriging rather than RSM to noisy, black-box systems is that we obtain a better estimate of where the global optimum resides, since the kriging predictor is a global model whereas response surfaces are local. We can also increase confidence that the solution is the global optimum by conducting additional sampling until the expected improvement falls below a specified tolerance. By employing the kriging predictor as a stochastic global optimizer, we avoid the problems associated with local convergence. However, the main drawbacks of building the kriging predictor are that 1) a significant number of sampling points is required, and 2) model updating is expensive. Gradient-based methods, such as our previously developed RSM algorithms (Davis et al., 2005), can efficiently find the global optimum once its neighborhood has been identified, since quadratic models accurately represent the system curvature near an optimum.
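The expected-improvement criterion has a closed form for a Gaussian predictor. The sketch below uses the standard minimization-convention formula, EI = (f_min − μ)Φ(u) + σφ(u) with u = (f_min − μ)/σ, where μ and σ are the kriging prediction and its standard error; it is a generic illustration of the stopping test, not the paper's specific implementation.

```python
# Expected improvement at a candidate point (minimization convention).
# mu, sigma = kriging prediction and standard error; f_min = incumbent best.
import math

def expected_improvement(mu, sigma, f_min):
    if sigma <= 0.0:
        return 0.0                     # no predictive uncertainty, no improvement
    u = (f_min - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))      # standard normal CDF
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_min - mu) * Phi + sigma * phi
```

Sampling continues while the maximum EI over the candidate set exceeds the tolerance; once it drops below, further improvement on the current incumbent is judged unlikely.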

In our proposed algorithm for solving SP1-type problems, we build a kriging predictor modeling z(x,s) at the root node of the B&B tree. The input-output mapping of the relaxed NLP at each subsequent node projects this model onto the process constraints g(x,y) once 0-1 assignments have been made. At each node, we solve the unconstrained relaxed NLP to obtain the optimal z. We then map the kriging predictor over the given feasible region to find promising subregions containing predictions close to this optimal value. Within these neighborhoods, we apply our RSM algorithms to refine the solution. Depending on solution feasibility in the 0-1 variables, the corresponding UB/LB is updated if its value improves upon the current one. Node selection rules determine the next relaxed NLP to solve, and the algorithm terminates when the list of candidate subproblems is exhausted or the integrality gap between the UB and LB falls within tolerance. Several numerical examples are presented to illustrate the basic ideas of the proposed approach.
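The subregion-screening step of the node procedure might look like the following sketch, where the grid, tolerance, and the quadratic standing in for the kriging predictor are all illustrative assumptions: candidate inputs whose predicted output lies within a tolerance of the node's optimal relaxed value are kept to seed the local RSM refinement.

```python
# Sketch of the promising-subregion screening step. The grid, predictor
# stand-in, and tolerance are illustrative assumptions; in the algorithm the
# predictor would be the kriging model built at the root node.
def promising_points(grid, predictor, z_opt, tol):
    # Keep candidate inputs whose predicted output is within tol of the
    # node's optimal relaxed value; these seed the local RSM refinement.
    return [x for x in grid if abs(predictor(x) - z_opt) <= tol]

grid = [i * 0.1 for i in range(31)]           # feasible region [0, 3]
predictor = lambda x: (x - 1.0) * (x - 2.0)   # stand-in for the kriging map
z_opt = -0.25                                 # optimal value of the relaxed NLP
subregion = promising_points(grid, predictor, z_opt, 0.05)
```

Here the surviving points cluster around x = 1.5, the minimizer of the stand-in predictor, so local search is launched only in that neighborhood rather than over the whole feasible region.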

Literature Cited

Audet, C. and J. Dennis Jr., 2004, SIAM J. Opt., 14, 980.
Davis, E. and M. Ierapetritou, 2006, Proceedings of ESCAPE-16/PSE.
Davis, E., A. Bindal and M. Ierapetritou, 2005, To appear, Comp. Chem. Eng.
Floudas, C. et al., 2005, Comp. & Chem. Eng., 29, 1185.
Gutmann, H. M., 2001, J. Global Opt., 19, 201.
Huyer, W. and A. Neumaier, 1999, J. Global Opt., 14, 331.
Jones, D., M. Schonlau and W. Welch, 1998, J. Global Opt., 13, 455.
Lucia, A., P. DiMaggio and P. Depa, 2004, J. Global Opt., 29, 297.
Myers, R. and D. Montgomery, 2002, Response Surface Methodology.
Pertunnen, C., D. Jones and B. Stuckman, 1993, J. Opt. Th. App., 79, 157.
Regis, R. and C. Shoemaker, 2005, J. Global Opt., 31, 153.
Sasena, M., 2002, Ph.D. Thesis, Univ. of Michigan.
Storn, R. and K. Price, 1995, Technical Report, International Computer Science Institute.