Convex online optimization for dynamic systems control
Led by: Prof. Dr.-Ing. Matthias Müller
Team: Marko Nonhoff
Year: 2023
Funding: Deutsche Forschungsgemeinschaft (DFG) - 505182457
Duration: 2023-2025
Project Summary
Online convex optimization (OCO) was originally developed in machine learning for the online optimization of cost functions that are a priori unknown and time-varying. At each time step, an action must be selected using only the cost functions and actions of previous time steps. Only after the action has been selected does the environment reveal the current cost function. The goal of an OCO algorithm is to minimize the cumulative cost over a finite time horizon. Advantages of this approach include its applicability to a priori unknown and time-varying costs (such as time-varying energy prices) and its ability to incorporate known constraints on the admissible actions. In addition, OCO algorithms are generally computationally efficient because they do not solve the optimization problem explicitly. Instead, only a single iteration of a suitable optimization method (e.g., gradient descent) is performed at each time step.
These properties are also highly desirable for the control of dynamical systems, which is why our group and others have recently developed the first control methods based on OCO. However, the performance and stability guarantees available for these algorithms rely on restrictive assumptions: they are either limited to the stabilization of a priori unknown, time-varying setpoints or neglect constraints on the control input or the system state.
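To make the OCO protocol described above concrete, the following is a minimal sketch of online projected gradient descent, a standard generic OCO algorithm; it is not the control method developed in this project. It assumes quadratic tracking costs f_t(u) = 0.5 * ||u - r_t||^2 with a time-varying setpoint r_t, a box constraint lo <= u <= hi on the action, and a constant step size. All function and variable names are hypothetical, and system dynamics are deliberately ignored: the action is optimized directly.

```python
import math


def project_box(u, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n, i.e. componentwise clipping."""
    return [min(max(ui, lo), hi) for ui in u]


def online_projected_gradient_descent(grads, u0, step, lo, hi):
    """Generic OCO loop: commit to u_t, then observe f_t, then take ONE projected gradient step.

    grads[t](u) returns the gradient of f_t at u; f_t is only queried after u_t is fixed.
    """
    u = project_box(u0, lo, hi)
    actions = []
    for grad_t in grads:
        actions.append(u)   # u_t depends only on f_0, ..., f_{t-1}
        g = grad_t(u)       # the environment reveals f_t only now
        u = project_box([ui - step * gi for ui, gi in zip(u, g)], lo, hi)
    return actions


def make_tracking_cost(r):
    """Hypothetical tracking cost f(u) = 0.5 * ||u - r||^2 and its gradient u - r."""
    def cost(u):
        return 0.5 * sum((ui - ri) ** 2 for ui, ri in zip(u, r))

    def grad(u):
        return [ui - ri for ui, ri in zip(u, r)]

    return cost, grad


# Usage: a sinusoidal setpoint that partly leaves the admissible box [-1, 1]^2.
T = 50
setpoints = [[1.5 * math.sin(0.1 * t), 0.5] for t in range(T)]
costs, grads = zip(*(make_tracking_cost(r) for r in setpoints))
actions = online_projected_gradient_descent(list(grads), u0=[0.0, 0.0], step=0.5, lo=-1.0, hi=1.0)

cumulative_cost = sum(f(u) for f, u in zip(costs, actions))
# Hindsight benchmark: the per-step minimizer over the box is the projection of r_t.
benchmark_cost = sum(f(project_box(r, -1.0, 1.0)) for f, r in zip(costs, setpoints))
dynamic_regret = cumulative_cost - benchmark_cost
```

Each step costs one gradient evaluation and one projection, which illustrates why OCO is computationally cheap compared with solving each problem to optimality. The box constraint is chosen because its projection is a simple clip; for general convex constraint sets the projection is itself an optimization problem.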
The main goal of this project is to develop OCO-based control methods for general cost functions and constraints without such restrictive assumptions. To this end, we will study both methods that apply OCO directly to the control of dynamical systems and methods that combine OCO with established control schemes such as reference governors or model predictive control (MPC). We will investigate the specific advantages of each approach as well as the resulting closed-loop properties. Furthermore, recent insights from OCO will be used to analyze the regret (a performance measure commonly used in OCO) of general MPC algorithms applied to a priori unknown and time-varying cost functions.
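For reference, the regret mentioned above is commonly defined in the OCO literature as follows. The notation is illustrative and not taken from this project: f_t are the revealed costs, u_t the selected actions, and \mathcal{U} the admissible set. Static regret compares the cumulative cost with that of the best fixed action in hindsight, whereas dynamic regret compares it with the sequence of per-step minimizers. When regret is used for control, the benchmark must additionally respect the system dynamics, and the precise definition varies between works.

```latex
\mathcal{R}_T^{\mathrm{static}} = \sum_{t=1}^{T} f_t(u_t) - \min_{u \in \mathcal{U}} \sum_{t=1}^{T} f_t(u),
\qquad
\mathcal{R}_T^{\mathrm{dyn}} = \sum_{t=1}^{T} f_t(u_t) - \sum_{t=1}^{T} \min_{u \in \mathcal{U}} f_t(u).
```

An algorithm is typically considered to perform well if its regret grows sublinearly in T, so that the average cost per step approaches that of the benchmark.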