IOL Seminar and Lecture Series
The IOL Seminar and Lecture Series at the Zuse Institute Berlin serves to bring together researchers presenting their latest work and to organize tutorial lectures on valuable topics not typically covered in graduate coursework. Presentations usually take place on Wednesday afternoons in ZIB’s Seminar Room 2006. Click here for the official website.
Upcoming Talks
Differential privacy is a concept that can be used to express the extent to which algorithms like database queries, statistics and machine learning procedures, preserve a certain notion of privacy of an input dataset. Prominent applications of the technique include the US census and user data aggregation procedures of multiple large tech companies. The correct implementation of such algorithms requires a substantial amount of care, which motivated the development of type systems tailored to verify the differential privacy properties of programs. We implemented a type checker for one such type system and integrated it with the julia programming language to enable not only verification but automatic inference of privacy parameters for a reasonable subset of julia code.
For almost two decades, mixed integer programming (MIP) solvers have used graph-based conflict analysis to learn from local infeasibilities during branch-and-bound search. In this talk, we discuss improvements for MIP conflict analysis by instead using reasoning based on cuts, inspired by the development of conflict-driven solvers for pseudo-Boolean optimization. Phrased in MIP terminology, this type of conflict analysis can be understood as a sequence of linear combinations, integer roundings, and cut generation. We leverage this MIP perspective to design a new conflict analysis algorithm based on mixed integer rounding cuts, which theoretically dominates the state-of-the-art method in pseudo-Boolean optimization using Chvátal-Gomory cuts. Furthermore, we discuss how to extend this cut-based conflict analysis from pure binary programs to mixed binary programs and-in limited form-to general MIP with also integer-valued variables. Our experimental results indicate that the new algorithm improves the default performance of SCIP in terms of running time, number of nodes in the search tree, and the number of instances solved.
We present a framework that transforms geometric and combinatorial problems into optimization tasks by designing loss functions that vanish precisely when the desired coloring properties are achieved. We employ neural networks trained through gradient descent to minimize these loss functions, allowing for efficient exploration of the solution space. We demonstrate the effectiveness of the method on variants of the Hadwiger-Nelson problem, which asks for plane colorings that avoid monochromatic unit-distance pairs and sketch how the approach can be applied to other problems.
Past Talks
We study the pure Price of Anarchy (PoA) of symmetric network congestion games defined over series-parallel networks, which measures the inefficiency of pure Nash equilibria.
First, we consider affine edge delays. For arbitrary networks, Correa and others proved a tight upper bound of 5/2 on the PoA. On the other hand, Fotakis showed that restricting to the class of extension-parallel networks makes the worst-case PoA decreases to 4/3. We prove that, for the larger class of series-parallel networks, the PoA is at most 2, and that it is at least 27/19 in the worst case, improving both the best-known upper bound and the best-known lower bound.
Next, we consider edge delays that are polynomial functions with highest degree p. We construct a family of symmetric congestion games over arbitrary networks which achieves the same worst-case PoA of asymmetric network congestion games given by Aland and others. We then establish that in games defined over series-parallel networks the PoA cannot exceed 2p+1 - 1, which is considerably smaller than the worst-case PoA in arbitrary networks. We also prove that the worst-case PoA, which is sublinear in extension-parallel networks (as shown by Fotakis), dramatically degrades to exponential in series-parallel networks.
Finally, we extend the above results to the case where the social cost of a strategy profile is computed as the maximum players’ cost. In this case, the worst-case PoA is in O(4p), which is considerably smaller than the worst-case PoA in arbitrary networks given by Christodoulou and Koutsoupias. Moreover, while in extension-parallel networks each pure Nash equilibrium is also a social optimum (as shown by Epstein and others), we construct instances of series-parallel network congestion games with exponential PoA.
Sequential decision-making often requires dynamic policies, which are computationally not tractable in general. Decision rules provide approximate solutions by restricting decisions to simple functions of uncertainties. In this talk, we consider a nonparametric lifting framework where the uncertainty space is lifted to higher dimensions to obtain nonlinear decision rules. Current lifting-based approaches require pre-determined functions and are parametric. We propose two nonparametric liftings, which derive the nonlinear functions by leveraging the uncertainty set structure and problem coefficients. Both methods integrate the benefits from lifting and nonparametric approaches, and hence, provide scalable decision rules with performance bounds. More specifically, the set-driven lifting is constructed by finding polyhedrons within uncertainty sets, inducing piecewise-linear decision rules with performance bounds. The dynamics-driven lifting, on the other hand, is constructed by extracting geometric information and accounting for problem coefficients. This is achieved by using linear decision rules of the original problem, also enabling to quantify lower bounds of objective improvements over linear decision rules. Using numerical comparisons with competing methods, we demonstrate superior computational scalability and comparable performance in objectives. These observations are magnified in multistage problems with extended time horizons, suggesting practical applicability of the proposed nonparametric liftings in large-scale dynamic robust optimization. This is a joint work with Eojin Han.
We study network flow interdiction problems with nonlinear and nonconvex flow models. The resulting model is a max-min bilevel optimization problem in which the follower’s problem is nonlinear and nonconvex. In this game, the leader attacks a limited number of arcs with the goal to maximize the load shed and the follower aims at minimizing the load shed by solving a transport problem in the interdicted network. We develop an exact algorithm consisting of lower and upper bounding schemes that computes an optimal interdiction under the assumption that the interdicted network remains weakly connected. The main challenge consists of computing valid upper bounds for the maximal load shed, whereas lower bounds can directly be derived from the follower’s problem. To compute an upper bound, we propose solving a specific bilevel problem, which is derived from restricting the flexibility of the follower when adjusting the load flow. This bilevel problem still has a nonlinear and nonconvex follower’s problem, for which we then prove necessary and sufficient optimality conditions. Consequently, we obtain equivalent single-level reformulations of the specific bilevel model to compute upper bounds. Our numerical results show the applicability of this exact approach using the example of gas networks.
The national and transcontinental electricity grids of today are based on devices such as coal furnaces, steam turbines, copper and steel wires, electric transformers, and electromechanical power switches that have remained unchanged for 100 years. However imperceptibly, the components and operational management of this great machine, the grid, has began to change irreversibly. This is fortunate, as climate science tells us we must reduce CO2 emissions from the energy sector to zero by 2050 and to 50% of current levels by 2030 if we are to prevent dangerous climate changes in future world that is over 1.5 degree hotter that today. Now utility scale wind and solar PV farms as large as coal, gas and nuclear generators are being deployed more cheaply than it is possible to build and operate generators using older technologies. In some cases, even these new technologies can be cheaper that even merely the operating costs of older technologies. In addition, low cost rooftop solar PV has also enabled consumers to become self-suppliers and also contributors to the supply of energy for their neighbours. Moreover, the “dumb” grid of the past, is becoming “smarter”. This is enabled through a combination of ubiquitous low-cost telecommunication and programmable devices at the edge of the grid such as smart meters, smart PV inverters, smart air conditioners and home energy management systems. The final component is the electrification of the private transport system that will finally eliminate the need for fossil fuels. The implications of this are that it is now necessary to rapidly replan and reinvest in the energy system at rates and in ways that are unprecedented in industrial civilisations history. While the majority of hardware technology already exist, the missing piece of the puzzle are new computers science technologies, and particularly Optimisation, Machine Learning, Forecasting and Data analytics methods needed to plan and operate this rapidly transforming system.
In this talk I will describe a range of ways existing computer science tools in the Optimisation, AI, ML and other areas we and others are enhancing in order to better operate and plan the existing power system. I will focus on identifying emerging research opportunities in areas that are needed to complete the transformation to a cheaper, smarter and zero carbon energy system.
Cutting planes are a crucial tool in mixed-integer programming. When measuring the strength of a specific class of cutting planes, a key question is whether the corresponding cut closure is polyhedral, meaning that finitely many cuts are sufficient to obtain the closure. The vast amount of literature investigating polyhedrality for different kinds of cutting planes indicates that the field has not yet settled on a universal solution to this problem. In this talk, we will look at cutting plane generation through a unifying framework and establish conditions that ensure polyhedrality of cut closures. This enables us to prove polyhedrality for a broad range of cutting plane families more easily.
Over the past decade, an enormous amount of satellite data has become freely available, unlocking countless opportunities for researchers to address a wide range of monitoring challenges. These include tasks like greenhouse gas estimation, ground displacement detection, soil moisture analysis, and many more. Despite this abundance of data, significant challenges remain when it comes to developing truly operational satellite analytics products—especially when scaling to create diverse solutions for various use cases. In this talk, I will explore these challenges from both a research and commercial perspective, highlighting the practical hurdles faced in turning satellite data into actionable products. I will also discuss how recent advancements in geospatial foundation models are beginning to emerge as a versatile, “Swiss Army knife” solution for building scalable and efficient satellite analytics tools.
How can we trust the correctness of a learned model on a specific input of interest? This work introduces Self-Proving Models—models that can prove the correctness of their outputs via an interactive proof system. These models offer high-probability correctness for most outputs, with a verification algorithm that reliably detects any incorrect ones. The framework, rooted in theoretical guarantees, is demonstrated through experiments using transformers trained for arithmetic tasks such as computing the greatest common divisor (GCD).
Faster algorithms for optimization problems are among the main potential applications for future quantum computers. There has been interesting progress in this area in recent years, for instance improved quantum algorithms for gradient descent and for solving linear and semidefinite programs. In this talk I will survey what we know about quantum speed-ups both for discrete and for continuous optimization, with a bit more detail about two speed-ups I worked on recently: for regularized linear regression and for Principal Component Analysis. I’ll also discuss some issues with this line of work, in particular that quadratic or subquadratic quantum speed-ups will only kick in for very large instance sizes and that many of these algorithms require some kind of quantum RAM.
While weighted majority voting (WMV) is a popular method in ensemble learning for classification, more complex aggregation rules have recently been shown to have beneficial theoretical properties such as better convergence in the number of calls to a weak learning oracle and better PAC-bounds. After a brief discussion of the literature, this talk compares simple WMV to repeated WMV. In particular, a tight approximability analysis of 1) empirical error minimization, 2) regularization by sparsity and 3) sparsity maximization subject to zero training error is provided. It reveals that, under polynomial time transformations, the respective problems for simple WMV are equivalent to a hard version of Minimum Weighted Set Cover (MWSC) while they are equivalent to a simple version of MWSC for repeated WMV. This analysis of approximability implies two lower bounds on convergence in the number of calls to a weak learning oracle until perfect fit for 1) simple WMV and 2) arbitrary aggregation rules which emphasizes the worse scalability of simple WMV. These theoretical results are complemented by experiments on medium-sized data sets from LIBSVM showing that repeated WMV needs fewer oracle calls compared to AdaBoost and XGBoost while having even or better test accuracy on average.
Bell’s 1964 theorem established that the predictions of quantum theory cannot be explained by any local theory, making nonlocality a pivotal concept in the understanding of quantum mechanics. Quantifying the nonlocality of a quantum state in a given scenario is challenging, so we propose examining the amount of classical communication required to simulate the correlations obtained by a quantum state, and thereby indirectly characterizing its nonlocality. We explore Bell scenarios augmented with one or more bits of classical communication with Frank-Wolfe algorithms, and extend the BellPolytopes.jl Julia library to handle those scenarios. In this talk, I will introduce the theoretical framework, cover implementation details, and present some preliminary results.
The use of proof assistants has been on the rise these past few years thanks to their ability to detect small flaws in mathematical proofs. The integration of AI has also made the use of these kind of tools easier and it has opened a lot of possible advancements for the future. In this talk we will focus on the ITP Lean, with an emphasis on some results in the hypercube graph. The center of these results will be the Sensitivity Conjecture, which had already been formalized previously. Our additions to this problem are the formalization of two extremal examples to prove the tightness of the two inequalities in the conjecture. These new results involved a few challenges, such as having to implement a proof of the inclusion-exclusion principle. Additionally, they highlighted the importance of selecting a good way to express our statement, especially as a number of definitions had to be made and the different data representations came with different lemmas available.
This talk will begin with a brief introduction to classical Ramsey theory (whose main idea is that complete disorder is impossible) and a discussion of a few well-known results. No specialized background knowledge in combinatorics is required for this talk. Then joint work with Will Smith from University of South Carolina (formerly Davidson) on a Ramsey theory problem based on the integer grid will be presented. Specifically, the L-theorem (an easy corollary of the Gallai-Witt theorem) states that for any integer k, there exists some n such that a k-colored n by n grid must contain a monochromatic "L" (a series of points of the form (i, j), (i, j+t), and (i+t, j+t) for some positive integer t. In this talk, we will investigate the upper bound for the smallest integer n such that a 3-colored n by n integer grid is guaranteed to contain a monochromatic L. We use various methods, such as counting intervals on the main diagonal, to improve the upper bound from 2593 to 493. For the lower bound, we match the lower bound of 21 generated by Canacki et al. (2023) and discuss possible improvements.
Many recent advances in the field of AI, including tools such as ChatGPT, rely on the training of foundation models. Foundation models are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes, or with producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans’ expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But which humans get to provide the feedback or principles? And how is their potentially diverging input aggregated into consistent data about “collective” preferences or otherwise used to make collective choices about model behavior? In this talk, I argue that the field of social choice is well positioned to address these questions, and discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
This talk introduces and illustrates the ITP Lean, that allows the user to write mathematical statements and their proofs in a way that can be mechanically checked for correctness by a computer. Lean has gained increased attention in the past few years, due to hosting formalization projects of two Fields medalists, and due to the rise of automated theorem proving via AI models as a field of research. In this talk, we will exemplify how Lean is used by discussing our Master thesis project and the experiences we gained from it. We will also survey some large and completed formalization projects and give an insight into existing AI models surrounding Lean.
The simplex method is a very efficient algorithm. In this talk we see a few of the state-of-the-art theories for explaining this observation. We will discuss what it takes for a mathematical model to explain an algorithm’s qualities, and whether existing theories meet this bar. Following this, we will question what the simplex method is and if the theoretician’s simplex method is the same algorithm as the practitioner’s simplex method.
In light of the ongoing developments in the climate crisis, it is necessary to consider factors beyond the sole economic perspective in energy supply network planning. This gives rise to a classical multiobjective optimization problem involving conflicting objective functions. The presence of choices in the model, coupled with the consideration of stationary flow equations, results in an opti mization problem with a mixed-integer nonlinear structure. Following a short introduction of some model-specific details and arising challenges, we present a novel method for computing an enclosure of the nondominated set of such prob lems. We prove finite convergence and demonstrate its effectiveness through numerical experiments. We conclude by offering a preview of ongoing extensions of this work.
Sparsity of matrices can be exploited, in particular, to speed up algorithms in various applications, e.g., in linear system solvers or second-order optimization schemes. This motivates to consider the matrix sparsification problem, in which we seek an equivalent (column-space preserving) representation of a given matrix with as few nonzero entries as possible. We derive sequential solution approaches based on bilinear and linear mixed-integer programs, respectively, and point out connections to certain machine learning tasks. One particular problem appears as a subproblem or special case in both these approaches: Finding a sparsest nonzero vector in the nullspace of a matrix. We will outline how a dedicated branch-and-cut method for this problem can be utilized in the envisioned matrix sparsification algorithms, and touch upon some open questions and challenges of our ongoing work
Information geometry (IG) and Optimal transport (OT) have been attracting much research attention in various fields, in particular machine learning and statistics. In this talk, we present results on the generalization of IG and OT distances for finite-dimensional Gaussian measures to the setting of infinite-dimensional Gaussian measures and Gaussian processes. Our focus is on the Entropic Regularization of the 2-Wasserstein distance and the generalization of the Fisher-Rao distance and related quantities. In both settings, regularization leads to many desirable theoretical properties, including in particular dimension-independent convergence and sample complexity. The mathematical formulation involves the interplay of IG and OT with Gaussian processes and the methodology of reproducing kernel Hilbert spaces (RKHS). All of the presented formulations admit closed form expressions that can be efficiently computed and applied practically. The mathematical formulations will be illustrated with numerical experiments on Gaussian processes.
We introduce a new algorithmic framework, which we call Polyform, for fast approximate matrix multiplication through sums of spherical convolutions (sparse polynomial multiplications). This bilinear operation leads to several new (worst-case and data-dependent) improvements on the speed-vs-accuracy tradeoffs for approximate matrix multiplication. Polyform can also be viewed as a cheap practical alternative to matrix multiplication in deep neural networks (DNNs), which is the main bottleneck in large-scale training and inference.
The algorithm involves unexpected connections to Additive Combinatorics, sparse Fourier transforms, and spherical harmonics. The core of the algorithm is optimizing the polynomial’s coefficients, which is a low-rank SDP problem generalizing Spherical Codes. Meanwhile, our experiments demonstrate that, when using SGD to optimize these coefficients in DNNs, Polyform provides a major (3×-5×) speedup on state-of-art DL models with minor accuracy loss. This suggests replacing matrix multiplication with (variants of) polynomial multiplication in large-scale deep neural networks.
We investigate the following question: how close can two disjoint lattice polytopes be contained in a fixed hypercube? This question occurs in various contexts where this minimal distance appears in complexity bounds of optimization algorithms. We provide nearly matching lower and upper bounds on this distance and discuss its exact computation. Similar bounds are given in the case of disjoint rational polytopes whose binary encoding length is prescribed. Joint work with Shmuel Onn, Sebastian Pokutta, and Lionel Pournin.
We tackle the Optimal Experiment Design Problem, which consists in choosing experiments to run or observations to select from a finite set to estimate the parameters of a system. The objective is to maximize some measure of information gained on the system from the observations, leading to a convex integer optimization problem. We leverage Boscia, a recent algorithmic framework, which is based on a nonlinear branch-and-bound with node relaxations solved to approximate optimality using Frank-Wolfe algorithms. One particular advantage of the method is its efficient utilization of the polytope formed by the original constraints which remains preserved by the method, unlike in those relying on epigraph-based formulations. We assess our method against both generic and specialized convex mixed-integer approaches. Computational results highlight the performance of the proposed method, especially on large and challenging instances.
Computer science, software engineering and artificial intelligence are essential for climate change mitigation, and no hallucinations with GPT (generative pre-trained transformer) needed. With the world needing to halve greenhouse gas emissions by 2030, and reduce these emissions by a further 80% by 2040, the climate science based 1.5°C Paris Agreement trajectory seems to become more and more difficult to achieve. Despite the challenges, there’s hope on the horizon with mass-scale renewable technologies becoming competitive against fossil sources, even in places such as Australia, where coal and natural gas can be produced cheaper than in most other parts of the world. When combined with technologies such as batteries, rooftop solar photovoltaic and electric vehicles, we have a complete set of “solution blocks” for mass decarbonisation. However, while we have many forms of renewable technologies to address the climate situation, integrating all these “solution blocks” together and into the existing grid infrastructure is challenging. This is where artificial intelligence and its full range of forms comes into play.
Random projection techniques based on the Johnson-Lindenstrauss lemma are used for randomly aggregating the constraints or variables of optimization problems while approximately preserving their optimal values, which leads to smaller-scale optimization problems. In this talk, we show the following three applications of random matrix techniques for constructing smaller-scale optimization problems or for constructing random subspace algorithms that iteratively solve smaller-scale subproblems.
- We use this technique for nonconvex quadratic optimization problems (NQOPs) to find approximate global optimal solutions by solving convex optimization problems of smaller dimensions. This takes advantage of the fact that the nonconvexity of an NQOP is mitigated by random projection.
- In order to obtain approximate stationary points to more general nonconvex optimization problems, we propose an algorithm that iteratively solves smaller dimensional subproblems and evaluate the convergence speed of the algorithm.
- We will also report our research results on random subspace algorithms for constrained optimization problems.
This talk is based on joint works with Terunari Fuji, Ryota Nozawa, and Pierre-Louis Poirion.
Instance Space Analysis (ISA) is a recently developed methodology to support objective testing of algorithms. Rather than reporting algorithm performance on average across a chosen set of test problems, as is standard practice, ISA offers a more nuanced understanding of the unique strengths and weaknesses of algorithms across different regions of the instance space that may otherwise be hidden on average. It also facilitates objective assessment of any bias in the chosen test instances, and provides guidance about the adequacy of benchmark test suites and the generation of more diverse and comprehensive test instances to span the instance space. This talk provides an overview of the ISA methodology, and the online software tools that are enabling its worldwide adoption in many disciplines. A case study comparing algorithms for university timetabling is presented to illustrate the methodology and tools, with several other applications to optimisation, machine learning, computer vision and quantum computing highlighted.
We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian as a proxy for robust inference. The previous best algorithms for this model succeed if the planted clique has size at least n2/3 in a graph with n vertices. Our algorithms work for planted-clique sizes approaching n1/2 — the information-theoretic threshold in the semi-random model (Steinhard 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige and Steinhardt.
Our algorithms rely on a new conceptual connection between planted cliques in the semi-random graph model and certificates of upper bounds on unbalanced biclique numbers in Erdős–Rényi random graphs. We show that higher-degree sum-of-squares polynomials allow us to obtain almost tight certificates of this kind.
Based on a joint work with Rares-Darius Buhai and Pravesh K. Kothari.
We describe constructions of extended formulations that establish a certain relaxed version of the Hirsch-conjecture. Those constructions can be used to show that if there is a pivot rule for the simplex algorithm for which one can bound the number of steps by the diameters of the bases-exchange graphs of the polyhedra of feasible solutions then the general linear programming problem can be solved in strongly polynomial time. The talk is based on joint work with Kirill Kukharenko.
While the Maximum Cut Problem and Unconstrained Binary Quadratic Optimization are of high interest in the scientific community and gain increasing importance, no state-of-the-art solvers for these problems were publicly available before 2022. This changed with the development of our solver McSparse, which is available online: http://mcsparse.uni-bonn.de/. We discuss the relevant building blocks that lead to McSparse and present recent results on improved ilp-models, separation and branching rules for the Maximum Cut problem, as well as further directions for improvement.
After providing a short self-contained introduction on the Monge problem, its potential applications and its computational challenges, I will present in this talk two recent contributions that offer practical solutions. In the first part I will show how changing the so-called ground cost function of optimal transport problems directly influences the structure of such maps in theory, and how this can be turned into a practical tool. In the second part I present a simple approach to estimate Monge maps using a simple regularizer (both works were presented at the ICML’23 conference.
Cutting plane algorithms are a key tool for solving (Mixed) Integer Linear programs. The constraints defining the cutting planes can be divided into the classes of feasibility cuts and optimality cuts. Both classes cut off certain parts of the linear relaxation. However, the optimality cuts, in contrast to the feasibility cuts, can also cut off feasible solutions by using information provided by the objective function. In this work, we even go one step further and present new gap-based optimality (GO) cuts for Binary Linear Programs (BLP) that may cut off optimal integer solutions. Given a feasible solution providing a primal bound U for our given (BLP), our newly developed GO constraints are designed to cut off solutions that are not better than U. We show that they improve a certain class of integer rounding cuts as well as Dantzig cuts and their strengthened version by Charnes and Cooper. Our computational results show that our new constraints can have a positive impact on the solution process.
Over the last decade, many contributions have sought to leverage machine learning algorithms to improve the performances of combinatorial optimization solvers. In particular, due to the ubiquitous nature of graphs in combinatorial tasks, the use of graph neural networks has flourished in the literature. Parallel to these developments, due to their capacity to embed hierarchically structured data such as trees and scale-free graphs with arbitrary low distortion, hyperbolic spaces have gained an increasing amount of attention in the machine learning community. This talk first introduces the necessary conceptual tools in both Riemannian geometry and deep learning to provide a clear understanding of hyperbolic neural networks. Then, it motivates the use of hyperbolic representations in combinatorial applications and focuses on its potential in mixed integer programming.
Recent works on quantum algorithms for solving semidefinite optimization (SDO) problems have leveraged a quantum-mechanical interpretation of positive semidefinite matrices to develop methods that obtain quantum speedups with respect to the dimension n and number of constraints m. While their dependence on other parameters suggests no overall speedup over classical methodologies, some quantum SDO solvers provide speedups in the low-precision regime. We exploit this fact to our advantage, and present an iterative refinement scheme for the Hamiltonian Updates algorithm of Brandão et al. [Faster quantum and classical SDP approximations for quadratic binary optimization, Quantum 6, 625 (2022)] to exponentially improve the dependence of their algorithm on the precision 𝜖, defined as the absolute gap between primal and dual solution. As a result, we obtain a classical algorithm to solve the semidefinite relaxation of Quadratic Unconstrained Binary Optimization problems (QUBOs) in matrix multiplication time. Provided access to a quantum read/classical write random access memory (QRAM), a quantum implementation of our algorithm exhibits O(ns + n1.5 polylog(n, ‖C‖F, 1/𝜖)) running time, where C is the cost matrix, ‖C‖F is its Frobenius norm, and s is its sparsity parameter (maximum number of nonzero elements per row).
Clustering using k-means is a classic problem with significant practical implications. The elegant k-means++ algorithm, proposed by Arthur and Vassilvitskii [k-means++: the advantages of careful seeding SODA 2007,1027–1035], is one of the most popular approaches for solving it and is a O(log k)-approximation. However, some limitations of this algorithm have been identified, specifically in cases where the data is highly clusterable. Recently, Balcan et al. [Data-Driven Clustering via Parameterized Lloyd’s Families NeurIPS 2018, 31, 10641–10651] explored a new data-driven approach to overcome these limitations. They proposed to learn a parameter 𝛼 in order to parameterize the seeding as follows: a point is selected as a cluster center with probability proportional to the 𝛼-powered distance from the point to its closest center selected thus far. The standard k-means++ is then the particular case of 𝛼=2. While it is known that selecting 𝛼≠2 may lead to a worse guarantee in the worst case, they qualitatively and experimentally show the advantage of this parameterized seeding in several settings. In this talk, I will describe the analysis of (standard) k-means++ and how this leads to a general analysis of parameterized seeding in k-means++, through a carefully constructed potential function. Using this potential, I will describe how the approximation guarantee (for any 𝛼 > 2) depends on parameters that captures the concentration of points of any optimal cluster, the ratio of the maximum number of points to the minimum number of points in the optimal clusters, and the ratio of the maximum and the minimum variance of the set of optimal clusters. As a corollary, we obtain that seeding with 𝛼 > 2 results in a constant factor approximation guarantee on a wide class of instances. Finally, I will provide examples that necessitates the dependence on some of the aforementioned parameters. This is joint work with Etienne Bamas (EPFL->ETHZ) and Ola Svensson (EPFL).
I will discuss recent progress on the problem of sampling from a nonlinear real smooth algebraic variety; that is a nonlinear smooth manifold defined as the zero set of polynomial equations.
Koopman operator theory has been successfully applied to problems from various research areas such as fluid dynamics, molecular dynamics, climate science, engineering, and biology. Most applications of Koopman theory have been concerned with classical dynamical systems driven by ordinary or stochastic differential equations. In this presentation, we will first compare the ground-state transformation and Nelson’s stochastic mechanics, thereby demonstrating that data-driven methods developed for the approximation of the Koopman operator can be used to analyze quantum physics problems. Moreover, we exploit the relationship between Schrödinger operators and stochastic control problems to show that modern data-driven methods for stochastic control can be used to solve the stationary or imaginary-time Schrödinger equation. Our findings open up a new avenue towards solving Schrödinger’s equation using recently developed tools from data science.
We revisit the (block-angular) min-max resource sharing problem, which is a well-known generalization of fractional packing and the maximum concurrent flow problem. It consists of finding an ℓ∞-minimal element in a Minkowski sum X=∑c∈CXc of non-empty closed convex sets Xc⊆ℝR≥0, where C and R are finite sets. We assume that an oracle for approximate linear minimization over Xc is given.
We improve on the currently fastest known FPTAS in various ways. A major novelty of our analysis is the concept of local weak duality, which illustrates that the algorithm optimizes (close to) independent parts of the instance separately. Interestingly, this implies that the computed solution is not only approximately ℓ∞-minimal, but among such solutions, also its second-highest entry is approximately minimal.
Based on a result by Klein and Young, we provide a lower bound of 𝛺(((|C|+|R|) log |R|)/𝛿²) required oracle calls for a natural class of algorithms. Our FPTAS is optimal within this class — its running time matches the lower bound precisely, and thus improves on the previously best-known running time for the primal as well as the dual problem.
The vanishing ideal of points is the set of all polynomials that vanish over the points. Any vanishing ideal can be generated by a finite set of vanishing polynomials or generators, and the computation of approximate generators has been developed at the intersection of computer algebra and machine learning in the last decade under the name of approximate computation of vanishing ideals. In computer algebra, the developed algorithms are supported by theories more deeply, whereas, in machine learning, the algorithms have been developed toward applications at a cost of some theoretical properties. In this talk, I will present a review on the development of approximate computation of vanishing ideals in two fields, particularly from the perspective of the spurious vanishing problem and normalization, which are recently suggested as a new direction of development.
A key problem in mathematical imaging, signal processing and computational statistics is the minimization of non-convex objective functions over conic domains, which are continuous but potentially non-smooth at the boundary of the feasible set. For such problems, we propose a new family of first and second-order interior-point methods for non-convex and non-smooth conic constrained optimization problems, combining the Hessian barrier method with quadratic and cubic regularization techniques. Our approach is based on a potential-reduction mechanism and attains a suitably defined class of approximate first- or second-order KKT points with worst-case iteration complexity O(ϵ−2) and O(ϵ−3/2), respectively. Based on these findings, we develop a new double loop path-following scheme attaining the same complexity, modulo adjusting constants. These complexity bounds are known to be optimal in the unconstrained case, and our work shows that they are upper bounds in the case with complicated constraints as well. A key feature of our methodology is the use of self-concordant barriers to construct strictly feasible iterates via a disciplined decomposition approach and without sacrificing on the iteration complexity of the method. To the best of our knowledge, this work is the first which achieves these worst-case complexity bounds under such weak conditions for general conic constrained optimization problems. This is joint work with Pavel Dvurechensky (WIAS Berlin) and based on the paper: arXiv:2111.00100 [math.OC].
Linear optimization, also known as linear programming, is a modelling framework widely used by analytics practitioners. The reason is that many optimization problems can easily be described in this framework. Moreover, huge linear optimization problems can be solved using readily available software and computers. However, a linear model is not always a good way to describe an optimization problem since the problem may contain nonlinearities. Nevertheless such nonlinearities are often ignored or linearized because a nonlinear model is considered cumbersome. Also there are issues with local versus global optima and in general it is just much harder to work with nonlinear functions than linear functions.
Over the last 15 years a new paradigm for formulating certain nonlinear optimization problems called conic optimization has appeared. The advantage of conic optimization is that it allows the formulation of a wide variety of nonlinearities while almost keeping the simplicity and efficiency of linear optimization.
Therefore, in this presentation we will discuss what conic optimization is and why it is relevant to analytics practitioners. In particular we will discuss what can be formulated using conic optimization, illustrated by examples. We will also provide some computational results documenting that large conic optimization problems can be solved efficiently in practice. To summarize, this presentation should be interesting for everyone interested in an important recent development in nonlinear optimization.
Proximity operators are tools which use first-order information to solve optimization problems. However, unlike gradient-based methods, algorithms involving proximity operators are guaranteed to work in nonsmooth settings. This expository talk will discuss the mathematical and numerical properties of proximity operators, how to compute them, algorithms involving them, and advice on implementation.
Bayesian optimization (BO) has demonstrated impressive success in optimizing black-box functions. However, there are still challenges in dealing with black-boxes that include both continuous and categorical inputs. I am going to present our recent works in optimizing the mixed space of categorical and continuous variables using Bayesian optimization [B. Ru, A. Alvi, V. Nguyen, M. Osborne, and S. Roberts. “Bayesian optimisation over multiple continuous and categorical inputs.” ICML 2020] and how to scale it up to higher dimensions [X. Wan, V. Nguyen, H. Ha, B. Ru, C. Lu, and M. Osborne. “Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces.” ICML 2021] and population-based AutoRL setting [J. Parker-Holder, V. Nguyen, S. Desai, and S. Roberts. “Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL”. NeurIPS 2021].
This talk describes the solution of convex optimization problems that include uncertainty modeled by a finite but potentially very large multi-stage scenario tree.
In 1991, Rockafellar and Wets proposed the progressive hedging (PH) algorithm to solve such problems. This method has some advantages over other standard methods such as Benders decomposition, especially for problems with large numbers of decision stages. The talk will open by showing that PH is an application of the Alternating Direction Method of Multipliers (ADMM). The equivalence of PH to the ADMM has long been known but not explicitly published.
The ADMM is an example of an “operator splitting” method, and in particular of a principle called “Douglas–Rachford splitting”. I will briefly explain what is meant by an “operator splitting method”.
Next, the talk will apply a different, more recent operator splitting method called “projective splitting” to the same problem. The resulting method is called “asynchronous projective hedging” (APH). Unlike most decomposition methods, it does not need to solve every subproblem at every iteration; instead, each iteration may solve just a single subproblem or a small subset of the available subproblems.
Finally, the talk will describe work integrating the APH algorithm into mpi-sppy
, a Python package for modeling and distributed parallel solution of stochastic programming problems. mpi-sppy
uses the Pyomo Python-based optimization modeling sytem. Our experience includes using up to 2,400 processor cores to solve 2-stage and 4-stage test problem instances with as many as 1,000,000 scenarios.
Portions of the work described in this talk are joint with Patrick Combettes (North Carolina State University), Jean-Paul Watson (Lawrence Livermore National Laboratory, USA), and David Woodruff (University of California, Davis).
We consider mixtures of k≥2 Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most k-C for a large enough constant C≥1.
Previous statistical-query lower bounds [Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart, Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures (extended abstract), 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017, pp. 73–84] give formal evidence that, even for the special case of colinear means, distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in k).
We show that, surprisingly, this kind of hardness can only appear if mixing weights are allowed to be exponentially small. For polynomially lower bounded mixing weights, we show how to achieve non-trivial statistical guarantees in quasi-polynomial time.
Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of k≥2 well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates some pairs of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component.
For the special case of colinear means, our algorithm outputs a k-clustering of the input sample that is approximately consistent with all components of the underlying mixture. We obtain similar clustering guarantees also for the case that the overlap between any two mixture components is lower bounded quasi-polynomially in k (in addition to being upper bounded polynomially in k).
A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers unlike most previous algorithmic results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights.
A key technical ingredient of our algorithms is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.
Optimization models for engineering design and operation, are frequently described by complex models of black-box simulations. The integration, solution, and optimization of this ensemble of large-scale models is often difficult and computationally expensive. As a result, model reduction in the form of simplified or data-driven surrogate models is widely applied in optimization studies. While the application to machine learning and AI approaches has lead to widespread optimization studies with surrogate models, less attention has been paid to validating these results on the optimality of high-fidelity, i.e., ‘truth’ models. This talk describes a surrogate-based optimization approach based on a trust-region filter (TRF) strategy. The TRF method substitutes surrogates for high-fidelity models, thus leading to simpler optimization subproblems with sampling information from truth models. Adaptation of the subproblems is guided by a trust region method, which is globally convergent to the local optimum of the original high-fidelity problem. The approach is suitable for broad choices of surrogate models, ranging from neural networks to physics-based shortcut models. The TRF approach has been implemented on numerous optimization examples in process and energy systems, with complex high fidelity models. Three case studies will be presented for Real-Time Optimization (RTO) for oil refineries, chemical processes and dynamic adsorption models for CO2 capture, which demonstrate the effectiveness of this approach.
In this talk, we consider a robust optimization problem with continuous decision-dependent uncertainty (RO-CDDU), which has two new features: an uncertainty set linearly dependent on continuous decision variables and a convex piecewise-linear objective function. We prove that RO-CDDU is NP-hard in general and reformulate it into an equivalent mixed-integer nonlinear program (MINLP) with a decomposable structure to address the computational challenges. Such an MINLP model can be further transformed into a mixed-integer linear program (MILP) given the uncertainty set’s extreme points. We propose an alternating direction algorithm and a column generation algorithm for RO-CDDU. We model a robust demand response (DR) management problem in electricity markets as RO-CDDU, where electricity demand reduction from users is uncertain and depends on the DR planning decision. Extensive computational results demonstrate the promising performance of the proposed algorithms in both speed and solution quality. The results also shed light on how different magnitudes of decision-dependent uncertainty affect the demand response decision.
In quantum mechanics, performing a measurement is an invasive process which generally disturbs the system. Due to this phenomenon, there exist incompatible quantum measurements, i.e., measurements that cannot be simultaneously performed on a single copy of the system.
In this talk we will explain the robustness-based approach generally used to quantify this incompatibility and how it can be cast, for finite-dimensional systems, as a semidefinite programming problem. With this formulation at hand we analytically investigate the incompatibility properties of some high-dimensional measurements and we tackle, for an arbitrary fixed dimension, the question of the most incompatible pairs of quantum measurements, showing in particular optimality of Fourier-conjugated bases.
I will consider the problem of minimizing functions of discrete variables represented as a sum of “tractable” subproblems. First, I will briefly review recent theoretical results characterizing complexity classification of discrete optimization in the framework of “Valued Constraint Satisfaction Problems” (VCSPs). Then I will talk about algorithms for solving Lagrangian relaxations of such problems. I will describe an approach based on the Frank-Wolfe algorithm that achieves the best-known convergence rate. I will also talk about practical implementation, and in particular about in-face Frank-Wolfe directions for certain combinatorial subproblems. Implementing such directions for perfect matching subproblems boils down to computing a Gomory-Hu (GH) tree of a given graph. Time permitting, I will describe a new approach for computing GH tree that appears to lead to a state-of-the-art implementation.
There has been enormous progress in the branch-and-bound methods in the past couple of decades. In particular, much effort has been put into the so-called variable selection problem, i.e. the problem of choosing which variable to branch on in the current search node. Recently, many researchers have investigated the potential of using machine learning to find good solutions to this problem by for instance trying to mimic what good, but computationally costly, heuristics do. The main part of this research has been focused on branching on so-called elementary disjunctions, that is, branching on a single variable. Theory, such as the results by H.W. Lenstra, Jr. and by Lovász & Scarf, tells us that we in general need to consider branching on general disjunctions, but due in part to the computational challenges to implement such methods, much less work in this direction has been done. Some heuristic results in this direction have been presented.
In this talk we discuss both theoretical and heuristic results when it comes to branching on general disjunctions with an emphasis on lattice based methods. A modest computational study is also presented. In the last part of the talk we also give a short description of results from applying machine learning to the variable selection problem. The talk is based on joint work with Laurence Wolsey.
Over the last three decades, the online bipartite matching (OBM) problem has emerged as a central problem in the area of Online Algorithms. Perhaps even more important is its role in the area of Matching-Based Market Design. The resurgence of this area, with the revolutions of the Internet and mobile computing, has opened up novel, path-breaking applications, and OBM has emerged as its paradigmatic algorithmic problem. In a 1990 joint paper with Richard Karp and Umesh Vazirani, we gave an optimal algorithm, called RANKING, for OBM, achieving a competitive ratio of (1 – 1/e); however, its analysis was difficult to comprehend. Over the years, several researchers simplified the analysis.
We will start by presenting a “textbook quality” proof of RANKING. Its simplicity raises the possibility of extending RANKING all the way to a generalization of OBM called the adwords problem. This problem is both notoriously difficult and very significant, the latter because of its role in the AdWords marketplace of Google. We will show how far this endeavor has gone and what remains. We will also provide a broad overview of the area of Matching-Based Market Design and pinpoint the role of OBM.
We survey the state of the art in VLSI routing. Interconnecting millions of sets of pins by wires is challenging because of the huge instance sizes and limited resources. We present our general approach as well as algorithms for fractional min-max resource sharing and goal-oriented shortest path search sped up by geometric distance queries. These are key components of BonnRoute, which is used for routing some of the most complex microprocessors in industry.
Discrete events are sequential observations that record event time, location, and possibly “marks” with additional event information. Such event data is ubiquitous in modern applications, including social networks, neuronal spike trains, police reports, medical ICU data, power networks, seismic activities, and COVID-19 data. We are particularly interested in capturing the complex dependence of the discrete events data, such as the latent influence — triggering or inhibiting effects of the historical events on future events — a temporal causal relationship. I will present my recent research on this topic from estimation, uncertainty quantification, handling high-dimensional marks, and leveraging neural network representation power. The developed methods particularly consider computational efficiency and statistical guarantees, leveraging the recent advances in variational inequality for monotone operators that bypass the difficulty posed by the original non-convex model estimation problem. The performance of the proposed method is illustrated using real-world data: crime, power outage, hospital ICU, and COVID-19 data.
RIKEN is one of Japan’s largest fundamental-research institutes. The RIKEN Center for Advanced Intelligence Project (AIP) was created in 2016 to propose and investigate new machine learning methodologies and algorithms, and apply them to societal problems. AIP covers a wide range of topics from generic AI research (machine learning, optimization, applied math., etc.), goal-oriented AI research (material, disaster, cancer, etc.), and AI-in-society research (ethics, data circulation, laws, etc.). In the first half of my talk, I will briefly introduce our center’s activities and collaboration strategies.
Then, in the latter half, I will talk about the research activities in my team, i.e., machine learning from imperfect information. Machine learning has been successfully used to solve various real-world problems. However, we are still facing many technical challenges, for example, we want to train machine learning systems without big labeled data and we want to reliably deploy machine learning systems under noisy environments. I will overview our recent advances in tackling these problems.
Deep Learning has received much attention lately, however, results regarding the computational complexity of training deep neural networks have only recently been obtained. Moreover, all current training algorithms for DNNs with optimality guarantees possess a poor dependency on the sample size, which is typically the largest parameter. In this work, we show that the training problems of large classes of deep neural networks with various architectures admit a polyhedral representation whose size is linear in the sample size. This provides the first theoretical training algorithm with provable worst-case optimality guarantees whose running time is polynomial in the size of the sample.