
Probabilistic Automated Numerical Analysis in Machine learning and Artificial intelligence

Periodic Reporting for period 4 - PANAMA (Probabilistic Automated Numerical Analysis in Machine learning and Artificial intelligence)

Reporting period: 2021-05-01 to 2023-02-28

From the outside, AI or machine learning is the process of fitting a model to some data -- for example, fitting a generative model for language so that the text it produces is similar to training data found in documents across the internet. But what actually happens inside the learning machine, the process of learning itself, is a numerical computation: the solution of a mathematical problem that has no analytic solution. For contemporary machine learning in particular, this means optimization, to find "best fits"; simulation, to "predict what might happen next"; and linear algebra, the solution of extremely large systems of linear equations with millions or billions of unknown variables. Although machine learning is a young discipline, the algorithms used for these purposes are surprisingly old. They were invented in other disciplines to address structurally similar, but not identical, tasks. This allowed AI and ML to move quickly without having to reinvent the wheel. But it also created subtle problems, a mathematical version of the legacy-code problem known in IT: because machine learning is not exactly like the older tasks of applied mathematics, the old algorithms do not always work well; they require costly workarounds in some settings; and they sometimes become so unstable that human users must monitor them. This creates significant inefficiency in modern ML, in terms of both human and technological resources.

The goal of Project PANAMA, in a nutshell, was to refurbish classic numerical methods to make them work natively, efficiently and effectively in contemporary machine learning. This was achieved by leveraging a deep conceptual insight, namely that mathematical computation itself is a form of learning, of inferring a latent (mathematical) quantity from observable (computable) numbers. Where machine learning uses empirical data, collected in the world and stored on disk, a numerical method uses computational data, collected by a chip and stored in RAM. Both processes can be phrased completely equivalently in the language of probabilistic inference. The resulting probabilistic numerical methods can then be embedded seamlessly within the wider machine learning pipeline.
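In this view, empirical and computational data enter one and the same Bayesian update. As a purely schematic illustration (the notation below is ours, not the project's), a latent quantity x (a model parameter, or the unknown solution of a mathematical problem) is inferred jointly from empirical observations y_i and computational observations c_j, such as gradient evaluations or matrix-vector products:

```latex
p\bigl(x \mid y_{1:n}, c_{1:m}\bigr) \;\propto\; p(x)\,\prod_{i=1}^{n} p(y_i \mid x)\,\prod_{j=1}^{m} p(c_j \mid x)
```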

This is useful because both data and compute are finite resources. A machine learning model that runs on a small, weakly informative data set does not require high-precision internal computations. So if the numerical algorithm "knows" about this, it can save computational resources. But data and compute are also both sources of information. By treating them equally, they can be mixed flexibly: for example, in scientific machine learning, computational information in the form of simulations can be used to supplement missing empirical data, and empirical measurements can be used to identify unknown parameters in dynamical simulations.
The project was organized around these numerical tasks, addressing optimization, simulation and linear algebra first as separate problems and then combining them into a joint algorithmic framework.

In optimization, the project team focused on deep learning as the most prominent model type in contemporary ML. In its common setup, deep learning is a stochastic optimization problem: large datasets must be split into batches to fit into computers' working memory, a process that drastically reduces the precision of the overall computation. Project PANAMA provided new software tools to quantify, track and control the resulting computational uncertainty, and contributed to a new formalism that automatically turns nearly every deep network into a probabilistic model.
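To make the notion of computational uncertainty from mini-batching concrete, here is a minimal sketch (our own illustration, not the project's released software, and assuming a simple linear least-squares model): it compares a mini-batch gradient to the full-data gradient and estimates the sub-sampling variance directly from the per-example gradients.

```python
import numpy as np

# Illustrative sketch (not the project's software): quantify the computational
# uncertainty that mini-batching introduces into a gradient, for the simple
# least-squares loss L(w) = 0.5 * mean_i (x_i @ w - y_i)^2.

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # current iterate of the optimizer

def per_example_gradients(X, y, w):
    """Gradient of each example's squared-error loss, shape (n, d)."""
    residuals = X @ w - y            # (n,)
    return residuals[:, None] * X    # (n, d)

full_grad = per_example_gradients(X, y, w).mean(axis=0)

for batch_size in (32, 256, 2048):
    idx = rng.choice(n, size=batch_size, replace=False)
    g = per_example_gradients(X[idx], y[idx], w)
    batch_grad = g.mean(axis=0)
    # Empirical estimate of the batch gradient's sampling variance; this is
    # the "computational uncertainty" caused by sub-sampling the data.
    grad_var = g.var(axis=0, ddof=1) / batch_size
    err = np.linalg.norm(batch_grad - full_grad)
    print(f"batch={batch_size:5d}  |batch grad - full grad| = {err:.4f}  "
          f"estimated std of that error = {np.sqrt(grad_var.sum()):.4f}")
```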

Simulation is arguably the area in which PANAMA provided the most fundamental advance. The project created a formalism that phrases simulation (the solution of differential equations through time) explicitly in the language of signal processing and time-series inference, using the algorithmic machinery of Bayesian filtering and smoothing. Theoretical proofs that this formulation is as powerful as earlier, more rigid formulations motivated the development of a new algorithmic stack for data-centric simulation.
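As a rough illustration of this idea (a sketch in the spirit of extended-Kalman-filter-based ODE solvers; the test problem, step size and prior parameters are our own choices, not the project's code), the solver below treats the derivative condition y' = f(y) as a sequence of noise-free "computational observations" and conditions a Gauss-Markov prior on them step by step:

```python
import numpy as np

# Illustrative sketch of simulation as Bayesian filtering: a first-order
# probabilistic solver for dy/dt = f(y).  The state is (y, y') under an
# integrated Wiener process prior; each step conditions on the residual
# y' - f(y) = 0 with an extended Kalman update.

f = lambda y: y * (1.0 - y)        # vector field (logistic growth, illustrative)
df = lambda y: 1.0 - 2.0 * y       # its derivative, for the linearisation

h, sigma2 = 0.05, 1.0              # step size and prior diffusion (assumptions)
A = np.array([[1.0, h], [0.0, 1.0]])                          # state transition
Q = sigma2 * np.array([[h**3 / 3, h**2 / 2], [h**2 / 2, h]])  # process noise

m = np.array([0.1, f(0.1)])        # initial mean: y(0) and a consistent derivative
P = np.zeros((2, 2))               # initial covariance

for step in range(int(2.0 / h)):
    # Predict under the Gauss-Markov prior.
    m, P = A @ m, A @ P @ A.T + Q
    # Update on the zero-valued residual z(y, y') = y' - f(y).
    H = np.array([[-df(m[0]), 1.0]])
    z = -(m[1] - f(m[0]))
    S = H @ P @ H.T                # innovation covariance (1x1)
    K = P @ H.T / S                # Kalman gain
    m = m + (K * z).ravel()
    P = P - K @ H @ P

print("posterior mean of y(2.0):", m[0], "+/-", np.sqrt(P[0, 0]))
```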

In linear algebra, the PANAMA team took a close look at the foundational machine learning algorithms that use linear algebra most directly -- Gaussian process and kernel ridge regression methods. These models are, in fact, only a thin shell around a family of classic numerical algorithms for least-squares problems. The classic methods were rephrased and augmented so that they provide exactly the quantities needed in the ML setting, and their iterative structure was recast in terms of the data-loading process of an ML algorithm. The separation between processing data and training the ML model is thus removed entirely.
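A minimal sketch of this view (illustrative only; the kernel, data and tolerances are assumptions, not the project's code): kernel ridge regression reduces to a symmetric linear system, and a conjugate-gradient solver touches the training data exclusively through matrix-vector products, so each iteration corresponds to another pass over the data.

```python
import numpy as np

# Illustrative sketch: Gaussian-process / kernel ridge regression amounts to
# solving (K + noise * I) alpha = y; conjugate gradients solves it iteratively
# using only matrix-vector products with the kernel matrix.

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between two sets of inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

K = rbf(X, X) + 0.1 * np.eye(len(X))   # kernel matrix plus noise term

def conjugate_gradients(matvec, b, tol=1e-6, maxiter=100):
    """Plain CG; `matvec` is the only way the data is accessed."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

alpha = conjugate_gradients(lambda v: K @ v, y)

# Posterior-mean prediction at a few new inputs.
X_test = np.linspace(-3, 3, 5)[:, None]
print(rbf(X_test, X) @ alpha)
```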
PANAMA established the novel paradigm of probabilistic numerical computation for practitioners in machine learning. It led to the publication of a textbook and several extensive open-source software libraries. The theoretical and practical insights developed within the project have implications for large parts of machine learning as a domain. They have delivered, or are opening the possibility for, significant efficiency gains in the core computations of AI and machine learning, as well as novel and flexible functionality for machine learning engineers.