Congratulations Nick Baskerville
6 November 2023
Nick Baskerville has been awarded the 2023 Doctoral Prize for Mathematical / Environmental Sciences
Nick says 'The motivation for my research was to advance the theoretical understanding of neural networks, a now ubiquitous type of machine learning model. They are particularly topical at the moment, since ChatGPT brought them to popular attention, but almost any technology labelled as AI, and many that aren’t (e.g. ANPR in car parks), is likely to be using some form of neural network at its core. The fantastic advances of the last decade or so belong squarely to the field of engineering. The basic mathematical foundations of neural networks are not complicated, and a great deal has been achieved through experimentation and heuristics powered by an explosion of data and the development of powerful open source software packages.

Despite all of this, neural networks are far from being thoroughly understood theoretically. The procedure used to train something like ChatGPT should be intelligible to most people, and most undergraduates in the mathematical sciences should be able to access the full mathematical description, but fundamentally the explanation of why this procedure succeeds at all, let alone produces such a powerful technology, is largely a mystery. Throughout the renaissance of neural networks in recent years, ideas based on rigorous classical theory about what should work in the realm of machine learning have been repeatedly subverted. There are various strands of work that attempt to explain aspects of the unexpected success of neural networks by developing new theoretical approaches, and that is where my research sits.

One approach is to consider the loss surfaces of neural networks. Modern neural networks have billions of parameters, and these are essentially magic numbers that we tune until we get the results we want (e.g. a working face recognition model). Mathematically, this process can be viewed as moving around on a very high dimensional surface searching for the deepest valleys.
This is the origin of one of the mysteries of neural networks, namely that searching for the deepest valley among billions of others in a billion dimensions should be a hopeless task, but in practice it seems to “just work”. My research focusses on using tools and ideas from random matrix theory to obtain insights into the statistical and geometric structure of loss surfaces. For certain models of neural networks, we were able to show that random matrix effects result in a surprisingly favourable loss surface on which there are exponentially many local optima, but that an optimisation algorithm is likely to be able to evade the poor ones before getting stuck in a good one. Several aspects of the calculations were novel and quite unlike prior approaches. The second main part of my work experimentally uncovered random matrix statistics on the local scale in neural networks and used powerful universality results to make quite general statements about certain practically relevant statistics of loss surfaces. We believe this is the first time that local random matrix statistics have been used in machine learning theory, and the new perspective is likely to have further potential'.
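The picture of training as a search over a high-dimensional surface can be sketched with a toy example. The surface below is not one of the loss surfaces from Nick's research: it is an invented non-convex function (a quadratic bowl plus random ripples), used only to show plain gradient descent "moving around" in many dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-convex "loss surface" in d dimensions: a quadratic bowl plus
# random sinusoidal ripples. Purely illustrative, not a neural network loss.
d = 200
freqs = rng.normal(size=(10, d)) / np.sqrt(d)

def loss(x):
    return 0.5 * x @ x + 0.1 * np.sum(np.cos(freqs @ x))

def grad(x):
    return x - 0.1 * (freqs.T @ np.sin(freqs @ x))

# Plain gradient descent: repeatedly step downhill on the surface.
x = rng.normal(size=d)
loss_start = loss(x)
for _ in range(300):
    x -= 0.1 * grad(x)

print(loss_start, loss(x))  # the final loss is far below the starting loss
```

In billions of dimensions a neural network's surface is vastly more complicated than this sketch, which is exactly why, as the quote notes, it is surprising that simple downhill searches work so well there.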
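The "random matrix statistics on the local scale" mentioned in the quote can be illustrated on a classical ensemble. The sketch below is not Nick's experiment: it samples a matrix from the Gaussian Orthogonal Ensemble (GOE) and inspects nearest-neighbour eigenvalue spacings, where the hallmark local statistic is level repulsion (very small spacings are rare, unlike for independent points).

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample a GOE matrix: symmetric, with independent Gaussian entries,
# scaled so the spectrum lies in roughly [-2, 2].
n = 1000
a = rng.normal(size=(n, n))
h = (a + a.T) / np.sqrt(2 * n)

eigs = np.linalg.eigvalsh(h)  # returned in ascending order

# Local statistics: spacings between consecutive eigenvalues in the bulk,
# normalised by their mean (a crude "unfolding" of the spectrum).
bulk = eigs[n // 4 : 3 * n // 4]
s = np.diff(bulk) / np.diff(bulk).mean()

# Level repulsion: for GOE, tiny spacings are rare, whereas for
# independent (Poisson) points P(s < 0.1) would be about 0.1.
frac_small = np.mean(s < 0.1)
print(frac_small)
```

For GOE the spacing distribution is well approximated by the Wigner surmise, P(s) = (π/2) s exp(−πs²/4), which vanishes at s = 0; universality results of the kind mentioned in the quote say that such local statistics persist far beyond Gaussian ensembles.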
Congratulations on this fantastic achievement Nick!