Speaker profiles

Tara Javidi (University of California, San Diego) 

Title: Adaptive Sampling for Distribution Estimation and Fair Classification

Abstract: We consider the problem of allocating a fixed budget of samples to a finite set of discrete distributions to learn them uniformly well (minimizing the maximum error) in terms of a few common distance measures. To present a unified treatment of these distances, we first propose a general optimistic tracking algorithm and analyze its sample allocation performance with respect to an oracle. We then instantiate this algorithm for the different distance measures and derive bounds on their regret. We also show, by deriving lower bounds on the expected deviation from the oracle allocation for any adaptive scheme, that the allocation performance of the proposed algorithm cannot, in general, be improved. We verify our theoretical findings through experiments. Time permitting, we will discuss how the above principles can be extended to address the problem of adaptively constructing training sets which allow us to learn classifiers that are fair in a minimax sense. We first propose an adaptive sampling algorithm based on the principle of optimism and then validate the benefits of adaptively constructing training sets via experiments on synthetic tasks with logistic regression classifiers, as well as on several real-world tasks using convolutional neural networks (CNNs).
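
As a rough illustration of the optimism principle described above (not the algorithm from the talk), the sketch below allocates each new sample to the distribution whose estimation error could still be largest, using an upper-confidence proxy for the squared L2 error. The setup, bonus form, and constants are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K unknown discrete distributions over S outcomes,
# and a fixed budget of samples to spread across them.
K, S, budget = 5, 20, 5000
true_dists = rng.dirichlet(np.ones(S) * 0.3, size=K)

hist = np.zeros((K, S))    # observed outcome counts per distribution
counts = np.zeros(K)       # samples drawn from each distribution

def draw(k):
    x = rng.choice(S, p=true_dists[k])
    hist[k, x] += 1
    counts[k] += 1

for k in range(K):         # one initial sample from each distribution
    draw(k)

for t in range(budget - K):
    p_hat = hist / counts[:, None]
    # Optimistic (upper-confidence) proxy for the expected squared L2
    # error of each estimate; this exact form is an illustrative guess.
    err_ucb = (1.0 - (p_hat ** 2).sum(axis=1)) / counts \
              + np.sqrt(np.log(budget) / counts) / counts
    draw(int(np.argmax(err_ucb)))   # sample where the error may be largest

l2_err = ((hist / counts[:, None] - true_dists) ** 2).sum(axis=1)
print("samples per distribution:", counts.astype(int))
print("worst-case squared L2 error:", l2_err.max())
```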

Varun Kanade (University of Oxford) 

Title: Decentralized Cooperative Stochastic Bandits

Abstract: We will discuss a decentralized setting of the cooperative stochastic multi-armed bandit problem with K arms on a network of N agents. The reward distribution of the arms is identical across the agents, though the rewards are drawn independently across agents and time steps. In each round, each agent chooses an arm to play and can subsequently send messages to her neighbours. The overall goal is to minimize the total regret of the entire network. We will present a fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average rewards obtained by all the agents for each arm, and then plays arms according to a UCB-like rule. We analyze the regret of this algorithm and also show lower bounds. Our regret is bounded by the optimal centralized regret plus a natural term that depends on the spectral gap of the communication matrix of the network. Our algorithm is simpler to analyze and achieves better regret bounds than most existing algorithms, while also requiring less information about the underlying network.
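
A minimal sketch of the general recipe (plain gossip averaging plus a UCB index), assuming a cycle network and Bernoulli rewards. The talk's algorithm uses an accelerated consensus procedure with delayed estimates; the single gossip step here is a simplification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: N agents on a cycle, K Bernoulli arms with shared means.
N, K, T = 10, 3, 3000
means = np.array([0.5, 0.6, 0.7])

# Doubly stochastic gossip matrix for the cycle (self + two neighbours).
P = np.zeros((N, N))
for i in range(N):
    P[i, i] = 1 / 3
    P[i, (i - 1) % N] = 1 / 3
    P[i, (i + 1) % N] = 1 / 3

# Each agent tracks consensus estimates of the network-average reward sum
# and pull count per arm; one gossip step per round mixes them.
S = np.zeros((N, K))
C = np.ones((N, K))   # start at 1 to avoid division by zero

for t in range(1, T + 1):
    # UCB index per agent, using C ~ average pulls per agent (N*C in total).
    ucb = S / C + np.sqrt(2 * np.log(t) / (N * C))
    arms = ucb.argmax(axis=1)
    rewards = rng.binomial(1, means[arms])
    # Local updates, then one round of gossip averaging with neighbours.
    S[np.arange(N), arms] += rewards
    C[np.arange(N), arms] += 1
    S, C = P @ S, P @ C

print("empirically best arm per agent:", (S / C).argmax(axis=1))
```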

(based on joint work with David Martinez-Rubio and Patrick Rebeschini)

Daniel Kious (University of Bath) 

Title: Finding geodesics on graphs using reinforcement learning

Abstract: The premise of our talk will be the fact that ants are believed to find shortest paths between their nest and sources of food by successive random explorations of their environment, without any means of communication other than the pheromones they leave behind them. We will discuss a work in collaboration with Bruno Schapira and Cécile Mailler in which we introduce a general probabilistic model for this phenomenon, based on reinforcement learning. We will present various variants of the model, with slightly different reinforcement mechanisms, and show that these small differences in the rules yield significantly different large-time behaviors. In the version called the loop-erased ant process, we are able to prove that the ants manage to find the shortest paths on all series-parallel graphs.
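
To give a feel for the kind of dynamics involved, here is a heavily simplified caricature on the smallest series-parallel example: two routes from nest to food, with per-trip reinforcement scaled down by route length. The reinforcement rule is an assumption for illustration and is not the loop-erased ant process itself.

```python
import random

random.seed(0)

# Caricature on the simplest series-parallel graph: route "short" is
# nest -> food (length 1), route "long" is nest -> middle -> food
# (length 2).  Pheromone deposited per trip is spread over the route,
# so the shorter route is reinforced more strongly per trip.
w = {"short": 1.0, "long": 1.0}    # pheromone weight on each route
length = {"short": 1, "long": 2}

choices = []
for trip in range(20000):
    p_short = w["short"] / (w["short"] + w["long"])
    route = "short" if random.random() < p_short else "long"
    w[route] += 1.0 / length[route]   # reinforcement inversely prop. to length
    choices.append(route)

recent = choices[-1000:]
print("fraction of recent trips on the shortest path:",
      recent.count("short") / len(recent))
```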

Tor Lattimore (DeepMind) 

Title: Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret

Abstract: The information ratio developed by Russo and Van Roy (2014) is a powerful tool that was recently used to derive upper bounds on the regret for challenging sequential decision-making problems. I will talk about how a generalised version of this machinery is related to the standard algorithms from convex optimisation and how it can be used to derive lower bounds. The main novel application is to show that a version of mirror descent is minimax optimal for partial monitoring using Rustichini's definition of regret.
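
For orientation, the sketch below implements one step of mirror descent on the probability simplex with the negative-entropy regulariser (the update behind exponential weights). In partial monitoring the losses are not observed directly and must be estimated from feedback signals; that estimation step, which is the crux of the talk, is omitted here.

```python
import numpy as np

def mirror_descent_step(p, loss_estimate, eta):
    """One mirror descent step on the simplex with negative-entropy
    regulariser (multiplicative weights).  In partial monitoring the
    loss_estimate would be built from observed feedback signals rather
    than the true losses."""
    logits = np.log(p) - eta * loss_estimate
    q = np.exp(logits - logits.max())   # subtract max for stability
    return q / q.sum()

# Toy usage with fully observed losses on 3 actions.
rng = np.random.default_rng(2)
p = np.full(3, 1 / 3)
for t in range(100):
    losses = rng.uniform(size=3) * np.array([1.0, 0.5, 0.8])
    p = mirror_descent_step(p, losses, eta=0.1)
print("action distribution after 100 rounds:", p.round(3))
```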

Omar Rivasplata (UCL) 

Title: Tighter Risk Certificates for Neural Networks

Abstract: We present an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, derived from tight PAC-Bayes bounds and used here for the first time in connection with training neural networks. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data), potentially without the need for holding out test data.
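
As background, a classical (McAllester-style) PAC-Bayes bound already yields a trainable objective of the form "empirical risk plus a KL-driven penalty"; the sketch below computes such an objective for a diagonal Gaussian posterior and prior over network weights. The talk's two objectives are tighter refinements; the baseline form and the example numbers here are purely for orientation.

```python
import numpy as np

def pac_bayes_objective(emp_risk, kl, n, delta=0.05):
    """Classical PAC-Bayes-inspired objective: empirical risk plus a
    complexity penalty driven by KL(posterior || prior) and the number
    of training examples n.  The talk's objectives refine this penalty."""
    penalty = np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))
    return emp_risk + penalty

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL divergence between diagonal Gaussian posterior Q and prior P
    over network weights (the distribution output by training)."""
    return 0.5 * np.sum(
        (sigma_q / sigma_p) ** 2
        + ((mu_p - mu_q) / sigma_p) ** 2
        - 1.0
        + 2.0 * np.log(sigma_p / sigma_q)
    )

# Example: 10k weights, posterior slightly moved from a data-free prior.
rng = np.random.default_rng(3)
mu_p = np.zeros(10_000); sigma_p = np.full(10_000, 0.1)
mu_q = mu_p + rng.normal(0, 0.01, 10_000); sigma_q = sigma_p * 0.9
kl = kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
print("objective:", pac_bayes_objective(emp_risk=0.05, kl=kl, n=60_000))
```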

Sanjay Shakkottai (University of Texas at Austin) 

Title: Representation Learning with Model-Agnostic Meta-Learning

Abstract: Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks. However, the mechanics of GBML have remained largely mysterious from a theoretical perspective. In this talk, we show that GBML methods such as Model-Agnostic Meta-Learning (MAML) and its variants are capable of learning a common representation among a set of given tasks in the well-known multi-task linear representation learning setting. Moreover, our analysis illuminates that the driving force causing MAML to recover the underlying representation is its adaptation of the final layer of the model, which harnesses the underlying task diversity to improve the representation in all directions of interest. To the best of our knowledge, these are the first results to show that MAML and its variants learn expressive representations and to rigorously explain why they do so. Based on joint work with Liam Collins, Aryan Mokhtari and Sewoong Oh.
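
To make the setting concrete, here is a first-order, ANIL-style sketch in the multi-task linear representation model: the inner step adapts only the task-specific head, the outer step updates the shared representation. The first-order approximation and all hyperparameters are illustrative assumptions; the talk analyses the exact dynamics.

```python
import numpy as np

rng = np.random.default_rng(4)

# Multi-task linear representation setting: targets y = X @ (B_star @ w_t)
# for a shared d x k representation B_star and task-specific heads w_t.
d, k, n_shots = 20, 3, 10
B_star, _ = np.linalg.qr(rng.normal(size=(d, k)))

def sample_task():
    w = rng.normal(size=k)
    X = rng.normal(size=(n_shots, d))
    return X, X @ (B_star @ w)

# First-order MAML, adapting only the final layer (the head) inside the
# inner loop; second derivatives are dropped for simplicity.
B = rng.normal(size=(d, k)) * 0.1
alpha, beta = 0.05, 0.01           # inner / outer step sizes

for step in range(2000):
    X, y = sample_task()
    w = np.zeros(k)
    # Inner loop: one gradient step on the head for this task.
    grad_w = (X @ B).T @ ((X @ B) @ w - y) / n_shots
    w = w - alpha * grad_w
    # Outer loop: first-order gradient on B at the adapted head.
    resid = (X @ B) @ w - y
    grad_B = (X.T @ resid[:, None]) * w[None, :] / n_shots
    B = B - beta * grad_B

# How well does the learned column space match the true one? (k = perfect)
Q, _ = np.linalg.qr(B)
print("subspace alignment:", np.trace(B_star @ B_star.T @ Q @ Q.T), "of", k)
```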

Sam Tickle (University of Bristol) 

Title: A new online, multivariate, nonparametric changepoint detection method

Abstract: Changepoint detection has received considerable attention over the past few years, with popular new methods arising to relax many of the model-based assumptions surrounding classical approaches, to handle the multivariate setting, or to detect changes sequentially without raising too many false alarms. We'll discuss a new method which aims to satisfy all three of these requirements simultaneously, covering some of its theoretical properties and demonstrating its use on various multivariate time series.
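
As a generic illustration of the online template (emphatically not the new method from the talk), one can compare a trailing reference window with the most recent window using a nonparametric two-sample statistic such as the kernel MMD, flagging a change when it crosses a threshold. In practice the threshold would be calibrated to control false alarms; the fixed value below is an assumption for the toy data.

```python
import numpy as np

def mmd2(X, Y, bandwidth=1.0):
    """Biased kernel MMD^2 between two samples with an RBF kernel --
    one nonparametric two-sample statistic among many."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(5)
w, threshold = 50, 0.12
# 4-dimensional stream with a mean change at t = 300.
stream = np.vstack([rng.normal(0.0, 1, size=(300, 4)),
                    rng.normal(2.0, 1, size=(300, 4))])

for t in range(2 * w, len(stream)):
    ref, rec = stream[t - 2 * w : t - w], stream[t - w : t]
    if mmd2(ref, rec) > threshold:
        print("change flagged at t =", t)
        break
```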

Sofia Villar (University of Cambridge)

Title: Sequential learning in clinical trials: what does the learning-earning dilemma actually mean in practice?

Abstract: The multi-armed bandit problem (MABP) is an idealized mathematical sequential learning framework for deciding how to optimally allocate a resource among a number of competing uses, given that such allocation is to be done sequentially and under randomly evolving conditions. Although its scope is much more general, one of the most common applications chosen to motivate this methodology is that of a clinical trial which has the aim of balancing two separate goals: (1) correctly identifying the best treatment (exploration or learning) and (2) treating patients effectively during the trial (exploitation or earning). These two goals are naturally in conflict. Correctly identifying the best treatment at the end of the trial (i.e. at a target treatment effect margin and with a high probability) requires a large number of patients to be assigned to all treatments to be able to compare them, and therefore goal (1) acts to limit goal (2) during the trial. In this talk I will briefly introduce a multi-armed bandit problem of interest in the context of a clinical trial. Then, I will describe some of the statistical and practical challenges that the use of bandit-based approaches to design clinical trials poses, and illustrate some of the solutions that have been proposed so far to make their implementation more appropriate in practice.
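
A toy simulation makes the dilemma tangible: compare 1:1 randomisation with Thompson sampling on a two-arm Bernoulli trial, measuring both in-trial successes (earning) and how often the better arm is picked at the end (learning). All numbers are illustrative; real trial design works to a target effect margin and controlled error rates.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy two-arm trial: success probabilities for control / treatment.
p = np.array([0.3, 0.5])
n_patients, n_trials = 200, 500

def run(policy):
    successes, correct = 0.0, 0
    for _ in range(n_trials):
        s = np.zeros(2); f = np.zeros(2)       # per-arm successes/failures
        for _ in range(n_patients):
            if policy == "equal":
                a = rng.integers(2)            # 1:1 randomisation
            else:                              # Thompson sampling
                a = int(np.argmax(rng.beta(s + 1, f + 1)))
            r = rng.random() < p[a]
            s[a] += r; f[a] += 1 - r
        successes += s.sum()
        correct += int(np.argmax(s / np.maximum(s + f, 1)) == 1)
    return successes / n_trials, correct / n_trials

for policy in ["equal", "thompson"]:
    avg_success, id_rate = run(policy)
    print(f"{policy:9s} mean in-trial successes: {avg_success:6.1f}, "
          f"best-arm pick rate: {id_rate:.2f}")
```

Typically the bandit policy treats more patients effectively during the trial, while equal randomisation gives a more reliable comparison at the end, which is the learning-earning tension described above.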

Steffan Grunewalder (Lancaster University)

Title: Restless bandit problems

Abstract: I will present various results for restless bandits and related settings. In particular, I will consider restless bandits that have close to independent pay-offs (small phi-mixing coefficients), and restless bandits with highly dependent pay-offs. I will also highlight the challenges that arise when the pay-offs are neither nearly independent nor strongly dependent in the context of linear bandits. Finally, I will discuss some links between dependency measures like alpha-mixing and geometric properties of reproducing kernel Hilbert spaces that can be of use for infinite-dimensional linear bandit problems.
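
A small sketch of what "restless" means here, under an assumed AR(1) pay-off model: each arm's state drifts whether or not it is played, so old observations go stale, and a sliding-window index is one simple response. The dynamics, the window rule, and the AR coefficient (which governs how fast the pay-offs mix) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy restless bandit: each arm's pay-off follows its own AR(1) process,
# evolving whether or not the arm is played.  Small phi = rapid mixing
# (close to independent pay-offs); phi near 1 = strong dependence.
K, T, window, phi = 3, 2000, 200, 0.9
base = np.array([0.0, 0.2, 0.4])
state = np.zeros(K)

obs = [[] for _ in range(K)]   # (time, reward) pairs per arm
total = 0.0
for t in range(T):
    state = phi * state + rng.normal(0, 0.3, K)   # restless dynamics
    # Sliding-window UCB: only recent observations inform the estimate.
    ucb = np.empty(K)
    for k in range(K):
        recent = [r for (s, r) in obs[k] if s > t - window]
        n = len(recent)
        ucb[k] = np.inf if n == 0 else (
            np.mean(recent) + np.sqrt(2 * np.log(min(t + 1, window)) / n))
    a = int(np.argmax(ucb))
    r = base[a] + state[a]
    obs[a].append((t, r))
    total += r

print("average reward:", total / T, "| best static mean:", base.max())
```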
