Sven Hollowell

General Profile:

I have an MSc in Machine Learning, Data Mining and High Performance Computing from the University of Bristol, and a BSc in Physics from the University of Bath.

I worked as a Software Engineer on FoodDB, a research database of food products from around the world, scraped weekly from online shopping websites. I was also a Lead Engineer of BetterBasket, a browser extension that augments online shopping with an interactive data layer, providing better nutritional and environmental information about foods as you browse.

Previously, I worked on large-scale analysis of health datasets, designing machine-learning algorithms for classifying wearable motion-sensor data collected by the UK Biobank study, objectively measuring physical activity in over 100,000 individuals.

Research Project Summary:

In my project I have designed a prototype Intelligent Personal Assistant (IPA) capable of using traditional desktop applications on behalf of a user. The design was motivated by recent rapid gains in the abilities of large language models (LLMs, such as ChatGPT), which are capable of multi-step reasoning, generalisation to new problem domains, and tool use. These capabilities are combined to build a system that can execute natural-language tasks using pre-existing applications on the user's computer.

While there has been some recent research into giving LLMs control of user applications, this has been limited to mobile applications and has not been demonstrated in a functional prototype. In my work I extend the application of LLMs to desktop applications, which have much more complex user interfaces due to their larger screen size. I also demonstrate how LLM-generated actions such as mouse clicks can be simulated to create a fully voice-operated automation system, and have designed a prototype that can reliably complete single-task commands such as “make the title bigger” or “move that file to the Desktop”.
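To illustrate the general idea, the sketch below shows how an LLM-proposed action might be returned as structured text and then simulated with a standard input-automation library such as pyautogui. This is a minimal illustration under assumed interfaces (the JSON action format and coordinates are made up), not the project's actual implementation.

```python
# Hypothetical sketch: executing a single LLM-proposed action by simulating
# mouse and keyboard input with pyautogui. The JSON action format is an
# assumption for illustration, not the project's actual interface.
import json

import pyautogui

# Example of what an LLM might return after seeing a textual description of
# the screen and the spoken command "make the title bigger".
llm_response = (
    '{"action": "click", "x": 412, "y": 96,'
    ' "reason": "press the Increase Font Size button"}'
)

def execute_action(response_text: str) -> None:
    """Parse an LLM-proposed action and simulate it on the desktop."""
    action = json.loads(response_text)
    if action["action"] == "click":
        pyautogui.click(action["x"], action["y"])       # simulated mouse click
    elif action["action"] == "type":
        pyautogui.write(action["text"], interval=0.02)  # simulated keystrokes
    else:
        raise ValueError(f"Unsupported action: {action['action']}")

execute_action(llm_response)
```

In practice, a system like this would likely validate the proposed action against the current screen state before executing it, rather than trusting the model's output directly.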

One focus of my project is human-in-the-loop feedback. LLMs can use natural-language instructions and corrections to improve their performance, which is especially useful when performing previously unseen tasks. To this end I have developed a UI-demonstration tool that allows users to teach new UI actions to the LLM.
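As a rough illustration of how a demonstration could be captured and fed back to the model, the sketch below stores recorded steps under a natural-language name and turns them into a prompt example the LLM can imitate. The data format and step descriptions are assumptions for illustration, not the actual tool.

```python
# Hypothetical sketch of storing a user demonstration as a named sequence of
# UI steps and injecting it into the LLM prompt as an example to imitate.
from dataclasses import dataclass, field

@dataclass
class Demonstration:
    name: str                                        # natural-language task name given by the user
    steps: list[str] = field(default_factory=list)   # recorded UI actions, in order

    def add_step(self, description: str) -> None:
        self.steps.append(description)

    def as_prompt_example(self) -> str:
        numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(self.steps))
        return f'To "{self.name}", perform these steps:\n{numbered}'

# A user demonstrates how to export a document as a PDF once...
demo = Demonstration(name="export the document as a PDF")
demo.add_step('click menu item "File"')
demo.add_step('click menu item "Export As PDF..."')
demo.add_step('click button "Save"')

# ...and the recorded steps become part of the LLM's context for next time.
print(demo.as_prompt_example())
```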

The next stages of the project are to study how such systems can be extended to perform increasingly complex action sequences through a combination of human teaching and machine exploration of the environment (reinforcement learning).

The contributions of this project are as follows:

- Designing a functional prototype to study human-AI collaboration using desktop computer applications. Can humans effectively collaborate with and teach LLMs to perform novel actions?

- Investigating prompting techniques that allow LLM systems to follow single-step tasks by representing the screen in a textual format that the LLM can understand (see the sketch after this list).
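For example, one simple way to produce such a textual screen representation (an assumed format for illustration, not necessarily the one used in the project) is to serialise the visible UI elements from the accessibility tree into one line per element:

```python
# Minimal sketch of a textual screen representation for an LLM: each visible
# UI element becomes one numbered line the model can refer to in its reply.
from dataclasses import dataclass

@dataclass
class UIElement:
    role: str    # e.g. "button", "menu item", "text field"
    name: str    # accessible label of the element
    x: int       # centre coordinates, usable for simulated clicks
    y: int

def screen_to_text(elements: list[UIElement]) -> str:
    """Serialise visible UI elements into a compact textual screen description."""
    lines = [f'[{i}] {e.role} "{e.name}" at ({e.x}, {e.y})'
             for i, e in enumerate(elements)]
    return "Visible UI elements:\n" + "\n".join(lines)

elements = [
    UIElement("menu item", "File", 24, 12),
    UIElement("button", "Increase Font Size", 412, 96),
    UIElement("text field", "Document title", 300, 160),
]

# This text is placed in the prompt alongside the user's spoken command, and
# the LLM replies with the element it wants to act on and the action to take.
print(screen_to_text(elements))
```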

Future planned contributions:

- Gathering a dataset of user demonstrations that can be used to improve the model and to test whether the LLM is capable of learning new tasks.

- Investigating techniques that allow LLM systems to follow complex multi-step tasks by incorporating reinforcement learning, model fine-tuning, and other deep-learning techniques.

- Investigating multi-modal inputs, meaning that instead of representing the screen in text format, the model will look at an image of the screen directly to understand its environment.

The use of LLMs for autonomous task execution is a novel and highly promising field of research. The ability of LLMs to understand and reason about most tech-support-style questions suggests that they are capable of reasoning about how to perform tasks on a computer; the missing link is giving them the ability to act on that knowledge.

Supervisors:

Website:
