Supervised learning to support the optimisation of chemical reactions

This project explored the application of different statistical approaches to the analysis of chemical data from previous systematic experiments on a key variable (ligands) in homogeneous catalysis.

The need for new data analysis approaches

Chemical synthesis ultimately aims to provide the right compound for a desired application, be that a pharmaceutical to cure disease, a better battery, or a complex to sequester carbon dioxide. Progress has traditionally been made through trial and error, guided by previous experience, but this is a laborious and expensive process (in both resources and researcher time). Robotic screening and the automation of routine tasks in chemical synthesis are helping to address this, but new data analysis approaches are needed to make optimal use of the resulting experimental results. Such approaches could give chemists a deeper understanding of how sensitive experiments are to changes in, and interactions between, input variables, allowing effort to be focused on the most promising experiments.

Statistical approaches to the analysis of chemical data

This project explored the application of different statistical approaches to the analysis of chemical data from previous systematic experiments on a key variable (ligands) in homogeneous catalysis. Our data on the performance of catalysts for an interesting chemical reaction had previously been analysed with standard regression approaches, and we wanted to compare these models with a range of alternative statistical modelling approaches of varying complexity, including ridge and LASSO regression. Although the resulting models should be amenable to chemical interpretation, our ultimate goal is to predict likely experimental results so that robotic screening can be guided towards the most promising reactions; the models were therefore evaluated by bootstrapping, with predictive performance in mind.
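The page does not say which software, descriptors or penalty settings were used, so the following is only a minimal NumPy sketch, under stated assumptions, of how ridge and LASSO regression might be compared by bootstrap prediction error. The data, the number of descriptors, the penalty values (`alpha`) and the function names (`fit_ridge`, `fit_lasso`, `bootstrap_oob_rmse`) are all hypothetical. Out-of-bag scoring is one of several possible bootstrap validation schemes, and the two `alpha` values are on different scales because the two penalties are defined differently.

```python
import numpy as np

def standardise(X_train, X_test):
    """Scale descriptors using the training-set mean and standard deviation only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against descriptors that are constant in a resample
    return (X_train - mu) / sd, (X_test - mu) / sd

def fit_ridge(X, y, alpha):
    """Closed-form ridge: minimise ||y - b0 - X b||^2 + alpha * ||b||^2 (intercept unpenalised)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly zero if |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_lasso(X, y, alpha, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for (1/2n) ||y - b0 - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()  # residual for the current coefficients
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = beta[j]
            # correlation of descriptor j with the residual, adding back its own contribution
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            if beta[j] != old:
                resid -= Xc[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta

def bootstrap_oob_rmse(X, y, fit, n_boot=100, seed=0):
    """Refit on bootstrap resamples and score each fit on the rows it did not see."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)  # out-of-bag rows
        if oob.size == 0:
            continue
        X_tr, X_te = standardise(X[idx], X[oob])
        b0, beta = fit(X_tr, y[idx])
        pred = b0 + X_te @ beta
        scores.append(np.sqrt(np.mean((y[oob] - pred) ** 2)))
    return np.mean(scores), np.std(scores)

if __name__ == "__main__":
    # Hypothetical data: 40 "ligands" x 10 descriptors, only two of which matter.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(40, 10))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=40)
    models = [("ridge", lambda A, b: fit_ridge(A, b, alpha=1.0)),
              ("lasso", lambda A, b: fit_lasso(A, b, alpha=0.1))]
    for name, fit in models:
        mean_rmse, sd_rmse = bootstrap_oob_rmse(X, y, fit)
        print(f"{name}: out-of-bag RMSE {mean_rmse:.2f} +/- {sd_rmse:.2f}")
```

Ridge shrinks every coefficient towards zero but keeps all descriptors. The L1 penalty in LASSO can set coefficients exactly to zero, which is one reason LASSO models can be easier to interpret chemically.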

We found that chemical data and prediction in this field pose considerable challenges to statistical approaches, and we engaged with academic (York) and industrial collaborators (CatSci, a Cardiff-based SME) to clarify these challenges and so facilitate follow-on funding applications.

What's next 

Following this exploratory project, plans for the future include:

  • Integration of data analysis results into a manuscript for publication

  • Preparation of review/perspective, together with CatSci (industrial collaborators), to highlight challenges for statistical data analysis in chemistry

  • Development of funding applications in this area (collaborations with stakeholders on automated synthesis and data, and a fellowship opportunity for the Co-Investigator)

  • Dissemination of results through presentations and seminars

People involved in this project

  • Dr Natalie Fey (Lecturer in Chemistry)
  • Dr Ben Swallow (Research Associate)