The Bristol RSE team support a range of projects across the university, many of which can be found in our GitHub organisation. Here are some example case study projects that help demonstrate some of the work that we can do.
Machine-learning-augmented block play
Michael Rumbelow is a PhD student in the School of Education, researching early concepts of numbers in children through the medium of block play. With seedcorn funding from the Bristol Digital Futures Institute, Michael, his PhD supervisor Alf Coles and computer vision consultant PySource collaborated to create a prototype application that augments and analyses block play in real time. The Python application uses a machine learning model to recognise blocks from a webcam's video feed, and responds to block arrangements in real time with interactive educational elements. In one mode of operation, for example, blocks represent mathematical objects and their arrangement triggers audio output, e.g. "two plus three equals five".
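As a rough illustration of the "two plus three equals five" mode, the sketch below turns two block values into the sentence that would be spoken. It is not the project's code: the function name `describe_sum` and the word table are hypothetical, and it assumes the computer vision step (not shown) has already recognised two blocks and mapped each to a whole number between zero and ten.

```python
# Hypothetical sketch: turn two recognised block values into a spoken sentence.
# Assumes an upstream vision model (not shown) has already identified the blocks.

NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def describe_sum(left: int, right: int) -> str:
    """Return the sentence describing the sum of two block values (each 0-10)."""
    if not (0 <= left <= 10 and 0 <= right <= 10):
        raise ValueError("block values must be between 0 and 10")
    total = left + right
    return f"{NUMBER_WORDS[left]} plus {NUMBER_WORDS[right]} equals {NUMBER_WORDS[total]}"


print(describe_sum(2, 3))  # two plus three equals five
```

In a real application the returned sentence would be passed to a text-to-speech engine rather than printed.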
The researchers contacted the RSE team's ask-rse mailbox to request assistance with (i) training the machine learning model used in the prototype application and (ii) installing the application on a new machine. A member of the RSE team, James Womack, assisted the researchers through a series of video consultations and independent investigative work.
The technical assistance provided by the RSE team enabled the researchers to make more effective use of the prototype application, giving them the capability to train the machine learning model on new data and to run the application on multiple machines.
MetaWards is an SIR-based metapopulation disease modelling program used to model how epidemics spread through a population. It was originally developed by Leon Danon as a tool for metapopulation modelling, and was originally written in C. The Bristol RSE team were brought on board to modernise the code, add support for new features, and improve its robustness and trustworthiness so that it could be applied at scale to model the Covid-19 pandemic.
We ported the code to Python, as this significantly improved its readability and extensibility. We used Cython to speed the code up, so that the new Python code ran faster on a single core than the original C code. We then parallelised the code using OpenMP, and parallelised it further using mpi4py to support running massively parallel jobs on a cluster. This cut the runtime of a single job from minutes to seconds, and made it easy for researchers to run massive parameter sweeps using thousands of CPU cores.
We added a complete unit test suite, complete documentation, and a complete set of tutorials. We used the modernised design to create new features that supported completely new ways of modelling, based on multi-network demographics. We created an easy interface for designing new metapopulation networks, new network models and new disease descriptors. The software is now used in production, and has formed the basis of successful follow-on funding applications. The work was completed between March and August 2020.
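For readers unfamiliar with the term, "SIR" refers to models that divide a population into susceptible, infected and recovered groups. The sketch below is a minimal, deterministic, single-population SIR model in plain Python. It is an illustration only and is not MetaWards code or its API: MetaWards couples many populations together and is far more sophisticated. The function names and the parameter values (`beta`, `gamma`) are assumptions chosen for the example.

```python
# Illustrative single-population SIR model (not MetaWards code).
# beta: infections caused per infected person per day when everyone is susceptible.
# gamma: fraction of infected people who recover each day.


def sir_step(s: float, i: float, r: float, beta: float, gamma: float):
    """Advance the susceptible/infected/recovered counts by one day."""
    n = s + i + r
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return (
        s - new_infections,
        i + new_infections - new_recoveries,
        r + new_recoveries,
    )


def run_sir(s: float, i: float, r: float, beta: float, gamma: float, days: int):
    """Return the list of (s, i, r) states for day 0 through day `days`."""
    history = [(s, i, r)]
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma)
        history.append((s, i, r))
    return history


# Example: 1000 people, 10 of them initially infected (made-up parameters).
history = run_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, days=100)
peak_infected = max(i for _, i, _ in history)
print(f"Peak number infected: {peak_infected:.0f}")
```

A parameter sweep of the kind mentioned above amounts to calling a model like this many times with different `beta` and `gamma` values, which is why such sweeps parallelise well across many CPU cores.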
minimalmarkers is a code for choosing the minimum set of genetic markers needed to differentiate all samples in a genotyping dataset. For example, the software was used to select markers to identify SARS-CoV-2 lineages. The RSE Group worked with Dr Gary Barker, a Senior Lecturer in the School of Biological Sciences, to speed up minimalmarkers. After 10 days of work, the software had been accelerated by a factor of over 25,000. This reduced the runtime for identifying the minimal set of markers needed to distinguish different varieties of wheat from about 10 days to just 34 seconds!
Significantly less electricity is required to run the optimised script and this has removed the need to invest in a dedicated server that would be left running for days. Quoting Gary:
“We are really excited about the improvement in speed and the reduced carbon footprint of the re-designed code. With our existing datasets it's great to [get] results in minutes that previously took weeks to obtain, but this actually opens up new avenues of research now that we can start looking at millions of genetic markers across whole genomes rather than focussing on just a few tens of thousands at a time.”
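To make the marker-selection problem concrete, the sketch below picks markers greedily: at each step it chooses the marker that separates the most pairs of samples that are still indistinguishable. This is a generic heuristic shown for illustration only. It is not taken from minimalmarkers, it is not guaranteed to find the true minimum set, and the function name, data layout and toy genotype values are all assumptions.

```python
# Illustrative greedy marker selection (not the minimalmarkers algorithm).
# `genotypes` maps a marker name to its call for each sample, in a fixed sample order.
from itertools import combinations


def select_markers(genotypes: dict[str, list[int]]) -> list[str]:
    """Greedily choose markers until every pair of samples differs on at least one."""
    n_samples = len(next(iter(genotypes.values())))
    # Every pair of samples starts out indistinguishable.
    unresolved = set(combinations(range(n_samples), 2))
    chosen: list[str] = []

    def gain(marker: str) -> int:
        """Count the unresolved pairs that this marker would separate."""
        calls = genotypes[marker]
        return sum(1 for a, b in unresolved if calls[a] != calls[b])

    while unresolved:
        best = max(genotypes, key=gain)
        if gain(best) == 0:
            break  # the remaining pairs cannot be told apart by any marker
        chosen.append(best)
        calls = genotypes[best]
        unresolved = {(a, b) for a, b in unresolved if calls[a] == calls[b]}
    return chosen


# Toy dataset: three markers scored on four samples (made-up values).
example = {
    "m1": [0, 0, 1, 1],
    "m2": [0, 1, 0, 1],
    "m3": [0, 0, 0, 1],
}
print(select_markers(example))  # ['m1', 'm2']
```

Here `m1` and `m2` together give each of the four samples a unique pair of calls, so `m3` is not needed.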
BioSimSpace is an interoperability framework for building robust workflows for biomolecular simulations. The project started via an EPSRC Software Flagship grant in partnership with CCPBioSim. The aim was to develop interoperability wrappers around the wealth of simulation software that is used by the biomodelling community. BioSimSpace is written in a combination of C++ and Python. We developed the software over a period of two years, creating a robust design, full documentation, full test suites and a fully open software development philosophy. BioSimSpace is now in use by several research groups, including groups in industry. BioSimSpace is taught via workshops run by CCPBioSim, is published in the Journal of Open Source Software, and has received follow-on funding via an EPSRC Impact Acceleration Award in partnership with UCB.
Cluster in the Cloud
The Bristol RSE group is often asked to support research groups who want to use cloud computing to accelerate their research. Cloud computing interfaces are challenging to work with, and researchers are often daunted by tools that are completely unfamiliar to them. To help, Bristol RSEs have developed Cluster in the Cloud. This builds on-demand, scalable, fully heterogeneous HPC clusters on AWS, Google Cloud or Oracle Cloud (and could be easily extended to support Microsoft Azure). Cluster in the Cloud has enabled many research groups at Bristol and beyond to make excellent use of cloud resources. It uses custom Terraform and Ansible scripts to provision an on-demand Slurm cluster, with a small amount of custom Python used to switch compute nodes on or off based on submitted jobs. The clusters have full monitoring via Grafana, software installation via Spack, EasyBuild and/or Singularity, and a web-based administration GUI, and they support running Jupyter notebooks and parallel jobs via Dask.
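The idea of switching compute nodes on or off based on submitted jobs can be sketched as a small planning function. The code below is a hypothetical illustration and is not taken from Cluster in the Cloud: the `Node` class and `plan_power_changes` function are invented for this example, it assumes one job per node, and it makes no calls to Slurm or to any cloud provider.

```python
# Hypothetical sketch of on-demand node scaling (not Cluster in the Cloud code).
# Simplifying assumption: each pending job needs exactly one whole node.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running: bool  # is the cloud instance currently switched on?
    busy: bool     # is a job currently running on it?


def plan_power_changes(nodes: list[Node], pending_jobs: int):
    """Return (to_start, to_stop): names of nodes to switch on and to switch off."""
    idle = [n for n in nodes if n.running and not n.busy]
    off = [n for n in nodes if not n.running]
    # Start only as many extra nodes as the idle ones cannot already cover.
    shortfall = max(0, pending_jobs - len(idle))
    to_start = [n.name for n in off[:shortfall]]
    # Idle nodes beyond those needed for pending jobs can be switched off.
    to_stop = [n.name for n in idle[pending_jobs:]]
    return to_start, to_stop


cluster = [
    Node("a", running=True, busy=True),
    Node("b", running=True, busy=False),
    Node("c", running=False, busy=False),
    Node("d", running=False, busy=False),
]
print(plan_power_changes(cluster, pending_jobs=2))  # (['c'], [])
print(plan_power_changes(cluster, pending_jobs=0))  # ([], ['b'])
```

Keeping the decision logic separate from the code that actually starts and stops instances makes it easy to test without touching a real cloud account.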