Machine Learning with Omics Data
Health research is increasingly turning to high-throughput molecular datasets (also known as ‘omic’ datasets) to discover novel biomarkers of disease risk and outcome. Unfortunately, the size and complexity of these datasets make them difficult to manage and prone to many pitfalls. In this course, we introduce you to the latest approaches from data science for interpreting and extracting useful and reliable biomarkers from these challenging datasets.
| Dates | 10 - 12 June 2026 |
|---|---|
| Fee | £750 |
| Format | Online |
| Audience | Open to all applicants (prerequisites apply) |
Course profile
This course aims to provide an overview of the principles and methods of epidemiology and data science that are relevant to high-throughput omic studies. It also aims to give students the knowledge and skills needed to design and use population-based omic studies to gain insight and to derive robust biomarkers of exposures and health outcomes.
This 3-day course will be held online and will consist of live lectures followed by practical sessions using R via Posit Cloud, so attendees do not need to install R on their computers.
By the end of the course participants should be able to:
- discuss the specific contributions of different omic data types for understanding and improving human health;
- choose and apply appropriate statistical and machine learning methods for interrogating omic data;
- derive reliable omic biomarkers for indexing exposure and predicting health outcomes;
- evaluate biomarker performance in terms of metrics appropriate to the context in which the biomarker will be used; and
- mitigate the ethical challenges of developing, interpreting and applying molecular biomarkers.
This course is intended for individuals engaged in population-based studies who wish to use omic datasets (e.g. epigenomic, transcriptomic, proteomic, metabolomic or genomic) to gain biological insights and to derive biomarkers of exposure and/or health outcomes. Attendees may have a background in epidemiology, genetics, statistics, public health or a clinical speciality. A basic knowledge of epidemiology is required and some understanding of molecular epidemiology terminology and machine learning would be advantageous. Practical knowledge of R is required as students will be processing large omic datasets in practical sessions.
Please note that this course attracts a highly multi-disciplinary audience. We do our utmost to accommodate this and ask that if in any doubt, prospective participants enquire prior to booking to check that the course is targeted at the right level for their needs.
The course will cover:
- examples of published omic analyses and models for epidemiological and medical applications;
- statistical methods for preprocessing, discovering patterns and testing associations in omic datasets;
- interpreting the biological relevance of omic patterns and associations;
- estimating the heritability and proportion of variation explained by omic data;
- approaches from machine learning for deriving reliable omic biomarkers for indexing exposures and predicting health outcomes;
- application and interpretation of appropriate metrics for evaluating biomarker performance; and
- ethical challenges of developing, interpreting and applying molecular biomarkers.
Dr Paul Yousefi is a data scientist who applies emerging methods in machine learning and statistical prediction to develop multi-dimensional genomic biomarkers of health risk factors, patterns of exposure, and emerging disease phenotypes.
Dr Matthew Suderman is a bioinformatician who specialises in the handling and integrated analysis of large molecular datasets for the discovery of biomarkers of disease risk and outcomes.
Dr Anza Shakeel is an expert in deep learning whose research focuses on using approaches from artificial intelligence to leverage large molecular and imaging datasets to develop predictors of health outcomes.
Dr Sarah Watkins is an epigenetic epidemiologist interested in how our environment shapes our health. Her research currently focuses on how environmental exposures like smoking and adversity influence DNA methylation and health outcomes and how structural racism can lead to health inequities.
To make sure the course is suitable for you and you will benefit from attending, please ensure you meet the following prerequisites before booking:
| Knowledge | You should be very familiar with the topics presented in our Molecular Epidemiology short course. This includes practical knowledge of using R to analyse high-throughput molecular data. We recommend that you have either completed the Molecular Epidemiology short course in this programme or have previous experience of performing omic-wide association studies, e.g. GWAS, EWAS, PWAS. |
|---|---|
| Recommendation | Access to two screens will be useful for practical sessions where one screen can be used to view instructions and the other to carry out instructions and view outputs. |
Before booking this course, please make sure you read the information provided above about the target audience and prerequisites. It is important that you have access to the relevant IT resources needed for the course and meet the knowledge prerequisites to ensure you can get the most from the course.
Bookings are taken via our online booking system, for which you must register an account. To check if you are eligible for free or discounted courses please see our fees and voucher packs page. All bookings are subject to our terms & conditions, which can be read in full here.
For help and support with booking a course refer to our booking information page, FAQs or feel free to contact us directly. For available payment options please see: How to pay your short course fees.
Participants are granted access to our virtual learning platform (Blackboard Ultra) 1 to 2 weeks in advance of the course. This allows time to complete any pre-course work and to become familiar with the platform.
To gain the most from the course, we recommend that you attend in full and participate in all interactive components. We endeavour to record all live lecture sessions and upload these to the online learning environment within 24 hours. This allows course participants to review these sessions at their leisure and revisit them multiple times. Please note that we do not record breakout sessions.
All course participants retain access to the online learning materials and recordings for 5 months after the course.
University of Bristol staff and postgraduate students who do not wish to attend the full course may instead register for access to the 'Materials & Recordings' version of this course: Further information and bookings.
100% of attendees recommend this course*.
*Attendee feedback from 2025.
Here is a sample of feedback from the last run of this course:
“There was a lot of detail fitted into the course given the time allowed and it was helpful to see real world case studies and examples of ML data applied to a range of research areas.” - Course feedback, June 2025
“The lecturers were all very knowledgable and enthusiastic. Practicals were very well written and paced at the right level, all were relevant to the lectures, and all packages and data worked. The range of topics discussed was really fascinating and I really appreciated starting at a basic level and working up. All questions were answered in detail and in real time which was fantastic. Thanks for using R!” - Course feedback, June 2025
“good explanations, nice succinct well designed course.” - Course feedback, June 2025
“The course structure (i.e., the mix of lectures and practicals) was really well balanced and the lecturers were clearly very informed and knowledgeable about their respective areas. I also thought they did a really good job of encouraging questions and communicating what was expected / timings of the day. The lecturers were really clear in answering questions too, thank you!” - Course feedback, June 2025
“Practicals were excellent - well explained and well structured. The code was great - stopifnot() will change my life! I appreciated having the answers available, too, so that coding issues don't stop you from missing the point of the exercises and answering the questions.” - Course feedback, June 2025
“The course was structured really nicely. I liked that the sessions were split between lectures and practicals, with lots of breaks in between. All of the course instructors were brilliant.” - Course feedback, June 2025
“I took lots of information around the course and built on my basic understanding of these concepts - the level at which this course was pitched and how it built up to more complex topics was really great and well managed. Lots of ideas for possible grants and future work!” - Course feedback, June 2025