Reproducible Health Data Science
The development of technology to generate and analyse extremely large health datasets provides an unprecedented opportunity to improve public health and clinical practice. Unfortunately, the complexity of these datasets, combined with poor data science practices, has contributed to a reproducibility crisis. The course will introduce participants to essential practices, skills and tools for increasing the reproducibility of their research. Each practice, skill and tool will be introduced in lectures delivered by active data scientists at the MRC Integrative Epidemiology Unit (IEU) and the University of Bristol with a strong track record of both impactful and reproducible research.
Date | 12 - 14 March 2025 |
---|---|
Fee | £0 (pilot course) |
Format | Online |
Audience | Internal University of Bristol only, pilot course (prerequisites apply) |
In our 2024-2025 programme this course ran as a pilot, open to University of Bristol staff and PGR students only. We intend that this course will be available to all in our 2025-2026 programme. Find out more about pilot courses.
Course profile
This course aims to introduce essential, up-to-date practices, skills and tools to improve reproducibility in health data science.
Structure
Over 3 days, this online course will consist of a variety of learning activities set by tutors. Skills and tool use will be demonstrated through a mix of live and pre-recorded videos and detailed online instructions. Participants will put theory into practice by applying what they learn within one of their personal research projects. Practice data will be provided to participants who do not have a personal project or choose not to work on one. All teaching will be conducted online using Blackboard and Blackboard Collaborate.
Intended Learning Objectives
By the end of the course participants should be able to:
- discuss the importance of reproducibility in health data science;
- enumerate and discuss the many pitfalls affecting reproducibility;
- understand how specific features of popular software development environments support the creation of reproducible scripts;
- collaboratively develop, share and test analysis packages and pipelines;
- evaluate the portability of packages and pipelines; and
- handle and share data with minimum risk of loss or security violations.
Target audience
The course is intended for those who analyse health data and would like to learn how to improve the reproducibility of their work. It is an introductory to intermediate course. It does not include statistical instruction.
Outline
This course will cover:
- the importance of and pitfalls preventing reproducibility;
- transparent organization of project files and data;
- software development environments for scripting;
- collaborative software development;
- literate programming;
- dependency management;
- pipeline development;
- software packaging;
- safe data management; and
- software and data version control.
Teaching staff
All tutors are active researchers leading projects that involve the analysis of large health datasets in the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol. Several have academic backgrounds in data science or related fields.
Prerequisites
To make sure the course is suitable for you and you will benefit from attending, please ensure you meet the following prerequisites before booking:
Eligibility | This course is available to University of Bristol staff and postgraduate researchers only. Candidates must be able to fully attend the course and provide feedback. |
---|---|
Conditions | Pilot courses are extremely popular and all live sessions must be attended in full. You should only book onto this course if you are able to commit to attending in full and have time to provide detailed feedback. Attendance is monitored. Failure to attend in full, without a valid reason, will result in your access to pilot course materials being rescinded and you will not be permitted to attend any further pilot courses within the same academic year. |
Knowledge | Participants should have some experience developing scripts and analysing data. Experience with Linux-based high-performance compute clusters will be an asset but is not required. Expertise is not required in any specific programming language, however demonstrations will tend to focus on R and Python. |
Software | Participants will carry out practical activities on their own computers. Software installation instructions will be provided prior to the course along with a short drop-in session for advice. |
Recommendation | Access to two screens will be useful for practice sessions where one screen can be used to view instructions and the other to carry out instructions and view outputs. |
Bookings
Before booking this course, please make sure you read the information provided above about the target audience and prerequisites. It is important that you have access to the relevant IT resources needed for the course and meet the knowledge prerequisites to ensure you can get the most from the course.
We do not charge fees for pilot courses, nor do they count against your allocation of free course places. However, in return we ask that you take the time to provide full and thorough feedback so we can effectively evaluate the success of the course.
Bookings are taken via our online booking system, for which you must register an account. To check if you are eligible for free or discounted courses, please see our fees and voucher packs page. All bookings are subject to our terms & conditions, which can be read in full. See also: How to pay your short course fees.
For help and support with booking a course, refer to our booking information page or feel free to contact us directly. For available payment options, please see our FAQs.
Course materials
Participants are granted access to our virtual learning platform (Blackboard) 1 to 2 weeks in advance of the course. This allows time for participants to complete any pre-course work and to familiarise themselves with the platform.
To gain the most from the course, we recommend that you attend in full and participate in all interactive components. We endeavour to record all live lecture sessions and upload these to the online learning environment within 24 hours. This allows course participants to review these sessions at leisure and revisit them multiple times. Please note that we do not record breakout sessions.
All course participants retain access to the online learning materials and recordings for 3 months after the course.
Please note that this is a pilot course and therefore no Materials & Recordings (UoB only) option is available.
Testimonials
100% of attendees recommend this course*.
*Attendee feedback from March 2025.
Here is a sample of feedback from the last run of the course:
“The range of topics taught in the course covered most of the basic needs do [sic] perform reproducible data science. All the topics were well explained by the tutors, they also spoke about their experiences and practices which was helpful. Most importantly, practical material provided in the course was well built and provided a realistic scenario to work with.” - course feedback, March 2025
“Walkthrough videos were super helpful, as were the explanations [sic] of Snakemate and Containers and Packages, including when they are useful. The github repo, wiki and branches for different stages are were also great for the practicals. Troubleshooping also v. helpful.” - course feedback, March 2025
“The course offered rich content, covering nearly every aspect of achieving reproducibility. The group of tutors was exceptional, and it was helpful to know which staff members had extensive experience in each aspect. The recorded practical videos were incredibly useful, ensuring I didn’t miss any commands. Thank you, Gib! Separating the troubleshooting room facilitated a balance between attendees with different skill levels.” - course feedback, March 2025
“The contents of this course are very interesting and helpful - and it is clear throughout that everyone is very knowledgeable on the subject matter.” - course feedback, March 2025
“Realy [sic] great content- ie [sic] I can imagine almost all of the tools covered being useful to our team. The wiki was excellent too- really essential to have this to follow easily when one's brain is feeling fried by new learning. The tutors were really great and did an amazing job.” - course feedback, March 2025
“Overall, it was a good course, especially beneficial for those interested in coding. All the tutors were amazing.” - course feedback, March 2025
“The subject area was really well introduced giving a good background for the importance of reproducable [sic] health data science. The speakers were clear and there were plenty of moderators at hand for the break out sessions for technical support. They answered questions and helped the course attendees.” - course feedback, March 2025
“There was a lot of new information for me, and the course highlighted good practices for a professional data scientist. I also gained valuable insights simply by observing the tutors as they coded.” - course feedback, March 2025
Bookings for this course have now closed
Questions?
Explore our comprehensive FAQ pages or contact us for help and support.