Professor Marcus Munafò (lead), Dr Rebecca Richmond, Professor Kate Tilling
Smoking is known to be a causal risk factor for a range of cancers, but the role of smoking in disease progression (rather than onset) is less well known. In particular, understanding whether smoking is a causal risk factor for progression outcomes is complicated when it is also a causal risk factor for disease onset. This project will apply newly developed methods within a Mendelian randomisation framework to understand the causal effects of smoking on cancer progression. This work has the potential to inform policy and intervention work that focuses on smoking among patients with a diagnosis of cancer. In addition, it is not clear to what extent nicotine or other constituents of tobacco smoke may be causal, if smoking is shown to influence disease progression. Although nicotine is not carcinogenic itself, it has been shown to promote tumour growth in model systems. Multivariable Mendelian randomisation allows the independent contribution of multiple exposures (e.g., smoking vs nicotine) to be assessed, and will help to shed light on the role of nicotine in cancer progression. This work will inform policy around, for example, advice to use e-cigarettes as a smoking cessation tool among individuals with a diagnosis of cancer.
We aim to understand the causal effects of the nicotine and non-nicotine constituents of smoking on cancer incidence and cancer progression, and whether smoking cessation improves progression outcomes among those with a diagnosis of cancer. While the effects of smoking on cancer risk are well known for a wide range of cancers, the role of nicotine per se is less well understood. This is particularly important given the rapid growth in popularity of e-cigarettes, where the primary exposure is nicotine in the absence of the other elements of pyrolysed tobacco. In addition, little is known about the effects of smoking on cancer progression, and the potential benefits of cessation.
The project will use two broad methods. First, to explore the independent effects of the nicotine and non-nicotine constituents of smoking we will use multivariable Mendelian randomisation (MR), using genetic instruments for circulating nicotine levels and heaviness of smoking, stratified on smoking status (ever vs never, and current vs former vs never). We will examine a range of cancers (e.g., lung, head and neck), primarily in relation to incidence but also, where possible, in relation to progression. Multivariable MR has proved tractable with highly correlated exposures, such as general cognitive ability and educational attainment. Second, we will explore the effect of smoking cessation on cancer progression outcomes, using genetic variants associated with smoking cessation in a recent GWAS (Liu et al., 2019) and restricting our analyses to smokers who have a diagnosis of cancer. However, this raises potential issues of collider bias (e.g., if heaviness of smoking influences smoking status, via an effect on cessation). Case-only designs such as this present methodological challenges, whereby uncorrelated causes of progression appear correlated due to selecting on incidence status. This becomes even more complicated when factors that influence incidence also influence progression due to shared pathways (for example, it is plausible that smoking influences lung cancer risk, as well as lung cancer prognosis), in which case currently developed methods do not remove the collider bias. We will examine this through multiple approaches to stratification (as described above) and adjustment of genetic associations for this bias. We (Tilling) have recently developed a method for analysing data in case-only designs by which genetic associations are adjusted for collider bias by identifying genetic variants which only affect incidence. Our method can then obtain unbiased estimates of the causal effects in case-only designs. We will also explore the possibility of obtaining data from populations with different rates of smoking, or where smoking rates differ substantially between groups (e.g., men and women), as an alternative to stratifying on smoking status.
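As a rough illustration of the multivariable MR idea described above, the sketch below regresses SNP-outcome associations on the SNP associations with two exposures at once, weighting each SNP by the inverse variance of its outcome association. It is a minimal sketch under stated assumptions, not the project's analysis pipeline: the function name, the four SNP effect sizes and the "true" effects of 0.3 and 0.1 are all made up for illustration, and the summary-data regression shown is one common formulation of multivariable MR rather than the only one.

```python
def mvmr_two_exposures(bx1, bx2, by, se_y):
    """Weighted least squares (no intercept) of SNP-outcome effects on two
    sets of SNP-exposure effects, solved via the 2x2 normal equations."""
    w = [1.0 / s ** 2 for s in se_y]  # inverse-variance weights
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    t1 = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    t2 = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure effects are collinear; direct effects are not identifiable")
    effect_1 = (s22 * t1 - s12 * t2) / det
    effect_2 = (s11 * t2 - s12 * t1) / det
    return effect_1, effect_2


# Hypothetical SNP effects on nicotine exposure and on smoking heaviness.
bx_nicotine = [0.10, 0.05, 0.20, 0.00]
bx_heaviness = [0.02, 0.10, 0.05, 0.15]
# Outcome effects constructed (for illustration only) from direct effects 0.3 and 0.1.
by_cancer = [0.3 * a + 0.1 * b for a, b in zip(bx_nicotine, bx_heaviness)]
se_cancer = [0.01, 0.01, 0.01, 0.01]

effect_nicotine, effect_heaviness = mvmr_two_exposures(
    bx_nicotine, bx_heaviness, by_cancer, se_cancer
)
print(round(effect_nicotine, 6), round(effect_heaviness, 6))
```

Because the toy outcome effects are generated without noise, the regression recovers the two direct effects used to construct them; with real summary statistics the estimates would carry uncertainty and depend on instrument strength for each exposure conditional on the other.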
Liu et al. (2019). Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nature Genetics, 51, 237-244.
Mahmoud et al. (2020). Slope-Hunter: A robust method for index-event bias correction in genome-wide association studies of subsequent traits. bioRxiv.
Sanderson et al. (2019). An examination of multivariable Mendelian randomisation in the single-sample and two-sample summary data settings. International Journal of Epidemiology, 48, 713-727.
Professor Caroline Relton (lead), Dr Matthew Suderman
Recent findings suggest that blood plasma protein abundances may be useful biomarkers of cancer risk and may even mediate risk. These findings are based on investigations of small panels of candidate proteins, in most cases a single protein. Recent technological advances have permitted several human studies with cancer incidence information to measure hundreds of proteins in plasma samples. We propose to use these protein profiles to investigate the relationship between protein abundance and cancer risk.
1. Test associations between blood plasma protein abundance and cancer incidence.
2. Evaluate evidence for causal relationships between protein abundance and cancer risk.
Associations between protein abundance and cancer incidence will be tested using regression models in several datasets, e.g. HUNT (n=1990, prostate; n=140, lung; n=1523, breast; n=1268, colorectal), EPIC (n=300, head and neck), and LC3 (n=5323, lung).
Causal links will be evaluated using two-sample Mendelian randomization (Davey Smith and Hemani, 2014). This will utilize summary statistics of genome-wide association studies (GWAS) of protein abundance and of cancer incidence. Large GWAS have been published for most common cancers; however, GWAS of protein abundance have been limited. One of the largest studies used genetic and protein profiles from 6,800 individuals (Yao et al. 2018). However, a recent analysis curated 2113 pQTLs associated with 1699 proteins from multiple studies (Zheng et al. 2019).
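When a protein has a single pQTL available as an instrument, a two-sample MR estimate is often formed as a Wald ratio of the two GWAS summary statistics. The sketch below shows that calculation with a delta-method standard error, assuming the exposure and outcome estimates come from independent samples. The function name and the numbers are hypothetical and are not taken from any of the studies cited here.

```python
import math


def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate (SNP-outcome effect / SNP-exposure effect)
    with a delta-method standard error that ignores any covariance between
    the two estimates (reasonable for non-overlapping samples)."""
    if beta_exposure == 0:
        raise ValueError("SNP-exposure effect must be non-zero")
    estimate = beta_outcome / beta_exposure
    variance = (se_outcome ** 2) / beta_exposure ** 2 + (
        beta_outcome ** 2 * se_exposure ** 2
    ) / beta_exposure ** 4
    return estimate, math.sqrt(variance)


# Hypothetical summary statistics: pQTL effect on protein level (SD units)
# and the same SNP's effect on cancer risk (log odds).
estimate, std_error = wald_ratio(0.50, 0.05, 0.10, 0.02)
print(f"log OR per SD of protein: {estimate:.3f} (SE {std_error:.3f})")
```

The estimate is interpreted as the change in log odds of cancer per unit change in genetically predicted protein level, provided the usual MR assumptions hold for the chosen pQTL.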
Davey Smith and Hemani. “Mendelian randomization: genetic anchors for causal inference in epidemiological studies.” Human molecular genetics vol. 23,R1 (2014): R89-98.
Yao et al. “Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease.” Nature communications vol. 9,1 3268. 15 Aug. 2018.
Zheng et al. “Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases.” 2019, bioRxiv 627398.
Professor Caroline Relton (lead), Dr Matthew Suderman, Dr Ben Brumpton and Professor Linn Strand
Recent evidence suggests that plasma protein abundances are informative of early cancer risk. Other evidence indicates that DNA methylation models of protein abundance may provide information about risk unavailable from their target proteins (Lu et al. 2019). We propose to use protein abundance profiles covering hundreds of proteins with matching DNA methylation profiles to generate DNA methylation models of protein abundance and then apply these models to DNA methylation profiles in studies with cancer incidence to investigate the relationship of these models with cancer risk.
1. Systematically construct DNA methylation models of protein abundance in peripheral blood.
2. Evaluate performance of these models for early cancer detection.
Epigenome-wide association studies (EWAS) of protein abundance will be applied to datasets including protein and DNA methylation profiles derived from blood samples. Machine learning approaches such as penalised regression and Random Forests will be used to derive DNA methylation models of protein abundance. Models with sufficiently strong associations with target protein levels will then be applied to DNA methylation profiles in other datasets to evaluate ability to detect incident cancer.
Protein profiles have been generated from 3500 blood samples from participants in the Nord-Trøndelag Health Study (The HUNT Study). Each profile includes coverage of 1133 proteins. DNA methylation profile generation using the Illumina EPIC BeadChip is scheduled to begin in a few months.
Cancer incidence will be tested in datasets with DNA methylation profiles and cancer incidence information, e.g. Head & Neck 5000 (n=409, head and neck), ProtecT (n=850, prostate), NOWAC (n=132, lung), MCCS (n=367, lung), NSHDS (n=234, lung), EPIC-Italy (n=185, lung).
Lu, et al. “DNA methylation GrimAge strongly predicts lifespan and healthspan.” Aging vol. 11,2 (2019): 303-327.
Dr Pau Erola (lead), Professor Tom Gaunt, Dr Jordi Camps
The development of single nucleotide polymorphism (SNP) arrays and next-generation sequencing has provided the opportunity to systematically identify regions of uniparental disomy (UPD), or copy-neutral loss of heterozygosity, in cancer. In [1] we showed a non-random, cancer-specific landscape of somatically acquired UPD in cancer, which may lead to the homozygosity of mutations that inactivate tumor suppressor genes or activate oncogenic mutations. The risk of these early mutations, as a first hit, may be determined by genetic variants in the general population [2], and the two-hit events will be subject to positive selection if they increase the tumorigenic proliferative drive or drug resistance [3]. The understanding of these aberrations is crucial for the development of new therapies that are able to restrain tumor evolution. Here we propose to study the genetic variants associated with risk of genomic instability, focusing on those regions recurrently affected by mutations and acquired UPD.
In order to study the causative role of UPD in cancer aetiology our aims will be:
1. Identify SNPs as proxies for carcinogenic agents that have a causal effect on genomic instability across different cancer types, including mutations and recurrent regions of copy-number variation and loss of heterozygosity.
2. Identify the mutations and copy-number alterations subject to positive selection that have a causal effect on cancer evolution.
This is a bioinformatics project involving programming, use of databases and high-performance computing. Association data between SNPs and exposures will be obtained from UK Biobank, and cancer data will be obtained from The Cancer Genome Atlas (TCGA) and other cancer consortia to run a genome-wide association study (GWAS) between SNPs and events of genomic aberration. Then, we will use two-step Mendelian Randomization (MR) to test the causal effect of suspected carcinogenic risk factors on cancer. The robustness of the results will be tested systematically by pathway analysis and literature mining using EpiGraphDB.
[1] Torabi, Erola, et al. International Journal of Cancer 144.3 (2019): 513-524.
[2] Robles-Espinoza, et al. Nature Communications 7 (2016): 12064.
[3] Jamal-Hanjani, et al. New England Journal of Medicine 376.22 (2017): 2109-2121.
Dr. Josine Min (lead), Dr. Haeran Cho, Dr. Claire Gormley (School of Mathematics and Statistics, University College Dublin), Prof. Jonathan Rougier (Rougier Consulting Ltd), Prof. Kate Tilling (MRC Integrative Epidemiology Unit)
This studentship will provide cross-disciplinary training in state-of-the-art statistical and genomic epidemiological approaches (under the supervision of Dr. Josine Min at the Medical Research Council Integrative Epidemiology Unit and Dr. Haeran Cho at the School of Mathematics) to develop novel statistical methods for the analysis of high-dimensional epigenetic data in large-scale epidemiological datasets.
Epigenome-wide association studies (EWAS) aim to identify DNA methylation (DNAm) sites associated with phenotypes of interest (e.g., disease status, cholesterol levels, smoking history, body mass index and age, to name a few), where the challenge lies in the robust identification of DNAm differences and the interpretation of the results. In current EWAS approaches, each DNAm site is tested separately for association with a trait, exposure, or biological condition of interest, which raises the problem of controlling for genome-wide multiplicity. Clustering of the DNAm sites can significantly reduce the dimensionality of the subsequent EWAS. For example, a DNAm site that has a hypo-methylated distribution in a smoking cohort but a hyper-methylated distribution in the non-smoking cohort would be deemed a potential candidate for differential DNAm in association with smoking. It is commonly assumed that the distributional behaviour at DNAm sites can be characterised by a small number of behaviours, such as either 'hypo- or hyper-methylated'(1). In existing approaches for EWAS, DNAm sites are often deemed to be differentially methylated if their mean DNAm levels differ. In this project, we aim to uncover differentially methylated sites by exploring the distributional behaviour of DNAm at each DNAm site, similar in vein to Lock and Dunson(2).
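To make the contrast between mean-based and distribution-based comparisons concrete, the toy sketch below compares two groups at one DNAm site whose mean methylation is identical but whose distributions differ. The two-sample Kolmogorov-Smirnov statistic is used here purely as a simple stand-in for a distributional comparison; it is not the method proposed in this project or in Lock and Dunson, and the beta values are invented.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical cumulative distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    largest_gap = 0.0
    for value in sorted(set(xs + ys)):
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        largest_gap = max(largest_gap, abs(cdf_x - cdf_y))
    return largest_gap


# Hypothetical methylation beta values at a single site.
group_a = [0.1, 0.1, 0.9, 0.9]  # mixture of hypo- and hyper-methylated samples
group_b = [0.5, 0.5, 0.5, 0.5]  # uniformly intermediate methylation

mean_difference = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
distribution_gap = ks_statistic(group_a, group_b)
print(f"mean difference: {mean_difference:.3f}, KS statistic: {distribution_gap:.3f}")
```

In this example a test of mean DNAm would see no difference between the groups, whereas the distributional summary shows that they behave quite differently, which is the kind of signal a distribution-aware approach aims to capture.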
The overall aim of this PhD is to develop a novel statistical model, in a similar spirit to the successful application of principal component analysis as an estimation technique under high-dimensional factor models(3). Additional method development would depend on the candidate's research interests but could include development of novel methods to account for cell heterogeneity or novel methods for imputation of epigenetic features.
This project will equip the student with skills in the analysis of high-dimensional data and the development of scalable clustering techniques and algorithms, and provide training in the analysis and interpretation of epigenetic data. The student will have the opportunity to spend a period of circa 6 months, most likely at the start of Year 3, as a visiting student working in the School of Mathematics and Statistics in University College Dublin with Dr. Gormley. They will benefit from being housed with and exposed to the cohort and activities of PhD students there as part of the Science Foundation Ireland Centre for Research Training in Foundations of Data Science (www.data-science.ie, @data_science_ie), and also to the PhD students in the national Insight Center for Data Analytics.
We propose to find a factorisation of DNAm profiles and to develop a novel generative statistical model, in a similar spirit to the successful application of principal component analysis as an estimation technique under high-dimensional factor models(3). We will adopt a non-parametric approach, whereby structural assumptions that are natural to DNAm data can be imposed. At the University of Bristol, we have access to a large dataset, ARIES(4), with DNAm of 1000 children measured at three time points and their mothers at two time points, offering the student an excellent platform for developing these methods.
1 Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9, 365, doi:10.1186/1471-2105-9-365 (2008).
2 Lock, E. F. & Dunson, D. B. Shared kernel Bayesian screening. Biometrika 102, 829-842, doi:10.1093/biomet/asv032 (2015).
3 Fan, J., Liao, Y. & Mincheva, M. Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 75, doi:10.1111/rssb.12016 (2013).
4 Relton, C. L. et al. Data Resource Profile: Accessible Resource for Integrated Epigenomic Studies (ARIES). Int J Epidemiol 44, 1181-1190, doi:10.1093/ije/dyv072 (2015).
Dr Sarah Lewis (lead), Kostas Tsilidis, Tom Gaunt
Drug development is a costly and long process, and many licensed drugs are not used for the purpose for which they were designed because more effective drugs are available, they are too expensive, or they have off-target effects (side effects). However, there is a pressing need for more effective drugs for cancer treatment and even for cancer prevention. It is possible that many drugs currently developed could be repurposed and used effectively to prevent cancer. This would save the time and money associated with developing new drugs and ultimately benefit those at risk of cancer. For example, we have shown using Mendelian randomization that statins, which are commonly used to prevent heart disease, could also prevent ovarian cancer, and prolonged use of non-steroidal anti-inflammatory drugs (NSAIDs) has been shown to reduce the risk of several solid tumours.
The biological pathways which have been implicated in prostate cancer development, such as insulin, growth and sex hormone signalling, are also key pathways leading to other diseases. There are therefore licensed drugs which interact with target sites within these pathways and which could potentially be repurposed to prevent or treat prostate cancer.
To use Mendelian randomization to determine whether licensed drugs which target insulin, sex and growth hormone signalling are likely to reduce the risk and progression of prostate cancer.
The project will identify licensed drugs which target the insulin signalling, growth hormone or sex hormone pathways. The aim is first to identify the target sites of these drugs and any off-target effects, then to identify single nucleotide polymorphisms which mimic the effects of these drugs, and finally to design a two-sample MR analysis to determine what the effect of treatment would be on prostate cancer risk. For this analysis, data from a genome-wide association study of the PRACTICAL consortium, which consists of over 70,000 cases and 60,000 controls, will be used. The MR analysis will consist of an inverse-variance weighted (IVW) analysis followed by several sensitivity analyses, such as MR-PRESSO, to test the assumptions of the MR analysis. The outcomes will be any prostate cancer, advanced prostate cancer and prostate cancer-specific mortality. The study could analyse each drug target site separately and then match the results to the drug profiles to determine which drugs, if any, would exert the largest effect on cancer risk or progression.
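The inverse-variance weighted analysis mentioned above can be sketched from summary statistics as follows. This is a minimal fixed-effect version with hypothetical SNP effects, assuming uncorrelated instruments; Cochran's Q is included only as a simple heterogeneity check and is not a substitute for MR-PRESSO or the other sensitivity analyses.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: a weighted average of per-SNP Wald ratios
    with weights beta_exposure^2 / se_outcome^2, plus Cochran's Q."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, std_error, cochran_q


# Hypothetical effects of three drug-target SNPs on the target biomarker
# and on prostate cancer risk (log odds).
snp_exposure = [0.10, 0.20, 0.30]
snp_outcome = [0.02, 0.04, 0.06]
snp_outcome_se = [0.01, 0.01, 0.01]

ivw_beta, ivw_se, q_stat = ivw_estimate(snp_exposure, snp_outcome, snp_outcome_se)
print(f"IVW estimate {ivw_beta:.3f} (SE {ivw_se:.4f}), Cochran's Q {q_stat:.3f}")
```

In this toy example every SNP implies the same ratio, so Q is essentially zero; a large Q relative to its degrees of freedom would suggest that some instruments violate the MR assumptions, which is what the planned sensitivity analyses are meant to probe.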
Dr Kaitlin Wade (lead), Dr Tim Robinson, Dr Lindsay Hall, Dr Stephen Robinson
Evidence from humans and model organisms supports relationships between the gut microbiome – a complex system of microorganisms aiding digestion, providing protection against pathogens and creating essential metabolites – and cancer aetiology[1,2]. Human studies in this context have been largely observational and suffer confounding, biases and reverse causation, limiting their ability to offer convincing causal evidence[3,4]. The integration of human genetics within population health sciences has proved successful in facilitating improved causal inference (e.g., Mendelian randomization [MR]) and characterising inherited susceptibility to disease[5]. The former exploits properties of human genetic variation to estimate the causal effect of a trait on health outcomes (i.e., the gut microbiome on cancer) and the latter uses aggregate genomic information to assess differential risk for lifecourse health outcomes. The overarching aim of this project is to understand the causal impact of the gut microbiome on cancer incidence and progression.
1. Gather and appraise host microbiome-related genetic variation and apply contemporary causal inference methodologies to challenge the causal role of the gut microbiome in cancer incidence and progression.
2. Examine the likely impact of reverse causation in these relationships and assess the validity of integrating human genetics in understanding the role of the gut microbiome in cancer aetiology.
3. Assess the association between genome-wide predictors of the gut microbiome and cancer risk and progression outside causal framework analyses.
Initial genome-wide association studies (GWASs) have enabled the application of MR to assess the impact of the microbiome on various outcomes[6-13]; however, there is an unmet requirement for careful examination and interpretation of derived causal estimates and host (i.e., human) genetic effects[14]. For this project, epidemiological and causal inference analyses will be performed using available summary-level data from previous GWASs of the gut microbiome and cancer-specific GWASs (e.g., the Breast Cancer Association Consortium [BCAC]) and individual-level data from cohort studies (e.g., the Flemish Gut Flora Project, Avon Longitudinal Study of Parents and Children, and UK Biobank). We will triangulate findings with evidence from epidemiological and basic sciences to interrogate the appropriate implementation of human genetics for appraising causality in the relationship between the gut microbiome and cancer aetiology. There is also scope to undertake a lab-based placement (either as one of the 4-month miniprojects or during the PhD) at the Quadram Institute with Dr Lindsay Hall and Dr Stephen Robinson, to follow up findings from analyses focusing on the causal impact of the gut microbiome on, specifically, breast cancer.
1. Goodman B and Gardner H. The Journal of Pathology, 2018. 244(5): p. 667-676.
2. Raza MH, et al. Journal of Cancer Research and Clinical Oncology, 2019. 145(1): p. 49-63.
3. Fritz JV, et al. Microbiome, 2013. 1(1): p. 14.
4. Bik EM. The Yale journal of biology and medicine, 2016. 89(3): p. 363-373.
5. Davey Smith G and Hemani G. Human molecular genetics, 2014. 23(R1): p. R89-R98.
6. Bonder MJ, et al. Nature Genetics, 2016. 48: p. 1407.
7. Wang J, et al. Microbiome, 2018. 6(1): p. 101.
8. Blekhman R, et al. Genome Biology, 2015. 16(1): p. 191.
9. Goodrich JK, et al. Cell, 2014. 159(4): p. 789-799.
10. Goodrich JK, et al. Cell host & microbe, 2016. 19(5): p. 731-743.
11. Turpin W, et al. Nature Genetics, 2016. 48: p.1413.
12. Davenport ER, et al. PloS one, 2015. 10(11): p. e0140301-e0140301.
13. Hughes DA, et al. Accepted in Nature Microbiology, 2020.
14. Wade KH and Hall LJ. Wellcome Open Res 2019, 4:199.
Dr Kathreena Kurian (lead), Professor Richard Martin, Dr Alexandra McAleenan, Dr Siddhartha Kar
Brain tumours such as glioma and brain metastases from other primary sites have an overall 5-year survival rate of 20%, in part due to their late detection because their symptoms are so non-specific. At present little is known about the development of brain tumours, and there is an absence of new biomarkers for their detection and drug targets for their treatment. Early diagnosis is key for improving morbidity and mortality in these patients. The only currently accepted treatment approaches are radiotherapy followed by standard chemotherapy. There is some speculation about a number of potential biomarkers and associated mechanisms of brain tumour development and progression that could offer new prevention and treatment strategies, but it is uncertain whether these associations are causal or reflect bias or confounding. Mendelian randomization (MR) is a statistical methodology that uses genetic variants to test whether observational associations of risk factors, biomarkers, molecular intermediates and new drug targets with diseases and their outcomes represent causal relationships (Davey Smith & Hemani, 2014).
Aims & Objectives (1) To perform a systematic review and Mendelian randomization analyses to identify new biomarkers and molecular intermediates for early diagnosis, and drug targets to prevent brain tumour development and progression; (2) to validate novel biomarkers and drug targets in TCGA and other clinically relevant databases, using triangulation; (3) to liaise with our translational team to validate novel biomarkers using their data.
Methods A search of PubMed articles (systematic review) will be conducted to identify previously published observational studies and meta-analyses describing an association between intermediate mechanisms, potential drug targets and brain tumour incidence. Genetic instruments for the identified biomarkers and drug targets will be prepared by extracting publicly available GWAS summary data from the largest corresponding European study (non-instrumentable risk factors will be excluded). Using two-sample Mendelian randomization, we will use these genetic instruments to appraise the causal relevance of the identified intermediates and drug targets for incidence of brain tumours. We will examine molecular intermediates in an MR framework, and correlate these with other laboratory investigations of the novel biomarkers we identify using MR. These findings will be validated in datasets that have already been analysed locally using targeted next-generation sequencing and in publicly available datasets such as TCGA.
References 1. Davey Smith G, Hemani G (2014). Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet, 23(R1), R89-R98
Dr. Siddhartha Kar (lead), Prof. Paul Brennan (IARC), Prof. Tom Gaunt
Cancer is a disease of the genome. Certain changes that are acquired over the course of life in the genomes of healthy cells in the human body (somatic genomic changes) dysregulate the fine balance between cell death and proliferation. These somatic genomic aberrations are the cornerstone of malignant cellular transformation. Targeting somatic genomic changes is fundamental to the practice of precision cancer medicine. We understand that common exposures and cancer risk factors such as ultraviolet light and smoking accelerate the acquisition of these changes. However, little is actually known about how everyday exogenous and endogenous factors such as diet, obesity, and insulin resistance relate to, and likely drive, carcinogenic changes in the somatic genome. This is because it is difficult to measure lifelong trajectories of the factors retrospectively at cancer diagnosis and expensive to measure them prospectively in large numbers of individuals until some of them develop cancer. Such one-time "snapshot" measures, even where feasible, are prone to bias and confounding. Specific inherited or germline genetic variants have been found to be robustly associated with these exposures or factors. Since genetic variants are allocated at random at conception and fixed thereafter, they are less affected by bias and confounding. The factor-associated variants provide remarkable proxies for the lifetime levels of these factors even in patients in whom the factor itself has not been measured. These variants, collected into polygenic scores, can serve as instruments to evaluate the association between the germline genetically inferred levels of the factor and somatic/tumour molecular features and mechanisms that operate within the cancer.
1. To identify tumour molecular features associated with common exposures or putative cancer risk factors
Genome-wide association studies involving hundreds of thousands of individuals have identified germline variants that are robustly associated with different factors, ranging from body mass index to blood-levels of protein markers. This variation will be leveraged to generate personalised life-course profiles of these factors in cancer patients using germline genotype data. The association of these profiles with tumour gene expression, methylation, copy number, and mutations will then be evaluated at the level of single genes and multi-gene biological pathways in >11,000 tumours that have been subjected to deep germline-somatic molecular and clinical phenotyping in The Cancer Genome Atlas (TCGA) project.
2. To investigate the association between common exposures or putative cancer risk factors and cancer drug sensitivity
Over 1,000 cancer cell lines from the Genomics of Drug Sensitivity in Cancer project have been screened for their response to >450 cancer drugs either approved for use in patients or in development. Germline genotypes from the cell lines will be used to index the factors and the association of each index with therapeutic response assessed.
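The germline indices described in the two aims above are typically built as polygenic scores: a weighted sum of risk-allele dosages, with weights taken from a published GWAS of the factor. The sketch below shows that calculation for one individual. The variant identifiers, weights and genotypes are hypothetical, and real pipelines (for example LD-aware scoring tools) involve further steps such as allele harmonisation that are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over variants present in both inputs.

    dosages: variant id -> effect-allele dosage (0, 1 or 2, or imputed value)
    weights: variant id -> per-allele effect size from a GWAS of the factor
    Returns the score and the number of variants that contributed to it.
    """
    shared = [variant for variant in weights if variant in dosages]
    score = sum(dosages[variant] * weights[variant] for variant in shared)
    return score, len(shared)


# Hypothetical GWAS weights for a factor such as body mass index.
gwas_weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
# Hypothetical germline genotype of one patient; variant_c was not genotyped.
patient_dosages = {"variant_a": 2, "variant_b": 1}

score, variants_used = polygenic_score(patient_dosages, gwas_weights)
print(f"score {score:.2f} from {variants_used} variants")
```

Scores computed this way for each tumour sample or cell line could then be tested for association with molecular features or drug response; reporting how many variants contributed helps flag samples with poor genotype coverage.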
A key aspect of this project is flexibility in terms of the scientific direction taken with these rich data sets that will be provided to the student. It is envisioned that such flexibility may manifest in various ways such as (but certainly not limited to) encouraging an investigation of cancer risk factors in the context of genetic ancestry and sex (for example, the student may wish to study whether the impact of body mass index on the tumour genome differs by sex or ancestry).
The student will receive exceptional training in the handling and statistical analysis of large-scale, high-dimensional cancer genetic, genomic, transcriptomic, and epigenomic data sets and in the interpretation of findings based on these data sets. The student will apply a range of computational techniques including state-of-the-art Mendelian randomisation methods implemented in MR-Base, polygenic scoring approaches such as LD-Pred, and expression quantitative trait locus analysis using the R package Matrix eQTL. The project seeks to encourage a high degree of flexibility both in terms of the scientific questions being asked and in terms of the methods being applied and, should the student choose to do so, there is ample scope for implementation of artificial intelligence/machine learning-based methods in these data sets. Relevant training in these methods will be provided. It is envisioned that the work will lead to multiple high-profile and highly interdisciplinary publications that will be led by the student, providing an excellent foundation for future scientific leadership.
The Cancer Genome Atlas project: Ding, L. et al. Cell 173, 305-320.e10 (2018).
The Genomics of Drug Sensitivity in Cancer project: Iorio, F. et al. Cell 166, 740–754 (2016).
Dr Anya Skatova (lead), Dr Andy Skinner, Prof Tom Gaunt, Prof Richard Martin
Shopping history records, collected via purchases tracked on loyalty cards, can provide a new perspective on lifestyle choices and behaviours and how these relate to health outcomes such as cancer. Shopping histories can provide information that is otherwise difficult to measure, such as granular, population-level, objective data on lifestyle behaviours and risk factors (e.g., smoking, alcohol consumption) that can be tracked longitudinally. However, shopping history data also have inherent biases. For example, despite providing details on purchasing habits and basic individual characteristics, patterns in the data could be explained by other factors (e.g., the gap between purchase and consumption). Reliability of health information that is derived from shopping history data can be assessed through integrating these data with detailed self-reports of behaviour collected through Ecological Momentary Assessment (EMA). This work will improve detection of cancer risk as well as assess the validity of integrated data sources in risk prediction.
The overall aim of this PhD is to integrate supermarket loyalty card data with EMA data and conventional epidemiological measures (e.g., questionnaires, biomarkers) in the Avon Longitudinal Study of Parents and Children (ALSPAC) to predict risk factors for cancer. The innovative aspect is the use of transactional and EMA data, which provide higher-density time-series data with different biases from conventional questionnaire/interview data. The ability to predict risk factors using such data could produce novel insights into early cancer symptoms and associated consumption patterns.
1. Identify patterns in standalone shopping history data that may reflect consumption associated with known cancer risk factors (Years 1-2).
2. Collect EMA data on behaviours related to known cancer risk factors using wearable technology (e.g., smartwatches) (Year 2).
3. Use statistical methods (e.g., linear and logistic regression) to validate shopping history patterns against EMA and conventional self-report/biomedical data in ALSPAC (Years 2-3).
4. Use statistical and machine learning methods to predict cancer risk factors in a sample of thousands of ALSPAC participants, as well as in standalone supermarket loyalty card data from a population-wide sample of millions of supermarket customers (Years 2-4).
Dr Anya Skatova (lead), Prof Deborah Lawlor
Shopping history records collected by supermarkets contain population-level health information that may be missing from traditional health research data such as medical records. For example, shopping transactions can provide granular and objective data on under-reported or unreported behaviours and outcomes in the reproductive health domain (related to pain and weight management, vitamin consumption, infant feeding, etc.) that can be tracked longitudinally. Combining shopping history datasets with epidemiological methods has potential for health research and might improve diagnosis, disease prevention and planning of interventions.
The aim of the PhD is to explore the potential of shopping history data to identify key reproductive events and lifestyle choices around these in real time. The specific focus of the PhD will be developed by the student, with potential objectives including: (1) determining the accuracy of shopping history to determine one or more reproductive events, such as conception, pregnancy, breastfeeding or parenthood; (2) whether shopping histories can identify lifestyle changes around these events, such as pre-conception, pregnancy and breastfeeding related changes in diet; (3) the extent to which shopping histories enhance repeat data collected in cohort studies, for example, shopping histories with data in real time might be able to pinpoint the timing of events such as planning a pregnancy and conception, whereas cohort data collected from movement sensors over periods that coincide with the timing of these events might better identify changes in physical activity and sleep patterns. The PhD will work with standalone population level supermarket shopping histories data, as well as a subset of shopping histories data linked into Avon Longitudinal Study of Parents and Children (ALSPAC).
The student will mainly work with shopping history data from a large UK health and beauty retailer, both standalone (>12.5m customers, >1.5 billion transactions) and linked into ALSPAC (for ~1,500 index ALSPAC participants). There is scope for additional new quantitative data collection with ALSPAC participants where it is needed to meet the research aims of the PhD project.
Shopping history data will be used first to identify a reproductive life event of interest (e.g., pregnancy) and a time window associated with it. Products bought during this time window will then be explored. This will allow the identification of other behaviours (e.g., pain management, fertility issues) and health outcomes (e.g., miscarriages) associated with this life event. Those behaviours and outcomes will then be validated through contextual variables, using the data available in ALSPAC (and new data collected through surveys) related to causes and consequences of the life event. The student will be expected to explore the structure of the repeat shopping data and identify appropriate methods for analysing those data. For example, repeat purchasing of sanitary products (indicative of menstruation) that changes over time might be analysed by multilevel models or structural equations, depending on the structure of the data and the specific research question.
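The time-window step described above can be illustrated with a minimal sketch. The transaction records, product categories and the function name `purchases_in_window` are hypothetical and chosen only for illustration; the real retailer data will have its own structure.

```python
from datetime import date, timedelta

def purchases_in_window(transactions, event_date, days_before=90, days_after=90):
    """Return transactions dated within a window around a life event."""
    start = event_date - timedelta(days=days_before)
    end = event_date + timedelta(days=days_after)
    return [t for t in transactions if start <= t["date"] <= end]

# Hypothetical transactions for one customer (illustrative only).
transactions = [
    {"date": date(2020, 1, 5), "category": "pain relief"},
    {"date": date(2020, 3, 10), "category": "pregnancy test"},
    {"date": date(2020, 4, 2), "category": "folic acid"},
    {"date": date(2021, 2, 1), "category": "infant formula"},
]

# Assume the event of interest is proxied by the pregnancy test purchase.
event = date(2020, 3, 10)
window = purchases_in_window(transactions, event, days_before=30, days_after=30)
categories = [t["category"] for t in window]
print(categories)
```

In practice the choice of window length is itself a research question, and the products found in the window would be validated against ALSPAC data as described above.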
This is a data intensive quantitative PhD. The successful candidate would be expected to have had experience of statistical analysis in their first degree, be competent in handling complex large-scale data and eager to learn new quantitative methods and/or about new topic areas in a multidisciplinary team. Depending on their previous experience the successful candidate will obtain training in epidemiology, survey design and data collection, advanced statistical methods, and data science, including the ethics and governance and management and use of data, through the completion of the research project and through postgraduate short courses.
Josine Min (lead), Caroline Relton, Jonathan Mill (University of Exeter Medical School), Eilis Hannon (University of Exeter Medical School). The student will also collaborate with Dr. Gibran Hemani (University of Bristol), Prof. Tom Gaunt (University of Bristol), Prof. Bastiaan Heijmans (LUMC, The Netherlands) and Dr. Jordana Bell (KCL, UK).
DNA methylation is an epigenetic biomarker that has been shown to reflect lifestyle and biological factors (smoking, alcohol use, chronological age). To date the majority of studies used to link behavioral phenotypes such as cigarette smoking, and alcohol use to health outcomes typically employ self-reported questionnaire data. Multiple DNA methylation (DNAm) sites are strongly associated with (behavioural) traits. DNAm derived scores have been used to predict (or proxy for) these traits providing greater precision and biological proximity than self-reported measures. The DNAm derived smoking score is a widely used biomarker of lifetime exposure to tobacco smoke and may explain the molecular mechanism of the long-term risk of diseases following smoking cessation. There is growing interest in conducting genome-wide association studies (GWAS) and Mendelian randomization (MR) analysis on DNAm scores to identify novel genetic and causal factors influencing behavioural traits. To date, several GWAS on DNAm derived scores of aging have been published (Lu et al. 2018, McCartney et al. 2021). The many benefits to identify novel loci and biological pathways for other phenotypes have yet to be gained. This studentship will provide cross-disciplinary training in state-of-the-art genetic and genomic epidemiological approaches (under the supervision of Dr. Josine Min and Prof Caroline Relton at the Medical Research Council Integrative Epidemiology Unit at the University of Bristol and Prof Jonathan Mill and Dr. Eilis Hannon at the University of Exeter Medical School) to address questions about the molecular mechanism underlying established disease risk factors. The student will combine epigenetic, genetic and causal inference analyses in large-scale epidemiological datasets.
The overall aim of this PhD is to identify genetic variants and biological pathways associated with disease risk factors using DNAm scores. The specific risk factors/diseases for this project would depend on the candidate's research interests, but could include cell counts, smoking or alcohol use. The Genetics of DNA Methylation Consortium (GoDMC; http://www.godmc.org.uk/) has collected genetic and DNAm data across multiple cohorts offering the student an excellent platform for these analyses.
Methods:
1) Novel methodology can be used (and potentially developed) to construct DNAm scores on disease risk factors
2) GWAS on DNAm derived phenotype datasets will be conducted followed by meta-analyses. There will be several challenges with this type of analysis including heterogeneity of datasets in age, sex and tissue type.
3) To understand what aspect of the phenotype is captured by the DNAm phenotype, GWAS meta-analysis results will be compared to GWAS results of detailed (self-reported) phenotypes (e.g., in UK Biobank) and methylation quantitative trait loci from blood and brain.
4) MR analysis will be used to investigate causal relationships between DNAm derived measures and self-reported measures and other diseases/risk factors.
5) The heritability component of DNAm derived phenotypes will be estimated.
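As a rough illustration of step 1, a DNAm score is commonly built as a weighted sum of methylation values at selected sites. The sketch below assumes this simple linear form; the site identifiers and weights are hypothetical placeholders, not values from any published score.

```python
def dnam_score(betas, weights):
    """Weighted sum of methylation beta values over sites present in both inputs."""
    shared = betas.keys() & weights.keys()
    return sum(weights[site] * betas[site] for site in shared)

# Hypothetical per-site weights (e.g., from a penalised regression; assumed here).
weights = {"cpg_1": -2.0, "cpg_2": 1.5, "cpg_3": 0.5}

# Hypothetical methylation beta values (0 to 1) for one individual.
sample = {"cpg_1": 0.25, "cpg_2": 0.5, "cpg_4": 0.9}

score = dnam_score(sample, weights)  # cpg_3 and cpg_4 are ignored: not in both inputs
print(score)
```

Silently skipping missing sites is a design choice made here for brevity; a real pipeline would more likely impute or flag them.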
Lu AT, Xue L, Salfati EL, et al. GWAS of epigenetic aging rates in blood reveals a critical role for TERT. Nat Commun. 2018;9(1):387. Published 2018 Jan 26. doi:10.1038/s41467-017-02697-5
McCartney DL, Min JL, Richmond RC, et al. Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging. Genome Biol. 2021;22(1):194. Published 2021 Jun 29. doi:10.1186/s13059-021-02398-9
Dr Josine Min (lead), Dr. Haeran Cho, Prof. Kate Tilling, Prof. Claire Gormley
This studentship will provide cross-disciplinary training in state-of-the-art statistical and genomic epidemiological approaches (under the supervision of Dr. Josine Min at the Medical Research Council Integrative Epidemiology Unit and Dr. Haeran Cho at the School of Mathematics) to develop novel statistical methods for the analysis of high-dimensional epigenetic data in large-scale epidemiological datasets.
Epigenome-wide association studies (EWAS) aim to identify DNA methylation sites associated with phenotypes of interest (e.g., disease status, cholesterol levels, smoking history, body mass index and age, to name a few), where the challenge lies in the robust identification of DNA methylation differences and the interpretation of the results. In current EWAS approaches, each DNAm site is tested separately for association with a trait, exposure, or biological condition of interest, which raises the problem of controlling for genome-wide multiplicity. Clustering of the DNAm sites can significantly reduce the dimensionality of the subsequent EWAS. For example, a DNAm site that has a hypo-methylated distribution in a smoking cohort but a hyper-methylated distribution in the non-smoking cohort would be deemed a potential candidate for differential methylation in association with smoking. It is commonly assumed that the distributional behaviour at DNAm sites can be characterised by a small number of behaviours, such as either 'hypo- or hyper-methylated' (1). In existing approaches for EWAS, DNAm sites are often deemed to be differentially methylated if their mean DNA methylation levels differ. In this project, we aim to uncover differentially methylated sites by exploring the distributional behaviour of DNA methylation at each DNAm site, in a similar vein to Lock and Dunson (2).
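To see why a mean-based test can miss a distributional difference, consider the toy sketch below. It compares two hypothetical groups at one DNAm site using the difference in means and the two-sample Kolmogorov-Smirnov statistic. The KS statistic is used here only as a simple stand-in for a distributional comparison; it is not the method proposed in this project, and the data are invented.

```python
from statistics import mean

def ecdf(sample, x):
    """Empirical cumulative distribution function of sample evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

# Hypothetical methylation values at one site (illustrative only).
group_a = [0.1, 0.1, 0.9, 0.9]   # mixture of hypo- and hyper-methylated individuals
group_b = [0.5, 0.5, 0.5, 0.5]   # uniformly intermediate methylation

mean_difference = mean(group_a) - mean(group_b)
ks = ks_statistic(group_a, group_b)
print(mean_difference, ks)
```

The two groups have essentially the same mean, so a mean-based test sees no difference, while the distributions clearly differ.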
The overall aim of this PhD is to develop a novel statistical model, similar to the successful application of principal component analysis as an estimation technique under high-dimensional factor models (3). Additional method development would depend on the candidate's research interests but could include development of novel methods to account for cell heterogeneity or novel methods for imputation of epigenetic features.
This project will equip the student with skills in the analysis of high-dimensional data, the development of scalable clustering techniques and algorithms, and the analysis and interpretation of epigenetic data. The student will have the opportunity to spend a period of circa 6 months, most likely at the start of Year 3, as a visiting student working in the School of Mathematics and Statistics at University College Dublin with Prof. Gormley. They will benefit from being housed with and exposed to the cohort and activities of PhD students there as part of the Science Foundation Ireland Centre for Research Training in Foundations of Data Science (www.data-science.ie, @data_science_ie), and also to the PhD students in the national Insight Centre for Data Analytics.
We propose to find a factorisation of methylation profiles and to develop a novel generative statistical model, similar to the successful application of principal component analysis as an estimation technique under high-dimensional factor models (3). We will adopt a non-parametric approach, whereby structural assumptions that are natural to DNA methylation data can be imposed. At the University of Bristol, we have access to a large dataset, ARIES (4), with DNAm of 1000 children measured at three time points and their mothers at two time points, offering the student an excellent platform for developing these methods.
1 Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9, 365, doi:10.1186/1471-2105-9-365 (2008).
2 Lock, E. F. & Dunson, D. B. Shared kernel Bayesian screening. Biometrika 102, 829-842, doi:10.1093/biomet/asv032 (2015).
3 Fan, J., Liao, Y. & Mincheva, M. Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 75, doi:10.1111/rssb.12016 (2013).
4 Relton, C. L. et al. Data Resource Profile: Accessible Resource for Integrated Epigenomic Studies (ARIES). Int J Epidemiol 44, 1181-1190, doi:10.1093/ije/dyv072 (2015).
Dr Denize Atan (lead); Dr Theresa Redaniel; Dr Tim Jones, Senior Research Associate & former cancer analyst at Public Health England, Applied Research Collaboration-West; Dr Beth Stuart, Medical Statistician, University of Southampton; Dr Samuel Merriel, GP with expertise in early cancer diagnosis and prevention, University of Exeter
Brain tumours affect 8 per 100,000 people in the UK each year. Brain tumours often affect vision before causing any other symptoms. Unless diagnosed early, many people with brain tumours will die or suffer long-term disabilities, like permanent sight loss.
In 2016, an 8-year-old boy called Vincent Barker was in the news. During a routine sight test, his optometrist, Honey Rose, failed to detect optic nerve swelling at the back of his eyes, a sign indicating raised intracranial pressure. He died soon afterwards. As a result, Honey Rose was convicted of gross negligence manslaughter.
Since the widespread media coverage of the Rose/Barker case, optometrists have been referring more people to hospital over concerns they might have optic nerve swelling. Because of this, we think more patients with brain tumours are diagnosed earlier and more frequently by eye specialists than 5 years ago.
Our primary aims are to find out the number of people diagnosed with brain tumours every year between 2013 and 2018 in England and the proportion who were diagnosed by eye specialists before and after the Rose/Barker case in 2016.
Our secondary aims are to determine whether patients with brain tumours were diagnosed earlier by hospital eye specialists compared with other routes-to-diagnosis and whether they lived longer and had better treatment outcomes as a result.
Public Health England and the National Cancer Registration and Analysis Service (NCRAS) routinely collect data on everyone diagnosed with benign and malignant brain tumours in England. We have National Research Ethics Committee approval to access NCRAS data linked to Hospital Episode Statistics, and we have obtained the data on all new cases of benign and malignant brain tumours diagnosed between 2013 and 2018.
Trends in the data will be investigated in the 3 years before and after exposure to the widespread media coverage of the Rose/Barker case in 2016 using generalised linear regression techniques. We will determine the change in:
(i) Adjusted odds ratios for the number of brain tumours diagnosed via hospital eye services
(ii) Time to diagnosis
(iii) WHO tumour grade at diagnosis
(iv) Cancer stage at diagnosis
(v) Time between diagnosis and treatment
(vi) Mortality
Age, sex, ethnicity, geographical location, deprivation index, and smoking history will be used as covariates in these analyses.
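As a simplified illustration of outcome (i), the sketch below computes an unadjusted odds ratio, with a Woolf-type 95% confidence interval, for diagnosis via hospital eye services after versus before 2016. The counts are invented for illustration. The planned analysis would instead use regression models adjusted for the covariates listed above, which this sketch does not attempt.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and CI from a 2x2 table.

    a, b: eye-service / other-route diagnoses AFTER the case
    c, d: eye-service / other-route diagnoses BEFORE the case
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts (not real NCRAS data).
estimate, lower, upper = odds_ratio_ci(a=30, b=970, c=20, d=980)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```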
1. Poostchi A, et al. Spike in neuroimaging requests following the conviction of the optometrist Honey Rose. Eye 2018.
2. Elliss-Brookes L, et al. Routes to diagnosis for cancer. B J Cancer 2012.
3. Koo MM, et al. Presenting symptoms of cancer and stage at diagnosis. Lancet Oncol 2020.
Dr Denize Atan (lead); Dr Alyson Huntley; Lorna Duncan, Senior Research Associate and expert on mixed methods, Centre for Academic Primary Care; Beth Stuart, Associate Professor in medical statistics, University of Southampton; Matt Ridd, Professor of primary care and GP, Centre for Academic Primary Care; Sam Merriel, GP & expert on early cancer diagnosis, University of Exeter; Mary-Ann Sherrat, Lead Optometrist, University of West of England; Michael Bowen, Director of Research at the College of Optometrists; Professor Jonathan Benger, Co-director of REACH (Research in Emergency Care, Avon Collaborative Hub); Paul Roy, Research, contracts, and innovation manager at Bristol, North Somerset, South Gloucestershire Clinical Commissioning Group.
Papilloedema refers to swelling of the optic nerves caused by raised intracranial pressure. Papilloedema can be the first sign of life-threatening disease, e.g., brain tumours; but only 50% of people have symptoms.
Severe optic nerve swelling isn't difficult to recognise but it takes training to distinguish early papilloedema from conditions which mimic it (pseudopapilloedema). Specialist imaging tests can help, but GPs and opticians vary a lot in their ability to examine eyes, the imaging equipment they have available and their expertise in interpreting the imaging. There is also considerable variation between hospitals across England in how they handle referrals for papilloedema and pseudopapilloedema. Due to diagnostic uncertainty, opticians and GPs refer many more people to hospital than necessary. As a result, people who have brain tumours may have to wait longer for appointments, putting them at risk of death, sight loss and worse treatment outcomes.
The aim of this project is to determine how current practice, decision-making and areas of clinical need vary in primary and secondary care across England in the diagnosis and referral of patients with papilloedema versus pseudopapilloedema.
Health professionals (optometrists, ophthalmologists, neurologists, GPs, emergency care physicians) across England will be surveyed via their professional bodies (Royal Colleges, Clinical Research Networks) using questions based around a series of case vignettes that have been piloted in Bristol. Additionally, we shall submit Freedom of Information (FOI) requests to CCGs across England with a set of questions based on likely areas of clinical need. These national surveys will be followed up with 30 face-to-face semi-structured interviews with health professionals, identified by purposive sampling of eye specialists and GPs (10 GPs, 10 optometrists, 10 ophthalmologists) based on geographic area of workplace. The results will be written up using narrative summary and thematic analysis with a view to publishing guidelines of current and best practice in the peer-reviewed literature. Later stages of the project will evaluate the impact of implementing best practice guidelines.
1. Grant. Brain tumour diagnosis and management. J neurol, neurosurg, psych 2004.
2. Rebolleda, et al. OCT to Differentiate Papilledema from Pseudopapilledema. Curr Neurol Neurosci Rep 2017.
3. Dixon-Woods, et al. Synthesising qualitative and quantitative evidence. J Health Serv Res Policy 2005.
Dr Emma Vincent (lead), Dr Caroline Bull, Professor Nicholas Timpson
Growing evidence suggests that cell extrinsic factors are key in modulating tumor progression. Metabolites are small molecules that act as sources of fuel and building blocks essential for cells and tissues when present at normal levels. Many causal risk factors for cancer (such as obesity or smoking) perturb metabolite levels, meaning cells of the body are exposed to an abnormal metabolic environment in at-risk individuals. It is possible that metabolites may be involved in the causal mechanisms linking risk factors with cancer development, acting to favour the growth and survival of cancer-initiating cells.
This project will improve our understanding of the causal metabolic drivers of cancer development.
Specific aims:
1. To identify genetic predictors of cancer susceptibility and risk factors causally related to cancer
2. To estimate the causal effects of cancer susceptibility and cancer risk factors on circulating metabolites
3. To triangulate evidence to build knowledge of the causal mechanisms linking circulating metabolites with genetic susceptibility to cancer, risk factors for cancer and cancer development using epidemiological and preclinical techniques.
We will construct polygenic risk scores for cancer susceptibility and risk factors for cancer (tissues/specific risk factors will be determined by the student's interests) in mothers and children of the ALSPAC cohort to determine associations with circulating metabolites. We will use Mendelian randomization to estimate causal relationships between genetic susceptibility and risk factors for cancer with circulating metabolites and with cancer risk using summary statistics from large cancer consortia. There will also be the opportunity to work across research disciplines and to conduct mechanistic follow-up analyses in the laboratory using cell culture techniques.
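For readers unfamiliar with summary-data Mendelian randomization, the sketch below shows two standard estimators: the Wald ratio for a single variant and the fixed-effect inverse-variance weighted (IVW) estimate across variants. The SNP effect sizes are invented, and the function names are illustrative rather than taken from any particular software package.

```python
import math

def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: SNP-outcome effect / SNP-exposure effect."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error across variants."""
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1 / denominator)

# Hypothetical summary statistics for three variants (illustrative only).
bx = [0.1, 0.2, 0.4]       # SNP effects on the exposure (e.g., a risk factor)
by = [0.05, 0.1, 0.2]      # SNP effects on the outcome (e.g., a metabolite)
se = [0.01, 0.02, 0.02]    # standard errors of the SNP-outcome effects

estimate, std_error = ivw_estimate(bx, by, se)
print(estimate, std_error)
```

Here every variant implies the same ratio, so the IVW estimate equals that ratio; with real data the variant-specific ratios differ, and that heterogeneity is itself informative about possible pleiotropy.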
Brennan, P & Davey Smith, G. Identifying Novel Causes of Cancers to Enhance Cancer Prevention: New Strategies are Needed. JNCI. https://doi.org/10.1093/jnci/djab204.
Martinez-Reyes, I & Chandel, N.S. Cancer metabolism: looking forward. Nature Metabolism. https://doi.org/10.1038/s42255-021-00478-5
Prof Julian Higgins (lead), Prof Kate Tilling, Dr Alexandra McAleenan, Prof Marcus Munafò
Triangulation, in which multiple methods are strategically used to answer a single question, is a currently developing area. Lawlor, Tilling and Davey Smith (2016) explained how causal inferences can be strengthened by integrating results from several approaches with different key sources of potential bias. The statistical methods for combining the results from multiple sources of evidence within a triangulation framework are, however, underdeveloped. This PhD seeks to develop, illustrate and evaluate such methods.
The project seeks to develop and implement quantitative methods for triangulation of multiple lines of evidence addressing the same underlying epidemiological question.
Work is expected to focus on three key areas as follows.
1) At its simplest, triangulation involves comparison and combination of studies of the same exposure-outcome effect that use different designs or analytic methods. For example, randomized trials, Mendelian randomization studies and traditional multivariable regression analyses of observational evidence might all tackle a question relating to the same exposure-outcome effect. The studies may produce different effect estimates because they are (i) asking subtly different questions (e.g. in relation to the period or patterns of exposure), (ii) compromised by different biases and/or (iii) subject to chance. Triangulation combines these issues in a statistical model and assesses the extent to which the observed data fit together – an approach known as multiparameter evidence synthesis. Methods for producing these models, assessing coherence and drawing conclusions about causal effects of the exposure on the outcome will be developed. The project will primarily explore Bayesian methods, because they are flexible and allow incorporation of external information through prior distributions.
2) Another form of triangulation arises when some (or all) studies address only a component of the underlying question. For example, if the exposure-outcome effect occurs through an intermediate, then studies of the exposure-outcome effect might be triangulated with a combination of studies (i) of the effect of exposure on the intermediate and (ii) of the effect of the intermediate on the outcome. Methods will be developed to synthesise these three sets of studies, and account for true differences, biases and chance.
3) In addition to working on novel statistical methods, the student may explore other methodological questions. First, how should we define and identify studies suitable for a triangulation exercise? Automation tools may help here, such as MELODI (http://melodi.biocompute.org.uk), which we have developed to identify studies examining intermediates between exposure and outcome. Second, how should we evaluate the risk of bias in studies for which formal frameworks (such as RoB 2 and ROBINS-I; http://riskofbias.info) have not been developed? Third, what sources of information are available about biases, to inform prior distributions, and how can more information be generated?
Methods developed in these three areas will be illustrated through application to important causal questions in epidemiology.
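A deliberately simplified sketch of the first area is given below: each study's estimate is shifted by an assumed additive bias, its variance is inflated by the uncertainty about that bias, and the adjusted estimates are combined by inverse-variance weighting. This is a normal-approximation stand-in for the full Bayesian models the project would develop, and all estimates and bias values are hypothetical.

```python
import math

def adjust_for_bias(estimate, se, bias_mean, bias_sd):
    """Subtract the expected additive bias and inflate the SE by its uncertainty."""
    return estimate - bias_mean, math.sqrt(se ** 2 + bias_sd ** 2)

def pool(estimates):
    """Inverse-variance weighted average of (estimate, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical studies: (estimate, se, assumed bias mean, assumed bias sd).
studies = {
    "randomized trial": (0.20, 0.10, 0.00, 0.00),
    "Mendelian randomization": (0.30, 0.15, 0.00, 0.05),
    "observational regression": (0.45, 0.05, 0.15, 0.10),
}

adjusted = [adjust_for_bias(*values) for values in studies.values()]
pooled_estimate, pooled_se = pool(adjusted)
print(round(pooled_estimate, 3), round(pooled_se, 3))
```

The assumed bias means and standard deviations play the role of prior distributions on bias, which is why the third question above (where such information comes from) matters.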
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016 Dec 1;45(6):1866-1886. doi: 10.1093/ije/dyw314.
Munafò MR, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018 Jan;553(7689):399-401. doi: 10.1038/d41586-018-01023-3.
Munafò MR, Higgins JPT, Davey Smith G. Triangulating Evidence through the Inclusion of Genetically Informed Designs. Cold Spring Harb Perspect Med. 2021 Aug 2;11(8):a040659. doi: 10.1101/cshperspect.a040659.
Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009 Jan;172(1):21-47. doi: 10.1111/j.1467-985X.2008.00547.x.
Ades AE, Welton NJ, Caldwell D, Price M, Goubar A, Lu G. Multiparameter evidence synthesis in epidemiology and medical decision-making. J Health Serv Res Policy. 2008 Oct;13 Suppl 3:12-22. doi: 10.1258/jhsrp.2008.008020.
Sarah Lewis (lead), Dr Kostas Tsilidis
Diet and lifestyle are likely to play an important role in colorectal cancer risk; obesity, low levels of physical activity, red and processed meat consumption and low intake of dietary fibre have all been shown to predict colorectal cancer risk. In addition, we have previously used Mendelian randomization (MR) analyses to identify physical activity (1) and iron, vitamin B12 and selenium as potential risk factors for colorectal cancer (2). However, it is not clear which are the independent risk factors and which are confounded by other diet and lifestyle factors, and the mechanisms via which these risk factors contribute to colorectal cancer are not well understood.
This project will identify independent diet and lifestyle risk factors for colorectal cancer, integrate multi-omics data in a hypothesis-free analysis to uncover novel risk factors, and elucidate the biological mechanisms linking modifiable risk factors and colorectal cancer.
This project will use many different MR methods, integrate omics datasets and perform mediation analyses to identify risk factors for colorectal cancer and elucidate the biological mechanisms involved in this disease.
First, the student will use a Bayesian framework (3), multivariable MR, co-localisation analyses and bi-directional MR methods incorporating instruments across many different potential risk factors in order to identify independent risk factors for colorectal cancer.
Once independent risk factors have been identified, the student will use two-step Mendelian randomization to determine whether the risk factor-colorectal cancer link is mediated by any of the following four pathways: inflammation, growth hormones, sex hormones and insulin signalling.
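The logic of two-step MR can be sketched with the product-of-coefficients approach: the indirect effect is the effect of the risk factor on the mediator multiplied by the effect of the mediator on the outcome, with a first-order delta-method standard error. All numbers below are hypothetical, and the proportion mediated is only meaningful under the usual assumptions (linear effects, no exposure-mediator interaction).

```python
import math

def indirect_effect(a, se_a, b, se_b):
    """Product-of-coefficients indirect effect with a delta-method standard error.

    a: effect of risk factor on mediator (step 1)
    b: effect of mediator on outcome (step 2)
    """
    effect = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return effect, se

def proportion_mediated(indirect, total):
    """Share of the total effect that runs through the mediator."""
    return indirect / total

# Hypothetical MR estimates (illustrative only).
effect, se = indirect_effect(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
share = proportion_mediated(effect, total=0.5)
print(effect, round(se, 4), share)
```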
1. Papadimitriou N, Dimou N, Tsilidis KK et al. Physical activity and risks of breast and colorectal cancer: a Mendelian randomisation analysis. Nat Commun. 2020 Jan 30;11(1):597. doi: 10.1038/s41467-020-14389-8. PMID: 32001714; PMCID: PMC6992637.
2. Tsilidis KK, Papadimitriou N, Dimou N et al. Genetically predicted circulating concentrations of micronutrients and risk of colorectal cancer among individuals of European descent: a Mendelian randomization study. Am J Clin Nutr. 2021 Jun 1;113(6):1490-1502. doi: 10.1093/ajcn/nqab003. Erratum in: Am J Clin Nutr. 2021 Jun 1;113(6):1715. PMID: 33740060; PMCID: PMC8168352.
3. Zuber V, Gill D, Ala-Korpela M, Langenberg C, Butterworth A, Bottolo L, Burgess S. High-throughput multivariable Mendelian randomization analysis prioritizes apolipoprotein B as key lipid risk factor for coronary artery disease. Int J Epidemiol. 2021 Jul 9;50(3):893-901. doi: 10.1093/ije/dyaa216. PMID: 33130851; PMCID: PMC8271202.
Dr Emma Vincent (lead), Dr James Yarmolinsky, Dr Javier Gonzalez (University of Bath)
Colorectal cancer (CRC) is the third most common cancer worldwide and accounts for around 10% of all cancer deaths. Overall incidence of CRC has gradually declined, partially attributable to societal changes towards healthier lifestyle choices, but, alarmingly, CRC cases have increased in individuals aged 20-39 years. These patients have a poor 5-year survival rate (<60%), emphasising the urgency for a greater understanding of what causes CRC so that we might prevent it arising in the first place.
Four in 10 UK cancer cases are thought to be preventable through lifestyle changes, and obesity is now an established risk factor for CRC development. Weight loss by dietary intervention as a strategy to reduce CRC risk is an active area of research. For example, it has been found that individuals who consume high amounts of simple sugars may have a greater risk of developing CRC; conversely, those consuming a ketogenic diet, with low carbohydrate intake, may have a reduced risk of developing this disease. Adipose tissue likely plays a role in promoting tumourigenesis, but the exact mechanisms underlying this relationship are incompletely understood.
Research question: Can manipulating dietary sugar and carbohydrate intake reduce CRC risk by altering adipose tissue biology?
Aims:
1. To determine gene expression changes in adipose tissue in response to the dietary interventions.
By analysing samples and data collected as part of a randomized controlled trial (RCT).
2. To determine whether these gene expression changes influence colorectal cancer risk.
Using techniques in genetic epidemiology.
3. To explore how these gene expression changes impact CRC development.
Using cellular models of CRC.
Conducting randomized controlled trials (RCTs) of dietary interventions to determine the impact on cancer risk is challenging, given the long time frame and large number of participants needed. To address this, this PhD will use data and samples collected as part of a 12-week dietary intervention trial (manipulating sugar and carbohydrate intake) in healthy, cancer-free individuals. Biological changes in the participants' adipose tissue samples will be measured, and whether these changes impact CRC risk will then be investigated using a genetic epidemiological method called Mendelian randomization (MR). MR will allow us to use the biological changes we observe in the RCT as proxies to investigate whether the dietary intervention could impact CRC risk.
For dietary changes to impact cancer risk, we would expect to observe changes in gene expression in the body’s cells and tissues, indicating that the dietary change impacts cell biology. We will therefore measure gene expression using RNA-seq and use MR to predict whether changes impact risk of CRC. We can then investigate how these changes might impact CRC development using cellular models of CRC.
Brennan, P., & Davey Smith, G. Identifying Novel Causes of Cancers to Enhance Cancer Prevention: New Strategies are Needed. JNCI. 10.1093/jnci/djab204
Lauby-Secretan, B., et al. Body Fatness and Cancer — Viewpoint of the IARC Working Group. N Engl J Med. 10.1056/NEJMsr1606602
Davey Smith, G., & Shah, E. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? IJE. 10.1093/ije/dyg070
Prof Julian Higgins (lead), Prof Kate Tilling, Prof Marcus Munafò
Triangulation, in which multiple methods are strategically used to answer a single question, is a currently developing area. Lawlor, Tilling and Davey Smith (2016) explained how causal inferences can be strengthened by integrating results from several approaches with different key sources of potential bias. The statistical methods for combining the results from multiple sources of evidence within a triangulation framework are, however, underdeveloped. This PhD seeks to develop, illustrate and evaluate such methods.
The project seeks to develop and implement quantitative methods for triangulation of multiple lines of evidence addressing the same underlying epidemiological question.
Work is expected to focus on three key areas as follows.
1) At its simplest, triangulation involves comparison and combination of studies of the same exposure-outcome effect that use different designs or analytic methods. For example, randomized trials, Mendelian randomization studies and traditional multivariable regression analyses of observational evidence might all tackle a question relating to the same exposure-outcome effect. The studies may produce different effect estimates because they are (i) asking subtly different questions (e.g. in relation to the period or patterns of exposure), (ii) compromised by different biases and/or (iii) subject to chance. Triangulation combines these issues in a statistical model and assesses the extent to which the observed data fit together – an approach known as multiparameter evidence synthesis. Methods for producing these models, assessing coherence and drawing conclusions about causal effects of the exposure on the outcome will be developed. The project will primarily explore Bayesian methods, because they are flexible and allow incorporation of external information through prior distributions.
2) Another form of triangulation arises when some (or all) studies address only a component of the underlying question. For example, if the exposure-outcome effect occurs through an intermediate, then studies of the exposure-outcome effect might be triangulated with a combination of studies (i) of the effect of exposure on the intermediate and (ii) of the effect of the intermediate on the outcome. Methods will be developed to synthesise these three sets of studies, and account for true differences, biases and chance.
3) In addition to working on novel statistical methods, the student may explore other methodological questions. First, how should we define and identify studies suitable for a triangulation exercise? Automation tools may help here, such as MELODI (http://melodi.biocompute.org.uk), which we have developed to identify studies examining intermediates between exposure and outcome. Second, how should we evaluate the risk of bias in studies for which formal frameworks (such as RoB 2 and ROBINS-I; http://riskofbias.info) have not been developed? Third, what sources of information are available about biases, to inform prior distributions, and how can more information be generated?
Methods developed in these three areas will be illustrated through application to important causal questions in epidemiology.
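As a rough illustration of the first area, the sketch below shifts each source's estimate by an assumed additive bias, widens its standard error to reflect uncertainty about that bias, and then pools the adjusted estimates by inverse-variance weighting, with Cochran's Q as a crude coherence check. This is a simple frequentist stand-in, not the Bayesian multiparameter synthesis the project proposes; the function names and every number are hypothetical.

```python
import math


def bias_adjust(estimate, se, bias_mean=0.0, bias_sd=0.0):
    """Subtract an assumed additive bias and widen the standard error."""
    adjusted = estimate - bias_mean
    adjusted_se = math.sqrt(se ** 2 + bias_sd ** 2)
    return adjusted, adjusted_se


def pool_inverse_variance(estimates, ses):
    """Fixed-effect inverse-variance pooling plus Cochran's Q statistic."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q


# Hypothetical log odds ratios: (estimate, se, assumed bias mean, assumed bias sd)
sources = {
    "randomized trial": (-0.10, 0.08, 0.00, 0.02),
    "Mendelian randomization": (-0.18, 0.10, 0.00, 0.05),
    "multivariable regression": (-0.30, 0.04, -0.10, 0.10),
}

adjusted = [bias_adjust(*values) for values in sources.values()]
pooled, pooled_se, q = pool_inverse_variance(
    [a[0] for a in adjusted], [a[1] for a in adjusted]
)
print(f"pooled log OR = {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f}")
```

A large Q relative to its degrees of freedom (number of sources minus one) would suggest that the sources still disagree after adjustment, which is the kind of incoherence a full triangulation model would need to explain.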
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016 Dec 1;45(6):1866-1886. doi: 10.1093/ije/dyw314.
Munafò MR, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018 Jan;553(7689):399-401. doi: 10.1038/d41586-018-01023-3.
Munafò MR, Higgins JPT, Davey Smith G. Triangulating Evidence through the Inclusion of Genetically Informed Designs. Cold Spring Harb Perspect Med. 2021 Aug 2;11(8):a040659. doi: 10.1101/cshperspect.a040659.
Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009 Jan;172(1):21-47. doi: 10.1111/j.1467-985X.2008.00547.x.
Ades AE, Welton NJ, Caldwell D, Price M, Goubar A, Lu G. Multiparameter evidence synthesis in epidemiology and medical decision-making. J Health Serv Res Policy. 2008 Oct;13 Suppl 3:12-22. doi: 10.1258/jhsrp.2008.008020.
Prof Sarah Lewis (lead), Prof Richard Martin, Dr Richard Pulsford (University of Exeter), Prof Russ Jago, Dr Freddie Bray (International Agency for Research on Cancer)
Around 40% of all cancers are thought to be avoidable by modification of lifestyle factors. Obesity has been found to be a risk factor for several cancers and was estimated to have caused around 3.6% of all new cancers which occurred in 2012. There is now an emerging evidence base which shows that low levels of physical activity can increase the risk of cancer. The World Cancer Research Fund (WCRF), as part of their continuous update project, have concluded that there is strong evidence that high levels of physical activity decrease the risk of cancers of the breast, endometrium, colon and rectum. In addition, we have recently shown using common genetic variation (a technique called Mendelian randomization) that physical activity protects against prostate, colorectal and breast cancer. In our analyses we found larger protective effects of physical activity on cancers of the breast, prostate, colon and rectum than were estimated by observational studies. It is possible that cancers at other sites are similarly influenced by physical activity but they have not yet been investigated in this way.
The aims of the project are to:
1. Identify the patterns of physical activity which are most likely to be causally related to cancer, using wrist-worn accelerometer data and cancer outcomes in UK Biobank.
2. Test whether a low level of physical activity is a causal risk factor for cancer at several sites using Mendelian randomization and data from large cancer consortia.
3. Use the best publicly available data to estimate the global prevalence of low levels of physical activity by country/region/ethnicity.
4. Estimate the global burden of cancer attributable to low levels of physical activity using cancer surveillance data compiled by IARC broken down by country/region and cancer type.
The student will access wrist-worn accelerometry data from UK Biobank or other similar population-based cohorts. These data will be characterised in a variety of different ways in order to investigate different features or patterns in behaviour, and the student will use epidemiological techniques to understand the relationship between patterns of activity and cancer.
The student will be trained in the theory and application of Mendelian randomization and will use this to determine which cancers are caused by low levels of physical activity (and other patterns of physical activity).
The student will receive training in the interpretation of cancer registry data and will use these data alongside global data on physical activity to determine the proportion of cancers which can be attributed to insufficient physical activity worldwide.
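One common way to turn an exposure prevalence and a relative risk into an attributable proportion is Levin's population attributable fraction, PAF = p(RR - 1) / (1 + p(RR - 1)). The sketch below applies it to made-up inputs; the formula assumes the relative risk is causal and unconfounded, the project may well use a different estimator, and the function names and numbers are hypothetical.

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction for a binary exposure."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def attributable_cases(total_cases, prevalence, relative_risk):
    """Number of cases attributable to the exposure under the same assumptions."""
    return total_cases * attributable_fraction(prevalence, relative_risk)


# Hypothetical region: 30% of adults insufficiently active, relative risk 1.2,
# 50,000 new cases of the cancer of interest per year.
paf = attributable_fraction(0.30, 1.2)
cases = attributable_cases(50_000, 0.30, 1.2)
print(f"PAF = {paf:.3f}, attributable cases = {cases:.0f}")
```

Repeating the calculation per country, region and cancer type, then summing the attributable cases, would give a burden estimate of the kind described in aim 4.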
https://bjsm.bmj.com/content/52/13/826.long
https://www.wcrf.org/dietandcancer/exposures/physical-activity
https://www.nature.com/articles/s41467-020-14389-8
Josine Min (lead), Gibran Hemani, Johann Hawe (Illumina)
Genome-wide association studies (GWASs) have discovered many genetic associations with a large range of human traits, but the functional consequences of GWAS signals often remain elusive, as most GWAS signals reside in non-coding genomic regions. However, GWAS signals are enriched in DNA regulatory elements and cell type specific annotations, and thus it is likely that GWAS signals confer their effects through modulating gene regulatory mechanisms.
Genetic factors for molecular traits (DNA methylation, gene expression, protein levels) are being discovered at an astonishing rate. A major hope for these genetic factors is that they can be used to identify causal mechanisms of complex traits.[1] Fascinatingly, the dimensionality of molecular phenotyping is bound to surpass the density of human genetic variation, meaning that genetic pleiotropy (where one variant influences multiple phenotypes) is a necessary feature amongst molecular phenotypes. This has critical downstream implications for being able to use genetics to make valid causal inference of putative molecular targets on disease incidence and progression.
This project will build a resource for storing and querying harmonized molecular QTL data in a computationally efficient manner, and then use that resource to build pleiotropy maps of human molecular phenotypes. These maps will subsequently be used in evolutionary modelling and, in collaboration with Illumina, in machine learning and artificial intelligence approaches to understand the basis of molecular pleiotropy. This will include a research visit to the Illumina AI lab in Germany.
1. Develop a computational framework for storing and querying molecular QTLs that will integrate with the OpenGWAS project
2. Generate pleiotropy maps using fine mapping and colocalization
3. Use evolutionary models to understand the impact of pleiotropy on natural selection processes
4. Use deep learning to predict disease mechanisms and disease progression from molecular pleiotropy maps
Currently summary statistics are stored for each GWAS dataset separately. However, this is not sustainable for QTL summary statistics with millions of molecular features. Therefore a new framework will be developed to store complete molecular QTL statistics for each dataset. Fine mapping and colocalization analysis will be used to integrate methylation QTL statistics from the Genetics of DNA Methylation Consortium, expression QTL statistics from eQTLGen and protein QTL statistics from SCALLOP and ALSPAC. This will result in maps of colocalized molecular traits. We will investigate biological models of pleiotropy, for example by using evolutionary models and gene-environment interactions. We will use deep learning to identify molecular pleiotropy maps that correspond to distinct phenotypic patient subgroups.
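To give a flavour of the fine-mapping step, the sketch below computes Wakefield-style approximate Bayes factors from summary statistics and converts them to posterior inclusion probabilities under the simplifying assumption of exactly one causal variant per region with equal prior probability for each variant. The prior variance of 0.04, the function names and the input values are all assumptions for illustration; the project's actual fine-mapping and colocalization tools are not specified in the text.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log approximate Bayes factor for association versus no association."""
    variance = se ** 2
    shrinkage = prior_var / (variance + prior_var)
    z = beta / se
    return 0.5 * (math.log(1.0 - shrinkage) + shrinkage * z * z)


def posterior_inclusion(betas, ses, prior_var=0.04):
    """Posterior inclusion probabilities assuming a single causal variant."""
    logs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    top = max(logs)  # subtract the maximum for numerical stability
    relative = [math.exp(value - top) for value in logs]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical QTL effect estimates and standard errors for three variants.
variants = ["rsA", "rsB", "rsC"]
pips = posterior_inclusion([0.02, 0.10, 0.03], [0.02, 0.02, 0.02])
for name, pip in zip(variants, pips):
    print(f"{name}: PIP = {pip:.4f}")
```

Running the same calculation for a methylation, expression and protein trait in the same region and comparing which variants carry the posterior mass is the intuition behind colocalization, although dedicated methods handle the joint model properly.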
1. Neumeyer S, Hemani G, Zeggini E. Strengthening Causal Inference for Complex Disease Using Molecular Quantitative Trait Loci. Trends Mol Med. 2020;26(2):232-41.
Dr Anna Murray and Prof Tim Frayling (lead), Dr Kate Ruth, Dr Rebecca Richmond
The relationship between menopausal status, fat distribution, hormonal status and cancer is complicated because they are all correlated with each other. For example, body mass index accounts for 20% of the variance in levels of sex hormone binding globulin (SHBG) in women (compared to 9% in men) and is a strong marker of healthy fat distribution (1). Genetic studies have identified multiple genetic risk factors linked to menopause and hormonal status (1-7). New resources such as UK Biobank combined with Mendelian randomisation (MR) methods offer an important opportunity to study the separate effects of different aspects of menopausal changes on cancer risk and progression. For example, a recent study used variants associated with BMI as an exposure and breast cancer before and after menopause as an outcome: genetic variants associated with higher BMI were associated with lower risk of breast cancer in both postmenopausal and premenopausal women (8). This result in postmenopausal women is the opposite of the epidemiological association and needs further investigation, but may reflect important differences between lifelong exposure to higher BMI, insulin resistance and hormonal status, compared to the later life change caused by menopause.
1. to assemble the largest and most detailed datasets relevant to cancer, menopause and sex hormone levels. These datasets will include new data from large biobanks, and electronic medical record linked data.
2. to identify and characterise genetic variants associated with specific aspects of fat distribution and hormonal exposure. This aim will go far beyond simple “genome wide association studies” and aim to partition variants into those associated with specific mechanisms.
3. to apply a range of Mendelian randomisation and mediation methods to test causal pathways from menopause to cancer risk and the mediating role of fat distribution and hormonal changes.
This project will proceed in 2 stages. In stage 1, you will identify and characterise genetic variants associated with specific aspects of (i) age at natural menopause, (ii) sex hormone levels and (iii) fat distribution. For age at natural menopause, you will work closely with the Reprogen consortium that includes >100,000 women with natural menopause age data. The recent release of sex hormone measures (SHBG, testosterone, oestradiol) in 437,000 UK Biobank individuals (55% women) will facilitate the identification of multiple new loci associated with hormone levels and their characterisation into those having direct effects on sex hormones and those acting via adiposity. A preliminary analysis of these data shows >500 loci associated with sex hormone levels but that ~1/3rd are highly pleiotropic. For differences in fat distribution before and after menopause, we will use the abdominal MRI data from ~55,000 women in the UK Biobank, and compare subcutaneous, visceral and liver fat measures between the subset (~10%) of women scanned pre menopause and age- and BMI-matched women scanned post menopause. In stage 2, you will use these variants to build genetic instruments for these separate risk factors and use them in MR tests as genetic proxies with common cancers as outcomes. You will study female-specific cancers such as breast, endometrial and ovarian cancer.
1: Ruth KS, et al. Using human genetics to understand the disease impacts of testosterone in men and women. Nat Med. 2020 Feb;26(2):252-258. doi: 10.1038/s41591-020-0751-5. Epub 2020 Feb 10. PubMed PMID: 32042192; PubMed Central PMCID: PMC7025895.
2: Ruth KS, et al. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes. Eur J Hum Genet. 2016 Feb;24(2):284-90. doi: 10.1038/ejhg.2015.102. Epub 2015 May 27. PubMed PMID: 26014426; PubMed Central PMCID: PMC4564946.
3: Day FR, et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat Genet. 2015 Nov;47(11):1294-1303. doi: 10.1038/ng.3412. Epub 2015 Sep 28. PubMed PMID: 26414677; PubMed Central PMCID: PMC4661791.
4: Perry JR, Murray A, Day FR, Ong KK. Molecular insights into the aetiology of female reproductive ageing. Nat Rev Endocrinol. 2015 Dec;11(12):725-34. doi: 10.1038/nrendo.2015.167. Epub 2015 Oct 13. Review. PubMed PMID: 26460341; PubMed Central PMCID: PMC6309261.
5: Perry JR, et al. DNA mismatch repair gene MSH6 implicated in determining age at natural menopause. Hum Mol Genet. 2014 May 1;23(9):2490-7. doi: 10.1093/hmg/ddt620. Epub 2013 Dec 19. PubMed PMID: 24357391; PubMed Central PMCID: PMC3976329.
6: Stolk L et al. Meta-analyses identify 13 loci associated with age at menopause and highlight DNA repair and immune pathways. Nat Genet. 2012 Jan 22;44(3):260-8. doi: 10.1038/ng.1051. PubMed PMID: 22267201; PubMed Central PMCID: PMC3288642.
7: Janssen I et al. Testosterone and visceral fat in midlife women: the Study of Women's Health Across the Nation (SWAN) fat patterning study. Obesity (Silver Spring). 2010 Mar;18(3):604-10. doi: 10.1038/oby.2009.251. Epub 2009 Aug 20. PubMed PMID: 19696765; PubMed Central PMCID: PMC2866448.
8: Guo Y, et al. Genetically Predicted Body Mass Index and Breast Cancer Risk: Mendelian Randomization Analyses of Data from 145,000 Women of European Descent. PLoS Med. 2016 Aug 23;13(8):e1002105. doi: 10.1371/journal.pmed.1002105. eCollection 2016 Aug. PubMed PMID: 27551723; PubMed Central PMCID: PMC4995025.
Dr Kostas Tsilidis (lead), Dr Sarah Lewis, Dr Mattias Johansson
There is ample biological evidence that chronic inflammation triggers the development and progression of many cancers. Moreover, epidemiological studies and meta-analyses thereof have shown that higher circulating concentrations of C-reactive protein are associated with an increased risk of several cancers including colorectal, lung, breast and ovarian cancer, and prolonged use of non-steroidal anti-inflammatory drugs (NSAIDs) reduces the risk of several solid tumours. However, observational studies linking circulating cytokine concentrations to risk of several cancers are sparse, inconsistent and may be afflicted by biases. The evidence for a potential causal role of chronic inflammation in the risk of cancer development can be improved using Mendelian Randomization (MR).
To explore whether genetically determined circulating concentrations of inflammatory cytokines and related traits (e.g. allergies, periodontal disease and autoimmune disorders) are associated with risk of cancer at most anatomic sites using the MR approach.
Summary data for the association between the SNPs and the cytokine concentrations and related traits will be retrieved from published GWAS. After a preliminary literature search, we identified SNPs in published GWAS that are genome-wide significantly associated with 54 different families of cytokines, and which explain varying proportions of the variance in the different cytokines, ranging from approximately 1% to 30%. Summary data for the association between the selected SNPs and risk of different cancers will be retrieved from relevant genetic consortia.
We will create a multi-SNP score based on the chosen SNPs for each cytokine and related trait, and we will implement two MR methods using the available summary data: i) an inverse-variance weighted (IVW) average of SNP-specific associations, and ii) a likelihood-based method. Subgroup analyses will be conducted by anatomical and histological subtypes for each cancer and according to certain participant characteristics (e.g. sex and smoking status), where available. We will select SNPs based on a p-value threshold of 5e-8 and perform linkage disequilibrium clumping (r2<0.01). Sensitivity analyses will be performed after relaxing the p-value threshold to 1e-6 and selecting as instruments only cis-acting SNPs, defined as within ±500kb of the relevant intermediate phenotype genes. To assess potential violation of the second MR assumption due to directional horizontal pleiotropic SNP effects, we will employ the MR-Egger regression method, the weighted median and weighted mode approaches, MR-PRESSO and multivariable MR. Finally, given the large number of cytokines, we will consider applying multiplicity correction methods.
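The IVW step can be sketched from summary statistics alone: each SNP gives a Wald ratio (SNP-outcome effect divided by SNP-exposure effect), and the ratios are averaged with weights proportional to the squared SNP-exposure effect over the SNP-outcome variance. The version below is a fixed-effect estimator with first-order weights, which ignores uncertainty in the SNP-exposure estimates; the Bonferroni helper is just one possible multiplicity correction. Function names and all values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted average of per-SNP Wald ratios."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical summary statistics for three independent SNPs.
bx = [0.12, 0.08, 0.15]        # SNP effects on the cytokine (SD units)
by = [0.030, 0.018, 0.041]     # SNP effects on cancer (log odds)
se_y = [0.010, 0.012, 0.011]   # standard errors of the SNP-cancer effects

estimate, se = ivw_estimate(bx, by, se_y)
print(f"IVW log OR per SD = {estimate:.3f} (SE {se:.3f})")
print(f"Bonferroni threshold for 50 tests = {bonferroni_threshold(0.05, 50):.4f}")
```

The pleiotropy-robust methods named above (MR-Egger, weighted median, weighted mode, MR-PRESSO) take the same three inputs, which is why they are convenient as sensitivity analyses.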
Mattias Johansson (lead), Karl Smith Byrne, Richard Martin, Linda Kachuri
Cancer risk can broadly be described as the combination of inherited genetic risk and the risk accumulated over a lifetime from lifestyle and environmental factors. While large GWAS consortia have characterized well the genetic component of many of the most highly incident cancer sites, it remains an open question to what extent a combination of genetic risk factors and lifestyle/environmental factors can be used to stratify risk for cancer overall and for individual cancer sites, at a population level. We believe that, with the availability of genetic data in UK Biobank, novel methods for genetic risk prediction (LDPred, JAM, LassoSum, for example), and rich questionnaire data, we can now begin to answer how we can best use the available information to predict risk for cancer. Nonetheless, we expect the utility of genetics and questionnaire data for the prediction of disease to be heterogeneous by cancer site. A central question will be how best to integrate these risk models for multiple cancer sites into a risk prediction (5-year risk, for example) for cancer overall for a person at a given age.
The overarching aim of this project is to construct and validate a comprehensive model for the prediction of cancer risk overall given lifestyle and genetic predisposition.
Preliminary analyses have established an exhaustive catalog of independent SNPs associated with each of the 20 top cancers, as well as a catalog of plausibly causal, modifiable lifestyle-related risk factors. Using this information, we have established a risk model for each cancer site separately using flexible parametric survival models, including a pre-defined polygenic risk score based on the latest relevant major GWAS along with modifiable risk factors as measured in UK Biobank. Initial results have provided well calibrated models, with vast differences between cancer sites in both overall discriminatory performance and the relative importance of lifestyle and genetics. The expansion of this project will involve evaluating novel methods of genetic risk prediction for cancer that incorporate a larger number of SNPs that may be in weak linkage disequilibrium but have marginal independent predictive information. It is likely that the best method will differ by cancer site, perhaps as a function of the number of incident cancer cases in the original GWAS. It is also likely that combining multiple cancer risk prediction models to predict cancer overall may require evaluating more complex analyses, such as recurrent neural networks (LSTM) or attention-based time-aware models.
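Two of the building blocks mentioned here can be sketched in a few lines: a polygenic risk score as a weighted sum of risk-allele dosages, and a naive way to combine site-specific 5-year risks into a risk of any cancer. The combination rule below assumes the site-specific risks are independent and ignores competing risks, which is exactly the simplification the more complex models would aim to avoid. Function names, weights and risks are hypothetical.

```python
def polygenic_risk_score(weights, dosages):
    """Weighted sum of risk-allele dosages (0, 1 or 2, or imputed values)."""
    if len(weights) != len(dosages):
        raise ValueError("weights and dosages must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def overall_risk(site_risks):
    """Probability of at least one cancer, assuming independent site risks."""
    none_occurs = 1.0
    for risk in site_risks:
        if not 0.0 <= risk <= 1.0:
            raise ValueError("each risk must lie between 0 and 1")
        none_occurs *= 1.0 - risk
    return 1.0 - none_occurs


# Hypothetical per-SNP log odds ratios and one person's allele dosages.
score = polygenic_risk_score([0.08, -0.05, 0.12], [2, 1, 0])

# Hypothetical 5-year risks from three site-specific models.
combined = overall_risk([0.010, 0.004, 0.002])
print(f"PRS = {score:.3f}, overall 5-year risk = {combined:.4f}")
```

Because the individual risks are small, the combined risk is close to their sum; the independence assumption matters more for older ages and higher-risk individuals.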
https://www.biorxiv.org/content/10.1101/2020.01.28.922088v1
https://pypi.org/project/LDpred/
https://github.com/tshmak/lassosum
https://www.nature.com/articles/s41746-018-0029-1/
Dr Paul Brennan (lead), Dr Siddhartha Kar, Dr Tom Gaunt
Somatic mutational signatures represent a physiological readout of the biological history of a cancer. They can reflect previous exposures that led to the development of the tumour as well as endogenous phenomena such as genetic defects in actionable pathways, including proofreading and DNA damage repair mechanisms. Germline pathogenic mutations in the genes involved in these pathways are also often linked to cancer predisposition syndromes.
Based on analyses of over 4000 whole genome sequences, some mutational signatures have already been identified and attributed to germline mutations in predisposition syndromes including hereditary breast and ovarian cancer (COSMIC signature (CS) 3) and hereditary colorectal cancers (CSs 6, 30, 10, 14, & 36). Nevertheless, identified signatures with the associated predisposing genes are unable to explain the majority of hereditary cancer cases.
Hereditary gastrointestinal (GI) cancer syndrome comprises phenotypically diverse disorders including polyposis/non-polyposis colorectal cancer, pancreatic, & gastric cancer syndromes with a particularly broad spectrum of tumours and as yet undiscovered genetic origins. Here, by characterizing the mutational signatures of up to 2000 of these cancers, we aim to identify new predisposition genes and their link with mutational signatures, refine high-risk groups, and outline preventive risk factors.
1) To investigate the association between germline variants and mutational signatures among 1000 colorectal cancers and 500 pancreatic cancers within the Mutograph project
2) To classify patients into hereditary and sporadic categories using valid clinical guidelines (i.e. NCCN) and further compare clinicopathologic features and epidemiologic risk factors (such as BMI, smoking, meat consumption, gut microbiome) with mutational signatures
3) To investigate how identification of signatures can impact on clinical practice and improve clinical management of hereditary cancers in the future
Whole genome sequencing data of ~1500 colorectal and pancreatic cancer cases will become available from the CRUK Grand Challenge Mutograph project, including mutational signature analysis. Additional information includes germline sequence data, clinical outcome and information on a broad range of risk factors. Using this extensive and unique dataset, the association between the identified germline susceptibility variants and the pattern and intensity of the mutational signatures will be investigated, including the correlations with clinicopathologic and putative risk factors such as family history, BMI, diabetes and meat consumption.
Alexandrov, Ludmil B., et al. "The Repertoire of Mutational Signatures in Human Cancer." Nature 578.7793 (2020): 94-101.
Schubert, Stephanie A, et al. "The Missing Heritability of Familial Colorectal Cancer." Mutagenesis (2019).
Van Hoeck, Arne, et al. "Portrait of a Cancer: Mutational Signature Analyses for Cancer Diagnostics." BMC Cancer 19.1 (2019): 457.