Periodic Reporting for period 2 - FindingPheno (Unified computational solutions to disentangle biological interactions in multi-omics data)
Reporting period: 2022-09-01 to 2024-02-29
The emergence and widespread adoption of high-throughput sequencing technologies have contributed to our understanding of biology, revealing the genetics underlying complex phenotypic traits in animals and plants, such as disease resistance and growth efficiency. This omics revolution has led to an explosion in the amount and types of biological data we can now generate to help answer questions about the genetic architecture of phenotypes in food production and health science. It has also introduced the era of big data in the biological and medical sciences, and in genomics specifically, bringing new technical and computational challenges for data analysis and interpretation. The number of genomic data types being regularly collected is increasing rapidly as large consortia collect and share genomes, transcriptomes, epigenomes, proteomes and microbiomes. This ability to generate vast amounts of different types of omics data has created the need for new and more sophisticated methods to extract the most information from them.
It is our conviction that adding microbiome data to the host omics data, along with multi-omics methods for analysing the host-microbiome data, is essential to explain biological mechanisms. Thus, two major research questions form the basis of the work in FindingPheno:
• How do the host and the microbiome interact to affect the biological processes of the host, and modulate host phenotype?
• How do we integrate the omics data from the host and the omics data from the microbiome to best understand the biology of the host-microbiome system?
In FindingPheno, we aim to overcome these challenges by developing a robust statistical solution that allows an intelligent integration of host and microbiome data in a single framework (see figure). The EU was among the first to recognize the necessity of viewing life through an integrated hologenomic perspective. Under FP7 and H2020, the EU has invested more than €1.4 billion in the last decade, and more than €100 million in 2018, in microbiome research and related applications. Building upon this growth in host-microbiome interaction studies, FindingPheno will fill the niche for new methods to best exploit the data being generated from these efforts. FindingPheno’s solutions will lead to more efficient translation of biological research from lab to industry, boosting the bioeconomy, which is projected to grow at 7% from 2013 to 2030. This makes the research done in FindingPheno important for maximizing the efficiency and sustainability of food production systems.
1. We have performed literature reviews to understand the types of machine learning and statistical methods used in multi-omics and time series multi-omics data analyses, giving us a platform on which to build our new methods. Further, we have been exploring the use of machine learning methods and structural models to integrate data from multi-omics datasets. In addition, machine learning methods are being developed for time series data; in particular, we have started an R package for such methods, which integrates data type standards to ease their application. Alongside time series methods, mechanistic models have been developed to understand the interactions in the microbiome community. Finally, using genomic and phenotypic data from HoloFood chickens, we are developing methods that incorporate the evolutionary history of the genetic variation to better identify genotype-phenotype associations.
2. A stakeholder synergy meeting was held to map our stakeholders, including other EU projects, industrial stakeholders, academic researchers, and policy bodies. We organized a webinar, in conjunction with other EU projects and a center of excellence, with speakers from around the world sharing their insights into the technical challenges and the advantages of multi-omics approaches. Several partners have offered courses and workshops across the world to train the next generation of researchers in methods for multi-omics data analyses. The training material from these workshops and courses has been made freely available. We circulate a monthly newsletter updating the partners on the progress of the project. To engage a wider audience, we have produced two videos highlighting the work planned in the project, with one video specifically targeted at high school students. Finally, as part of our outreach efforts, FindingPheno has established a presence on three social media platforms: Twitter, LinkedIn and YouTube. Twitter and LinkedIn are the primary channels for outreach to the wider community.
3. We are in the early stages of this task, and one of the primary avenues for exploitation of results so far has been the use of the MGnify pipeline and a project-specific landing page for FindingPheno. On the project landing page, several publicly available datasets from tomatoes, honey bees and soybeans have been processed, and are now being used to both refine and validate our models. This platform will also be used for other datasets which, once incorporated into the pipeline, will be used to generate host-microbiome interaction results for the three focus organisms: chicken, salmon and maize.