Integration for ultimate information retrieval
Unravelling and sequencing the genome of an organism is a feat in itself. To make full use of the gene sequences, however, this information must be linked to the related transcripts and the proteins they encode. Linking this information to genetic diseases, their biomolecular basis and molecular interactions then empowers pharmaceutical development and therapy. Recognising the potential of such integration, the TEMBLOR project enhanced European resources in these intrinsically related fields. New services were developed for protein-protein interactions, macromolecular structures, microarray data and integrative queries.
Among these, and central to the philosophy of integrated data, the web portal Integr8 enabled easy access to integrated information on deciphered genomes and their corresponding proteomes. Focusing on the sequence data of a gene, the user is able to see the genomic, transcriptional and protein structures linked together.
The Wellcome Trust (Cambridgeshire, UK), as part of the Gene Ontology Consortium (GOC) since 2001, was responsible for the annotation of the human proteome. Subsequently, from 2006, it continued its genomic research and coordination activities as part of the European project GOA. Annotations were then extended to incorporate a range of disease-related proteomes. UniProtKB, the UniProt knowledge base, encompasses some 100,000 species and is the world's largest collection of information on protein families. It incorporates information on proteins from humans and plants through to viruses. It also includes the most popular animal models: Drosophila, Xenopus and the zebrafish. Continual updating of the datasets by GOA has ensured, and will continue to ensure, that they supply a comprehensive source of reference annotations for the UniProt database.
These annotations have led to worldwide collaborations and have raised European visibility in the fields of genomics and proteomics.