Bio-text mining makes advances

'We're moving from an economy of scarcity in data to an economy of abundance that's going to change the face of healthcare' - Mike Olson. In line with this prediction, EU researchers made significant headway in extracting relevant contextual information from massive volumes of biomedical data.

PubMed alone has information on more than 21 million scientific publications, with over 2 000 new entries being added daily. The BIOLITCONTEXTMINING (Contextual text mining from the biomedical scientific literature) project designed methods based on natural language processing and machine learning to enable scientists to effectively extract and utilise relevant information.

Project researchers advanced the state of the art in bio-text mining with new methods for relation extraction, local and non-local context information extraction, and knowledge discovery. For instance, their Interaction Network Ontology (INO) collects and classifies over 800 interaction keywords and can also cover complex interaction types. INO-based literature mining helps to identify and characterise the interactions between host and Brucella genes. Using a technique for extracting relations and local context information, the researchers can now identify the relations among brain regions. In another key development, they devised methods to identify important non-local context, such as the experimental methods used to detect protein-protein interactions, from full-text articles.

To understand bacterial interaction mechanisms at the molecular level, it is vital to know where the bacteria live in their natural environment. Surprisingly, no comprehensive database carries this information, despite the abundance of literature on bacterial ecology. Researchers developed ontology-centred methods to obtain contextual information about bacteria, such as their habitat.

To provide access to contextual biomedical information, project members contributed to the development of two web-based systems – IGNET and PHISTO. Using a knowledge discovery approach integrated with IGNET, they successfully identified fever- and vaccine-associated gene interaction networks in a case study. Significant progress was also made on methodologies for analysing gene-gene interactions and predicting drug-target interactions.

The novel BIOLITCONTEXTMINING text-mining tools will help advance several biomedical areas, including experimental biology, bioinformatics and systems biology. Project outcomes have led to publications in eight peer-reviewed journals as well as six peer-reviewed conference and workshop papers, with some journal papers currently under review.