
Holistic Benchmarking of Big Linked Data


Big Linked Data benchmarking gains ground in industry

Making ‘Big Linked Data’ a bankable solution for industry requires appropriate benchmarking tools to ensure that the solutions being developed meet the requirements of real use cases. Such tools are now available thanks to work conducted under the HOBBIT project.

Digital Economy

Ever heard of Linked Data? If not, you probably should have, or will have soon enough. Just as Big Data is an evolution of data mining, Linked Data is an evolution of the Semantic Web, which is itself the cornerstone of Web 3.0: an Internet where all information is categorised in such a way that computers and humans are equally able to understand it. In a nutshell, Linked Data consists of using the web to connect related data that wasn’t previously connected.

Industry already uses Linked Data, but its integration with Big Data has so far been hindered by the cost and difficulty of using the latter in a value chain. ‘Big Linked Data’ faces obstacles related to the lack of standardised implementations of performance indicators, which makes it difficult to decide which tool to use and when to use it, and to the fact that some of the dimensions of Big Data (velocity, volume, variety, veracity, value) are poorly supported by existing tools. “For example, managing billions of RDF triples (ed. note: a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions, such as ‘John Doe loves CORDIS’) is still a major problem, volume-wise,” explains Prof. Dr Axel Ngonga of Paderborn University and the Institute for Applied Informatics in Leipzig. “Besides, the different streaming semantics and the lack of scalability of existing solutions make semantic stream processing at scale rather challenging (velocity issue). Finally, current learning approaches for structured data often don’t scale to large knowledge bases, making the detection of insights difficult (value).”
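For readers unfamiliar with the format, the triple from the editor’s note above can be sketched in a few lines of Python using the rdflib library (an assumed toolkit chosen for illustration; the article does not name one):

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")   # hypothetical namespace, not from the article

g = Graph()
# One RDF triple in subject–predicate–object form: "John Doe loves CORDIS"
g.add((EX.JohnDoe, EX.loves, EX.CORDIS))

# Serialise as N-Triples; prints roughly:
# <http://example.org/JohnDoe> <http://example.org/loves> <http://example.org/CORDIS> .
print(g.serialize(format="nt"))
```

At web scale, billions of such statements have to be stored, queried and streamed, which is exactly where the volume and velocity problems described above arise.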
Prof. Dr Ngonga has been leading a nine-strong consortium under the HOBBIT (Holistic Benchmarking of Big Linked Data) project to address these problems. Focusing on Industry 4.0 geo-spatial data management, smart cities and IT management, the team carried out surveys with over 100 participants before and during the project to determine key areas for benchmarking Linked Data. “Our surveys suggest that the benchmark families we created address some of the key domains of interest for European companies and researchers,” he explains.

HOBBIT created a total of five benchmark families to evaluate current software: knowledge extraction, storage, versioning, linking, and machine learning and question answering. On storage, the team found that some of the solutions that performed best did so partly because the results they returned were incomplete. This alone shows that HOBBIT’s benchmarks cover previously unconsidered aspects and that there is a need for benchmarks across the whole of Linked Data. Other findings include that easily distributable solutions for knowledge extraction are still needed; that versioning is poorly supported and requires a standard; that open question-answering platforms still perform poorly in the wild; and that machine learning algorithms specific to Linked Data don’t scale well.

In this context, HOBBIT provides the first open, scalable and FAIR (findable, accessible, interoperable and reusable) benchmarking platform for Linked Data. “The HOBBIT platform is the first generic scalable benchmark for Big Linked Data. Its most innovative aspects include: distributed benchmarking of distributed systems; its portable nature for benchmarking both locally and in distributed environments; a one-command installation both locally and on Amazon Web Services; the reuse of standards for maximal interoperability and flexibility; and clearly defined interfaces for easy adaptation to other data types and use cases,” says Dr Ngonga.

The platform has been well received by industry, with around 40 clones created each month and some industrial partners willing to bring benchmarking services in-house to improve the quality of their tools. The HOBBIT project will only end in November, as a second round of benchmarks is currently being run. The association created under the project will then take over, serving as a hub for benchmarking in Europe, supporting the further development of the HOBBIT platform and similar benchmarking frameworks, and providing benchmarking services to European stakeholders.
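To make the storage finding above more concrete, the sketch below shows one simple way a benchmark can score answer completeness alongside latency, so that a system which answers quickly but drops results does not look like the top performer on speed alone. This is a hypothetical illustration, not the HOBBIT implementation; all function and data names are invented.

```python
import time
from typing import Callable, Set, Tuple

def score_query(run_query: Callable[[str], Set[tuple]],
                query: str,
                reference: Set[tuple]) -> Tuple[float, float]:
    """Score one benchmark query by recall against a reference answer
    set and by latency. All names here are hypothetical."""
    start = time.perf_counter()
    answers = run_query(query)          # the system under test
    latency = time.perf_counter() - start

    # Recall: share of the expected answers actually returned.
    recall = len(answers & reference) / len(reference) if reference else 1.0
    return recall, latency

# Toy usage: a "fast" system that silently returns only half of the answers.
expected = {("JohnDoe", "loves", "CORDIS"), ("JaneDoe", "loves", "CORDIS")}
lossy_system = lambda q: {("JohnDoe", "loves", "CORDIS")}
print(score_query(lossy_system, "SELECT ...", expected))   # recall = 0.5
```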

Keywords

HOBBIT, Big Data, linked data, semantic web, Web 3.0, benchmarking
