Big data - research
Specific Challenge: The activities supported within LEIT under this topic contribute to the Big Data challenge by addressing the fundamental research problems related to the scalability and responsiveness of analytics capabilities (such as privacy-aware machine learning, language understanding, data mining and visualization). A special focus is on industry-validated, user-defined challenges, such as prediction tasks, and on rigorous processes for monitoring and measuring progress against them.
Scope:
a. Research & Innovation Actions: proposals are expected to cover one or both of the themes identified below.
Collaborative projects to develop novel data structures, algorithms, methodology, software architectures, optimisation methodologies and language understanding technologies for carrying out data analytics, data quality assessment and improvement, prediction and visualization tasks at extremely large scale and with diverse structured and unstructured data. Of specific interest is the real-time cross-stream analysis of very large numbers of diverse and, where appropriate, multilingual and multimodal data streams. The availability, for testing and validation purposes, of extremely large and realistically complex European data sets and/or streams is a strict requirement for participation, as is the availability of appropriate populations of experimental subjects for human-factors testing of the usability and effectiveness of visualizations. Explicit experimental protocols and analyses of statistical power are required in the description of usability validation experiments for the proposed systems. Proposals are expected, where appropriate, to make the best possible use of large volumes of diverse open data from the European Union Open Data portal[1] and/or other European open data sources, including data from EU initiatives such as Copernicus and Galileo.
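By way of illustration only (this sketch is not part of the call text), the statistical power analysis expected in a usability validation protocol could be as simple as estimating the sample size needed to detect an assumed effect between two visualization designs; the effect size, significance level and target power below are hypothetical placeholders.

    # Illustrative only: a minimal power analysis for a two-group usability
    # experiment (e.g. task completion time with visualization A vs. B).
    # Effect size, alpha and target power are hypothetical assumptions.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(
        effect_size=0.5,          # assumed medium effect (Cohen's d)
        alpha=0.05,               # significance level
        power=0.8,                # target probability of detecting the effect
        ratio=1.0,                # equal group sizes
        alternative='two-sided'
    )
    print(f"Participants required per group: {n_per_group:.0f}")  # ~64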
Collaborative projects to define benchmarks in domains of industrial relevance, assemble the data resources and infrastructure necessary for administering and validating the benchmarks, and organise evaluation campaigns with a commitment to producing public reports on the performance of participants against the defined benchmarks. Since the goal is to create big data analysis and prediction benchmarking environments of sufficiently general usefulness to become self-sustaining after the end of funding, proposals will have to provide detailed and convincing exit strategies.
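As a purely illustrative sketch (not part of the call text) of how an evaluation campaign might score participants against a defined benchmark, the fragment below compares submitted predictions against held-out ground truth and ranks participants by a simple accuracy metric; all file names and the metric itself are assumptions.

    # Illustrative only: scoring submissions against held-out ground truth
    # and producing a simple leaderboard. File names and metric are hypothetical.
    import csv

    def load_labels(path):
        # Expects rows of the form: item_id,label
        with open(path, newline='') as f:
            return {row[0]: row[1] for row in csv.reader(f)}

    def score(submission, truth):
        # Fraction of benchmark items predicted correctly (simple accuracy).
        correct = sum(1 for k, v in truth.items() if submission.get(k) == v)
        return correct / len(truth)

    truth = load_labels('ground_truth.csv')          # hypothetical file
    participants = ['team_a.csv', 'team_b.csv']      # hypothetical submissions
    leaderboard = sorted(
        ((p, score(load_labels(p), truth)) for p in participants),
        key=lambda x: x[1], reverse=True)
    for team, acc in leaderboard:
        print(f"{team}: accuracy = {acc:.3f}")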
b. Support actions to define challenges and prize schemes for verifiable performance in tasks requiring extremely large-scale prediction and deep analysis. Compact consortia are required to organise and run well-publicised, fast-turnaround prediction competitions based on European datasets of significant size. Proposals in this category are expected to be short in duration and are not required to provide sustainability strategies beyond the end of the project.
Expected impact:
Ability to publicly and quantitatively track progress in the performance and optimization of very large-scale data analytics technologies in a European ecosystem of hundreds of companies; such tracking is crucial for industrial planning and strategy development.
Advanced real-time and predictive data analytics technologies, thoroughly validated by means of rigorous experiments testing their scalability, accuracy and feasibility, and ready to be turned over to thousands of innovators and large-scale system developers.
Ability of the developed technologies, demonstrated through validation experiments, to keep pace with growth in data volume and variety.
Demonstration of the technological and value-generation potential of European Open Data, documenting improvements in the market position and job creation of hundreds of European data-intensive companies.
Types of action:
a. Research & Innovation Actions – A mix of proposals requesting Small and Large contributions is expected
b. Coordination and Support Actions