Project description
Scalable data analytics
LeanBigData aims to address three open challenges in big data analytics: 1) the cost, in terms of resources, of scaling big data analytics for streaming and static data sources; 2) the lack of integration of existing big data management technologies and their high response time; 3) insufficient end-user support, which leads to extremely lengthy big data analysis cycles.
LeanBigData will address these challenges by:
- Architecting and developing three resource-efficient big data management systems typically involved in big data processing: a novel transactional NoSQL key-value data store, a distributed complex event processing (CEP) system, and a distributed SQL query engine. We will achieve at least an order-of-magnitude gain in efficiency by removing overheads at all levels of the big data analytics stack, and we will take into account technology trends in multicore processors and non-volatile memories.
- Providing an integrated big data platform that combines the three main technologies used for big data (NoSQL, SQL, and streaming/CEP). The platform will improve response time for unified analytics over multiple sources and large amounts of data by avoiding the inefficiencies and delays introduced by existing extract-transform-load (ETL) approaches. To achieve this we will use fine-grained intra-query and intra-operator parallelism that will lead to sub-second response times.
- Supporting an end-to-end big data analytics solution that removes the four main sources of delay in data analysis cycles by using: 1) automated discovery of anomalies and root cause analysis; 2) incremental visualization of long analytical queries; 3) drag-and-drop declarative composition of visualizations; and 4) efficient manipulation of visualizations through hand gestures over 3D/holographic views.
Finally, LeanBigData will demonstrate these results on a cluster with 1,000 cores in four real industrial use cases with real data, paving the way for deployment in the context of realistic business processes.
Fields of science
- natural sciences > computer and information sciences > data science > big data
- natural sciences > computer and information sciences > databases > non-relational databases
- natural sciences > computer and information sciences > data science > data processing
- natural sciences > computer and information sciences > databases > relational databases
Call for proposal
FP7-ICT-2013-11
Funding Scheme
CP - Collaborative project (generic)
Coordinator
28040 Madrid
Spain