Content archived on 2024-06-18

Ultra-Scalable and Ultra-Efficient Integrated and Visual Big Data Analytics

Combining operational and analytical databases in a single platform

The EU-funded project LEANBIGDATA has produced a big data platform that could halve the cost of data analytics and deliver results in real time.

A Spanish-led project aims to enable companies to do far more with their Big Data while using far fewer resources. LEANBIGDATA has developed a platform for managing Big Data which is ultra-efficient and highly scalable.

Big businesses and organisations process ever-increasing amounts of data. But the techniques they use to do this are often inefficient and consume large amounts of resources. Organisations typically use two databases, one for operational data and a second for data warehousing. To analyse the data, it must be copied from the first to the second and, because data quickly becomes stale, this must be done regularly, usually every day. Such a process, known as extract-transform-load or ETL, is expensive to set up and maintain. ‘This accounts for 75 % to 80 % of the cost of data analytics,’ says Ricardo Jiménez, LEANBIGDATA’s technical coordinator and CEO & co-founder of LeanXcale, a spin-off set up to commercialise the project’s core results. What is more, big data analysis tends to run in batch mode rather than in real time, so users cannot react quickly to events.

Two for the price of one

The LEANBIGDATA team has designed an architectural solution which can deliver the two capabilities, operational and analytical, in one, thus greatly increasing efficiency. They have come up with a transactional management system which scales up linearly to very large volumes, enabling the operational part of the database to bear the analytical load.

They have created three new management systems. The first is a key-value data store, a kind of NoSQL technology used to store the data of the combined database. The second, a complex event processing system, allows users to stream data from real-time events. The third is a distributed SQL query engine which can harness multiple computers to tackle a single query. ‘This means we can answer a query in online response time, that is the time a typical online user would be prepared to wait,’ says Dr Jiménez.
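To illustrate the general idea of harnessing multiple computers for a single query, here is a minimal sketch in Python. It is not LEANBIGDATA's or LeanXcale's implementation: the partitions, the sample records and the function names are all hypothetical, and threads stand in for separate machines. Each partition computes a partial count, and the partial results are then merged into one answer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each dict stands in for one node of a key-value store,
# mapping a record key to a topic label.
PARTITIONS = [
    {"t1": "economy", "t2": "health", "t3": "economy"},
    {"t4": "health", "t5": "health"},
    {"t6": "health", "t7": "health", "t8": "economy"},
]

def partial_count(partition):
    """Count topic labels within one partition (the work one node would do)."""
    return Counter(partition.values())

def distributed_count(partitions):
    """Send the aggregation to every partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_count, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # add this partition's counts to the running total
    return dict(total)

result = distributed_count(PARTITIONS)
print(result)
```

Because each partition only scans its own data, adding partitions spreads the work across more workers, which is the intuition behind answering a single query with many machines.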
The team has tested its technology through case studies. These included studying the feelings of voters in US and Spanish elections by analysing their tweets in real time. This showed how sentiments were evolving, but also allowed analysts to see what was behind those feelings, for instance by looking at which words were used most frequently. ‘When the emails scandal erupted, you could use the system to see how many tweets were about Clinton’s reputation,’ says Dr Jiménez. ‘Our goal wasn’t to predict the results but it would have provided useful information for analysts.’ A second trial conducted in Italy used people’s social media footprints to build profiles of customers and help banks detect cases of identity fraud.

Business analytics in real time

The LEANBIGDATA team are confident their unified platform can address the different data needs of big organisations. It could reduce the cost of doing data analytics by half by avoiding the need to set up and maintain ETL. ‘Businesses can gain a lot of agility because they will be empowered to do real-time business analytics,’ says Dr Jiménez.

LeanXcale, set up by LEANBIGDATA’s lead institution, the Technical University of Madrid, is aiming for commercial launch in autumn 2017. It is already building proofs of concept with banks, telecommunications companies, large retailers and travel tech companies.

Keywords

LEANBIGDATA, SQL databases, big data management, big data, key value data store, complex event processing, operational database, data warehouse, real-time analytics