Combining operational and analytical databases in a single platform
A Spanish-led project aims to enable companies to do far more with their Big Data while using far fewer resources. LEANBIGDATA has developed a platform for managing Big Data which is ultra-efficient and highly scalable.
Big businesses and organisations process ever-increasing amounts of data. But the techniques they use to do this are often inefficient and consume large amounts of resources. Organisations typically use two databases, one for operational data and a second for data warehousing. To analyse the data, it must be copied from the first to the second and, because data quickly becomes stale, this must be done regularly, usually every day. Such a process, known as extract-transform-load or ETL, is expensive to set up and maintain. ‘This accounts for 75 % to 80 % of the cost of data analytics,’ says Ricardo Jiménez, LEANBIGDATA’s technical coordinator and CEO and co-founder of LeanXcale, a spin-off set up to commercialise the project’s core results. What is more, Big Data analysis tends to run in batch mode rather than in real time, so users cannot react quickly to events.
Two for the price of one
The LEANBIGDATA team has designed an architecture which delivers the two capabilities, operational and analytical, in one platform, thus greatly increasing efficiency. They have come up with a transactional management system which scales up linearly to very large volumes, enabling the operational part of the database to bear the analytical load.
They have also created three new data management systems. The first is a key-value data store, a kind of NoSQL technology used to store the data of the combined database. The second, a complex event processing system, allows users to process streams of data from real-time events. The third is a distributed SQL query engine which can harness multiple computers to tackle a single query. ‘This means we can answer a query in online response time, that is the time a typical online user would be prepared to wait,’ says Dr Jiménez.
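The idea of a single store serving both operational writes and analytical queries, with one query spread over several machines, can be illustrated with a small sketch. This is not LEANBIGDATA's or LeanXcale's implementation: the class names `Partition` and `ToyStore`, the in-memory dictionaries and the use of threads in place of separate computers are all assumptions made for illustration only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """One shard of a toy in-memory key-value store (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, row):
        # Operational write: insert or overwrite a single row by key.
        self.rows[key] = row

    def partial_sum(self, group_field, value_field):
        # Analytical work done locally, directly on this shard's live rows.
        totals = defaultdict(float)
        for row in self.rows.values():
            totals[row[group_field]] += row[value_field]
        return dict(totals)


class ToyStore:
    """Routes writes to shards and fans one aggregate query out to all of them."""

    def __init__(self, n_partitions=4):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def put(self, key, row):
        # Route each row to a shard by hashing its key.
        shard = self.partitions[hash(key) % len(self.partitions)]
        shard.put(key, row)

    def sum_by(self, group_field, value_field):
        # Scatter: every shard computes its partial result in parallel
        # (threads stand in for separate computers in this sketch).
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(
                pool.map(
                    lambda p: p.partial_sum(group_field, value_field),
                    self.partitions,
                )
            )
        # Gather: merge the partial sums into one answer.
        merged = defaultdict(float)
        for partial in partials:
            for group, total in partial.items():
                merged[group] += total
        return dict(merged)


store = ToyStore()
store.put(1, {"customer": "alice", "amount": 30.0})
store.put(2, {"customer": "bob", "amount": 12.5})
store.put(3, {"customer": "alice", "amount": 7.5})
totals = store.sum_by("customer", "amount")
```

Because the aggregate runs against the same rows that the writes update, a new `put` is visible to the next `sum_by` call without any copy step in between, which is the property that removes the need for ETL in this toy setting.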
The team has tested its technology through case studies. These included studying the feelings of voters in US and Spanish elections by analysing their tweets in real time. This showed how sentiments were evolving, but also allowed analysts to see what was behind those feelings, for instance by looking at which words were used most frequently. ‘When the emails scandal erupted, you could use the system to see how many tweets were about Clinton’s reputation,’ says Dr Jiménez. ‘Our goal wasn’t to predict the results, but it would have provided useful information for analysts.’ A second trial, conducted in Italy, used people’s social media footprints to build profiles of customers and help banks detect cases of identity fraud.
Business analytics in real time
The LEANBIGDATA team are confident their unified platform can address the different data needs of big organisations. It could reduce the cost of doing data analytics by half by avoiding the need to set up and maintain ETL. ‘Businesses can gain a lot of agility because they will be empowered to do real-time business analytics,’ says Dr Jiménez.
LeanXcale, set up by LEANBIGDATA’s lead institution, the Technical University of Madrid, is aiming for a commercial launch in autumn 2017. It is already building proofs of concept with banks, telecommunications companies, large retailers and travel tech companies.
Keywords
LEANBIGDATA, SQL databases, big data management, big data, key value data store, complex event processing, operational database, data warehouse, real-time analytics