CRDTs for non-synchronised networked applications
Conflict-free replicated data types (CRDTs) have been around for a while. These data structures can be replicated across multiple computers in a network and updated independently without the need for synchronisation, and they prevent inconsistencies by merging their respective updates. Yet critical applications such as virtual wallets, advertising platforms, social networks or online games still tend to resort to synchronisation.
‘Synchronisation is antinomic to fast response and availability,’ says Marc Shapiro, computer scientist at UPMC-LIP6 and Inria, and coordinator of SYNCFREE. ‘As one computer awaits update permission from another, it does nothing useful, wastes resources, and does not even respond to the user. Worse, if the network breaks, the wait may last indefinitely, and the application is stuck.’
With CRDTs, the two computers in this scenario can make their updates independently and at the same time: each sends its updates to the other, and each merges the remote updates it receives with its own. ‘This requires careful programming, and does not work for all types of data. But it has proved to be very useful in practice. Plus, CRDT-based programs are “Available under Partition” (AP), meaning that each computer can do its job even when the network is broken. Since a computer running CRDTs does not ask for permission, it can always respond immediately, which also brings down cost,’ Shapiro explains.
Unlike previous AP approaches, which were very complex for software developers to use, a CRDT has clear semantics, and encapsulates the details of remote communication and merging behind an intuitive interface. One challenge on the path to increased commercial exploitation of CRDTs was the lack of applications and systems entirely based on this technology.
This posed various problems, in particular with regard to how to build a CRDT-based communication and storage system, how to maintain desirable properties when updating different pieces of information in an AP system, how to program CRDT-based applications effectively, and how to scale to geo-distributed cloud computing systems with hundreds of computers. These obstacles are precisely those that SYNCFREE aimed to overcome.
‘The project has positive results in all these areas,’ Shapiro explains. ‘Among other things, we have built an effective open-source cloud database, Antidote, which has been demonstrated on hundreds of machines distributed worldwide. We developed new, more scalable approaches for storing CRDTs and transmitting updates. We have built large programs, in particular a demanding distributed benchmark called FMKe (patterned after the Danish healthcare network FMK). We have demonstrated a dataflow-style distributed programming language called LASP, and, finally, we have designed a methodology – called Just-Right Consistency – for ensuring that data stored in this system actually behaves as intended by the application.’
Even before the project started, some of its concepts had already been picked up by industry, including by the four industrial partners of SYNCFREE, who have contributed extensively with application requirements, development suggestions, and data from real, cloud-scale, deployed applications. One of them is using Antidote in one of its products.
‘Now that the SYNCFREE project is finished, there is a follow-up H2020 project named LIGHTKONE that we have just started, with more industrial partners. Its aim is to enable massive, distributed general-purpose computing “at the edge”. Whereas SYNCFREE focused very much on a datacentre environment, LIGHTKONE aims to keep the data and computations as close as possible to the users.
This raises some exciting new challenges when scaling from a few hundred computers in a well-controlled environment, to millions, located in unpredictable environments,’ Shapiro says. ‘Independently, a company is using LASP in production, and we have high hopes for adoption of Antidote. We are even discussing plans for a start-up to further commercialise the results of the project.’ In the meantime, most project contributions, including Antidote, LASP, FMKe, and the Just-Right Consistency tools, are already available on GitHub under the Apache license.
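To make the merge mechanism described by Shapiro more concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter, written in Python. It is an illustration only and is not taken from Antidote, LASP or any other SYNCFREE deliverable; the class name `GCounter`, its methods and the replica names are assumptions made for this example. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas can update without asking permission and still converge once they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Illustrative grow-only counter CRDT (hypothetical, not a SYNCFREE API)."""

    replica_id: str
    # One slot per replica; a replica only ever increases its own slot.
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        """Local update: needs no permission from any other replica."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum of all per-replica slots."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Merge remote state by taking the maximum of each slot.

        Taking the maximum is commutative, associative and idempotent, so
        replicas converge whatever the order or repetition of merges.
        """
        for rid, remote in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), remote)


# Two replicas update independently, as if the network were partitioned.
replica_a = GCounter("a")
replica_b = GCounter("b")
replica_a.increment(2)
replica_b.increment(3)

# Once the network is back, each replica merges the other's state.
replica_a.merge(replica_b)
replica_b.merge(replica_a)
replica_a.merge(replica_b)  # merging the same state again changes nothing
```

After the exchange both replicas report the same value, 5, even though neither waited for the other before updating. Richer CRDTs such as sets, maps or registers follow the same pattern with more elaborate merge functions.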
Keywords
SYNCFREE, Antidote, CRDT, synchronisation, GitHub, computer, application