Better software for better science
Within much of the public and private sectors, cloud computing is now ubiquitous, an integrated part of today’s Infrastructure as a Service (IaaS). However, the scientific community has not yet fully embraced it, particularly at the Platform as a Service (PaaS) and Software as a Service (SaaS) levels. Seeing that this community could benefit from cloud computing, the INDIGO-DATACLOUD project developed a data and computing platform specifically for scientific communities. ‘The goal of this project was to provide the European scientific community with the tools they need to do research more effectively,’ says project coordinator Davide Salomoni. ‘To do this, we developed a platform able to simultaneously respond to the calculation, processing and data storage needs of researchers from very different disciplines.’
Real solutions for real problems
Producing the INDIGO software required the research team to exploit key European know-how and to reuse and extend open source software. Stringent software development and management processes were defined, and internal distributed test-beds for software and application development and pre-production were set up. ‘We started the project knowing very well that, to effectively use distributed resources, scientific communities had to undergo complex procedures,’ says Salomoni. ‘Sometimes it wasn’t even possible to exploit available resources at all.’
According to Salomoni, challenges ranged from finding the right resources in the first place to making sure that authentication and authorisation for their use were flexible. Other challenges included porting traditional applications to distributed environments and running them efficiently, federating both compute and data across borders, and expressing high-level requirements that could eventually be translated into solutions without the need for extensive IT knowledge.
‘We started from the actual issues that many scientific communities had flagged for us and wrote open software components that can be combined, integrated and deployed into e-infrastructures aimed at solving those issues,’ explains Salomoni. ‘This process immediately highlighted the need for a dramatic simplification and extension of IT tools and methods, so scientists could use the resources to solve their problems without first having to become IT experts.’
The resulting INDIGO software was delivered in two major versions and 14 minor updates, all released free of charge under an open source license. The first version, called Midnight Blue, provides a flexible platform capable of operating on both public and private cloud infrastructures. The second, ElectricIndigo, builds on and extends the first version to enhance stability and provide more programmability, scalability, automation and flexibility. The INDIGO software releases are available for download.
Bringing researchers together
Research in Europe is fragmented, a situation that has led to inefficiencies and the suboptimal use of funded resources and know-how. But thanks to projects like INDIGO-DATACLOUD, which supports the EU’s European Open Science Cloud (EOSC) initiative, these pieces are starting to come together. ‘We have proven that scientific fields as diverse as cultural heritage, physics, bioinformatics, medical imaging, astronomy, climatology and many others can easily and effectively run their applications in public or private distributed environments by integrating INDIGO components,’ concludes Salomoni.
Keywords
INDIGO-DATACLOUD, cloud computing, scientific communities, research, open source