Periodic Reporting for period 2 - HEROES (Hybrid Eco Responsible Optimized European Solution)
Reporting period: 2022-03-01 to 2023-02-28
HEROES is a software platform designed to be multi-tenant, with security at its core. It will be deployed in a wide variety of contexts, all of which allow HPC users to access a choice of HPC infrastructures (whether these sit on the users' own premises, in the public cloud, with specialised HPC providers, or in national/European public Research Centers) and to submit typical HPC workflows on the “best platform for the job”, as determined by its decision module.
Using High Performance Computing resources to address the challenges posed by climate change and global warming is critical, and so is making more eco-responsible use of those resources. The HEROES Platform accounts for energy consumption across HPC platforms and can be configured to identify the most eco-responsible option, which it then either recommends or selects automatically. The project team is convinced that showing the impact of HPC on the environment will progressively change users’ behaviour through education.
By connecting HPC supply to user demand through a configurable framework, the HEROES project enables marketplaces that pave the way to a European HPC economy that values energy efficiency and not only performance and price.
All modules of the prototype (API, identity/data/workflow management, application containerization, decision module, energy monitoring and optimization, user interface…) are functional, though they could be hardened and extended for a production platform. The whole platform relies on DevOps tools that manage the different services through a Kubernetes cluster.
Concerning the main elements of the prototype:
1. Workflows: two have been selected to reflect classical HPC and AI use cases: a CFD workflow with OpenFOAM, and object recognition in videos with TensorFlow. The applications have been containerized with Singularity and integrated with the workflow manager (Nextflow).
2. Web interface: a simple web interface has been implemented that allows a user to execute the use cases.
3. Energy: EAR has been extended with a Lite version that can be installed without root privileges; a new reporting plugin meeting HEROES requirements has been added, along with additional metrics.
4. Workflow Placement: a new Decision Module plugin has been added to OKA. It leverages machine learning and multi-criteria decision algorithms to propose placement options for the workflows. OKA's analytical and predictive capabilities have been enhanced to support the use cases and to integrate with EAR for energy metrics.
5. Identity, data and workflow management: multiple services have been implemented to provide an abstraction over the target clusters. Through a single entry point (REST APIs), users can manage data in dedicated object stores that HEROES manages per user/organization. Workflow submission is simplified through the selection of available workflow templates, integration with the Decision Module, and integration of an updated version of Nextflow. Identity management is delegated to a centralized Keycloak, and access to remote clusters remains secure even though accounts on the clusters are shared (thanks to containerization and data management with the HEROES-FS).
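To illustrate the kind of multi-criteria placement mentioned under Workflow Placement, the following is a minimal sketch of a weighted-sum ranking over candidate platforms. It is not the OKA Decision Module's actual algorithm, which the report does not describe. The weighted-sum method, the `Candidate` class, the `rank_candidates` function, the platform names and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical placement option; every field is a lower-is-better estimate."""
    name: str
    runtime_h: float   # predicted runtime in hours (illustrative value)
    cost_eur: float    # predicted cost in euros (illustrative value)
    energy_kwh: float  # predicted energy use in kWh (illustrative value)


CRITERIA = ("runtime_h", "cost_eur", "energy_kwh")


def rank_candidates(candidates, weights):
    """Return (name, score) pairs sorted best first (lowest weighted score)."""
    total = sum(weights[crit] for crit in CRITERIA)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = {cand.name: 0.0 for cand in candidates}
    for crit in CRITERIA:
        values = [getattr(cand, crit) for cand in candidates]
        lo, hi = min(values), max(values)
        span = hi - lo
        for cand in candidates:
            # Min-max normalize to [0, 1]; a criterion with no spread adds 0.
            norm = (getattr(cand, crit) - lo) / span if span else 0.0
            scores[cand.name] += weights[crit] / total * norm
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical candidates and predictions, invented for this sketch.
candidates = [
    Candidate("on_prem", runtime_h=10.0, cost_eur=50.0, energy_kwh=40.0),
    Candidate("public_cloud", runtime_h=6.0, cost_eur=120.0, energy_kwh=30.0),
    Candidate("research_center", runtime_h=8.0, cost_eur=80.0, energy_kwh=20.0),
]

# An eco-responsible configuration puts most of the weight on energy.
eco_weights = {"runtime_h": 0.2, "cost_eur": 0.2, "energy_kwh": 0.6}
ranking = rank_candidates(candidates, eco_weights)
recommended = ranking[0][0]
```

With these invented numbers, the energy-heavy weights rank `research_center` first, while weights that favour runtime would rank `public_cloud` first. The top-ranked option could be shown to the user as a recommendation or selected automatically, matching the two modes described earlier in the report.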