Democratising access to high performance computing
If industry or scientific communities are to take advantage of fast-evolving advances in artificial intelligence (AI) and machine learning (ML), they need access to high performance computing (HPC) facilities to process large volumes of data. But these facilities are costly and time-consuming to set up and run, and their energy use raises environmental concerns. The EU-funded HEROES project has developed software that matches HPC resources to users, enabling them to run simulations and ML workflows based on criteria such as time, budget and energy efficiency. “We are entering an era of complex digital choices where computational cost per second is vital. Our solution not only helps users navigate the options but democratises HPC, as users don’t need specialist IT knowledge to benefit,” says Philippe Bricard, HEROES project coordinator from UCIT, the project host. HEROES developed and tested a demonstrator trialling workflows from the renewable energy and manufacturing sectors.
Simplifying complex infrastructure options
For users of data-heavy technologies such as AI or ML, selecting the best HPC option to run simulations means not only understanding specialist IT jargon, but also weighing up multiple requirements related to budgets, turnaround times, data security and environmental footprint. The HEROES solution simplifies the process by identifying suitable options on the user’s behalf. In the project’s prototype, once logged in, users select a workflow or job template, then input their job parameters, including constraints such as cost. The software then offers a choice among four strategies, each linked to HPC platforms. Once a strategy is selected, the HPC resource runs the job and the results are returned to the platform. In some cases the best solution will be hybrid, with some computing resources allocated in-house and others assigned externally, depending on the security constraints and business needs of the user. “In the future, our platform will be adapted to various contexts, from private organisations with on-premises HPC clusters and some public cloud resources, to more elaborate arrangements with external facilities once authentication issues are resolved,” explains Bricard. “Our ambition is to enable clients to use our platform to build and operate their own bespoke HPC marketplace.”
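The selection step described above can be pictured as a constrained ranking problem: filter out platforms that violate the user’s cost or deadline constraints, then rank the rest by weighted criteria. The sketch below is purely illustrative — the platform names, weights and figures are invented, and the actual HEROES decision logic is not public in this detail.

```python
from dataclasses import dataclass

# Hypothetical illustration of a HEROES-style selection step:
# rank candidate HPC platforms against a user's constraints
# (cost, turnaround time, energy). All names and values are invented.

@dataclass
class Platform:
    name: str
    cost_per_hour: float       # EUR per node-hour
    est_runtime_hours: float   # estimated wall-clock time for the job
    energy_kwh: float          # estimated energy use for the job

def feasible(p: Platform, max_cost: float, deadline_hours: float) -> bool:
    # Keep only platforms that fit both the budget and the deadline.
    total_cost = p.cost_per_hour * p.est_runtime_hours
    return total_cost <= max_cost and p.est_runtime_hours <= deadline_hours

def score(p: Platform, w_cost=0.4, w_time=0.3, w_energy=0.3) -> float:
    # Lower is better: a simple weighted sum of cost, time and energy.
    return (w_cost * p.cost_per_hour * p.est_runtime_hours
            + w_time * p.est_runtime_hours
            + w_energy * p.energy_kwh)

def rank_strategies(platforms, max_cost, deadline_hours):
    options = [p for p in platforms if feasible(p, max_cost, deadline_hours)]
    return sorted(options, key=score)

platforms = [
    Platform("on-prem-cluster", 2.0, 10.0, 40.0),
    Platform("cloud-hpc-a", 5.0, 4.0, 25.0),
    Platform("cloud-hpc-b", 3.5, 6.0, 55.0),
]
ranked = rank_strategies(platforms, max_cost=30.0, deadline_hours=8.0)
print([p.name for p in ranked])  # → ['cloud-hpc-a', 'cloud-hpc-b']
```

Here the on-premises cluster is cheapest per hour but misses the deadline, so only the two cloud options survive the feasibility filter; adjusting the weights shifts the ranking between cost-optimal and energy-optimal strategies, which is the kind of trade-off the project’s four strategies expose to users.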
From architectural prototype to commercial solutions
The project’s prototype comprises a variety of task-specific modules, principally for: workflow and job orchestration; data transfer and storage; pricing and cost management; and energy monitoring and optimisation. At the solution’s core is the decision-making module, which communicates with service providers, such as cloud service providers and HPC centres, centralising workflow information and suitable HPC options. “Most projects like this result in proofs of concept. We wanted to go further and actually start implementing, to start iterating upgrades based on actual use,” notes Bricard. The result was the integration of the decision-making module into OKA, the pre-existing data science platform for HPC environments, improving the latter’s effectiveness. EAR, the system software for data centre energy management, has also now been integrated into the HEROES platform. “A number of HEROES’s components are already available for integration into current computing infrastructures through our commercial venture ‘Do it Now’,” adds Bricard. The project’s results, along with its partners’ expertise, are well placed to contribute to the EuroHPC Joint Undertaking initiative, set up to develop a world-class supercomputing ecosystem in Europe.
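The decision-making module described above centralises offers from heterogeneous providers behind a common interface. A minimal sketch of that aggregation pattern follows; the adapter interface, provider names and figures are all invented for illustration and do not reflect the actual HEROES APIs.

```python
# Hypothetical sketch of a decision-making module aggregating offers
# from several provider adapters (e.g. a cloud provider, an HPC centre).
# Interfaces and values are invented for illustration only.

class ProviderAdapter:
    """Wraps one provider; a real adapter would call the provider's API."""

    def __init__(self, name, offers):
        self._name = name
        # Each offer: (strategy, cost in EUR, hours, energy in kWh).
        self._offers = offers

    def quote(self, job):
        # Return this provider's offers for the given job description.
        return [{"provider": self._name, "strategy": s,
                 "cost_eur": c, "hours": h, "kwh": e}
                for (s, c, h, e) in self._offers]

def collect_offers(adapters, job):
    # Centralise offers from all providers in one place,
    # cheapest first; a real module would apply richer criteria.
    offers = []
    for adapter in adapters:
        offers.extend(adapter.quote(job))
    return sorted(offers, key=lambda o: o["cost_eur"])

adapters = [
    ProviderAdapter("hpc-centre", [("batch", 18.0, 6.0, 30.0)]),
    ProviderAdapter("cloud-a", [("spot", 12.0, 5.0, 22.0),
                                ("on-demand", 25.0, 3.0, 20.0)]),
]
best = collect_offers(adapters, job={"cores": 64})[0]
print(best["provider"], best["strategy"])  # → cloud-a spot
```

The point of the adapter layer is that new providers, whether on-premises clusters or public clouds, can be added without changing the decision logic, which mirrors the project’s ambition of letting clients assemble their own bespoke HPC marketplace.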
Keywords
HEROES, artificial intelligence, machine learning, high performance computing, workflows, energy efficiency, supercomputing