Periodic Reporting for period 1 - LIGATE (LIgand Generator and portable drug discovery platform AT Exascale)
Reporting period: 2021-01-01 to 2022-06-30
The proposed LIGATE solution, as a fully integrated workflow, makes it possible to deliver the results of a drug discovery campaign with the best possible trade-off between speed and accuracy, auto-tuning the solution's parameters to meet the given time and resource constraints.
This predictability, together with the full automation of the solution and the availability of the Exascale system, will make it possible to run a full in silico drug discovery campaign in less than one day, allowing a prompt response to, for example, novel worldwide pandemic crises.
Since the evolution of HPC architectures is heading toward specialization and extreme heterogeneity, including future Exascale architectures, the LIGATE solution also focuses on code portability, with the possibility of deploying the CADD platform on any available type of architecture so as to avoid being locked into legacy hardware.
The project will also make the platform openly available to support the discovery of novel treatments to fight virus infections and multidrug-resistant bacteria, and will provide the research community with the outcome of a final simulation.
The LIGATE Consortium, coordinated by Dompé farmaceutici, is composed of 11 institutions from 5 European countries.
Together with this, we analyzed how the different components of the drug discovery pipeline should interact with each other and exchange data, and eventually demonstrated how both the LiGen and GROMACS pipelines can be run within HyperQueue.
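As an illustration of this kind of integration, the sketch below submits a docking task and a dependent GROMACS task through the HyperQueue Python API. The executable names, command-line arguments and exact API usage are assumptions for illustration only, not the project's actual scripts.

```python
# Minimal sketch: chaining a docking task and a GROMACS task in HyperQueue.
# Assumes a running HyperQueue server; commands, paths and the exact API
# parameters (e.g. deps) are illustrative assumptions.
from hyperqueue import Client, Job

client = Client()  # connect to the default HyperQueue server directory

job = Job()
# Hypothetical docking step producing MOL2 poses for one ligand.
dock = job.program(
    ["ligen-dock", "--ligand", "ligand_0001.mol2", "--out", "poses_0001.mol2"],
)
# Hypothetical free-energy step that depends on the docking output.
job.program(
    ["gmx", "mdrun", "-deffnm", "ligand_0001_fep"],
    deps=[dock],
)

submitted = client.submit(job)
client.wait_for_jobs([submitted])
```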
The current LiGen-GROMACS integration allows LiGen MOL2 poses to be used to create GROMACS coordinate and topology files suitable for the free energy calculations. Work has focused on the requirements for automatic topology and parameter generation, and on automatically choosing the simulation length needed to reach a target precision.
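The logic for choosing the simulation length automatically can be summarised as an extend-and-check loop: run a segment, re-estimate the free-energy error, and stop once the target precision is reached. The sketch below illustrates that idea; the `run_segment` and `estimate_error` helpers, as well as the numeric thresholds, are hypothetical stand-ins for the actual GROMACS wrappers.

```python
# Illustrative extend-and-check loop: keep extending the free-energy simulation
# until the estimated statistical error falls below a target precision or a
# hard step budget is reached. All constants and helpers are assumptions.
TARGET_ERROR_KJ_MOL = 1.0   # target precision (assumption)
SEGMENT_STEPS = 500_000     # extension per iteration (assumption)
MAX_STEPS = 10_000_000      # safety cap (assumption)

def run_until_converged(run_segment, estimate_error):
    """run_segment(n_steps) advances the simulation; estimate_error() returns
    the current free-energy error estimate in kJ/mol. Both are hypothetical."""
    total_steps = 0
    while total_steps < MAX_STEPS:
        run_segment(SEGMENT_STEPS)
        total_steps += SEGMENT_STEPS
        error = estimate_error()
        if error <= TARGET_ERROR_KJ_MOL:
            return total_steps, error
    return total_steps, estimate_error()
```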
As a result, the GROMACS Accelerated Weight Histogram (AWH) pipeline for evaluating the binding free energy of the system has been ported into a Python library integrated with HyperQueue through an API, and has been run successfully on the Karolina cluster.
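For context, AWH is configured through GROMACS .mdp options that couple a pull coordinate to an adaptive bias. The helper below is only a sketch of writing such a fragment: the option names follow the GROMACS manual, but the values and group names are illustrative placeholders rather than the settings used in the LIGATE pipeline.

```python
# Illustrative helper writing the AWH-related part of a GROMACS .mdp file.
# Option names follow the GROMACS manual (pull coupling + AWH bias); the
# values and group names are placeholders, not LIGATE production settings.
AWH_MDP_OPTIONS = {
    "pull": "yes",
    "pull-ncoords": "1",
    "pull-ngroups": "2",
    "pull-group1-name": "protein",      # group names are assumptions
    "pull-group2-name": "ligand",
    "pull-coord1-groups": "1 2",
    "pull-coord1-geometry": "distance",
    "pull-coord1-type": "external-potential",
    "pull-coord1-potential-provider": "awh",
    "awh": "yes",
    "awh-nbias": "1",
    "awh1-ndim": "1",
    "awh1-dim1-coord-index": "1",
    "awh1-dim1-start": "0.3",           # nm, illustrative
    "awh1-dim1-end": "2.0",             # nm, illustrative
    "awh1-dim1-force-constant": "128000",
    "awh1-dim1-diffusion": "5e-5",
}

def write_awh_mdp(path: str, options: dict = AWH_MDP_OPTIONS) -> None:
    with open(path, "w") as handle:
        for key, value in options.items():
            handle.write(f"{key:40s} = {value}\n")
```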
We evaluated the benefits of applying machine learning techniques at two main points of the LIGATE drug-discovery pipeline. Starting from an analysis of the pipeline's needs, we identified two main modules where machine learning approaches can be used: one supporting the pose filtering phase, needed to remove docking poses that are not promising, and one supporting the scoring of the ligand-protein interaction by predicting their binding affinity.
While only an initial analysis has been carried out for the machine learning based binding affinity predictor, several pose prediction techniques have already been evaluated and tested. The main step for the second half of the project will be to build a dataset for training next-generation pose selectors at scale on ligand poses labelled pointwise with free energy values derived automatically using the finalized version of the LiGen-GROMACS pipeline described above.
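As a sketch of what the pose-filtering module could look like, the snippet below trains a simple classifier on precomputed per-pose descriptors. The synthetic features, the random-forest model and the probability cutoff are all assumptions for illustration, not the techniques actually evaluated in the project.

```python
# Minimal sketch of a pose-filtering classifier: given per-pose descriptors,
# predict whether a docking pose is worth keeping for free-energy evaluation.
# Features, labels, model choice and threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data: in the real pipeline these would be descriptors computed
# from LiGen poses, labelled using free-energy results as described above.
X = rng.normal(size=(1000, 8))                   # e.g. docking score, contacts
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic keep/discard label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Keep only poses whose predicted probability of being good exceeds a cutoff.
keep_mask = clf.predict_proba(X_test)[:, 1] > 0.5
print(f"kept {keep_mask.sum()} of {len(keep_mask)} poses")
```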
The data files being used vary considerably in both size and format, and this requires the project to adopt a selection of management strategies. For example, LiGen needs to process a very large number of small ligand files, whereas the free-energy simulations in GROMACS involve a smaller number of files, some of which, especially the dynamics trajectories, can be very large. As well as describing the data used, we have performed experiments to measure data transfer between partner sites with iRODS and analyzed the various methods available for data compression. In the case of iRODS, we successfully demonstrated that data transfer between CINECA, E4 and the IT4I infrastructure was not only possible but also fast. Finally, we showed that the Burrows-Wheeler method gives the best all-round performance in terms of speed and compression ratio.
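The compression comparison can be reproduced in spirit with the standard Python codecs, since bz2 implements a Burrows-Wheeler based scheme while gzip (DEFLATE) and lzma serve as reference points. The snippet below is a minimal benchmark sketch with a placeholder file name, not the exact methodology used in the project.

```python
# Minimal sketch of a compression benchmark on a trajectory-like binary file.
# bz2 is a Burrows-Wheeler based compressor; gzip and lzma are included for
# comparison. The input file name is a placeholder.
import bz2, gzip, lzma, time

def benchmark(path: str) -> None:
    data = open(path, "rb").read()
    codecs = {"gzip": gzip.compress, "bz2 (BWT)": bz2.compress, "lzma": lzma.compress}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        ratio = len(data) / len(compressed)
        print(f"{name:10s} ratio={ratio:5.2f} time={elapsed:6.2f}s")

benchmark("trajectory.xtc")  # placeholder file name
```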
Finally, we experimented with the use of NVIDIA RAPIDS and Dask technologies for data management and analysis over multiple GPUs but, despite support from NVIDIA, we could not demonstrate the feasibility of the approach. We concluded that the technology is perhaps not yet mature enough to deal with large (gigabyte-scale) data sets, but since this may change with updated software libraries we will try again later in the project. In the second half of the project, we will study further I/O solutions, including the relative merits of object storage and of relational and NoSQL databases.
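For reference, the multi-GPU approach we attempted follows the standard dask-cuda pattern sketched below, with one Dask worker per GPU and cuDF dataframes distributed across them. The file name, column names and aggregation are placeholders illustrating the intent, not the actual analysis code.

```python
# Sketch of the attempted multi-GPU data-analysis setup: one Dask worker per
# GPU via dask-cuda, with cuDF dataframes distributed across them.
# File and column names are illustrative placeholders.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

cluster = LocalCUDACluster()        # spawns one worker per visible GPU
client = Client(cluster)

# Load a large (gigabyte-scale) CSV of docking results across the GPUs.
df = dask_cudf.read_csv("docking_results.csv")
summary = df.groupby("ligand_id")["score"].min().compute()
print(summary.head())
```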
Furthermore, progress has been made in system runtime optimization, data distribution and energy efficiency.
Another significant outcome is the HyperQueue HPC job scheduler. This software has been developed to support the LIGATE use cases together with requirements provided by the pre-exascale LUMI consortium and by users of the petascale Karolina system. Thanks to this, we are now prepared to evaluate the execution of complex LIGATE workflows (e.g. the CADD pipeline) on the abovementioned systems and on the LEONARDO pre-exascale system. These achievements will thus allow for an integrated, efficient, hardware-agnostic workflow that can easily be deployed at several supercomputing centers, which will be particularly relevant in case of future health emergencies.