Periodic Reporting for period 2 - LIGATE (LIgand Generator and portable drug discovery platform AT Exascale)
Reporting period: 2022-07-01 to 2024-04-30
Because HPC architectures are evolving toward specialization and extreme heterogeneity, the LIGATE solution also focuses on code portability, so that the computer-aided drug design (CADD) platform can be deployed on any available type of architecture without being tied to legacy hardware.
Code portability and application autotuning. LIGATE created a drug discovery platform for modern heterogeneous HPC infrastructures by porting LiGen to SYCL 2020, making it one of the first industrial applications of this standard. Dynamic and static auto-tuning techniques were applied to optimise performance across different GPU architectures, significantly increasing the efficiency of LiGen.
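As an illustration of the dynamic auto-tuning idea, the following minimal Python sketch times each candidate configuration and keeps the fastest one. It is not the LiGen tuner: the function name `autotune`, the `run` callback and the notion of a "configuration" (for example a GPU work-group size) are assumptions made for this example only.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

C = TypeVar("C")  # type of one candidate configuration (hypothetical)


def autotune(
    candidates: Iterable[C],
    run: Callable[[C], None],
    repeats: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[C, float]:
    """Return the fastest candidate and its best measured time.

    Each candidate is executed `repeats` times and the minimum time is
    used, which reduces the influence of one-off noise such as warm-up.
    """
    found = False
    best_cfg = None
    best_time = float("inf")
    for cfg in candidates:
        samples = []
        for _ in range(repeats):
            start = timer()
            run(cfg)  # assumed to launch the kernel with this configuration
            samples.append(timer() - start)
        elapsed = min(samples)
        if elapsed < best_time:
            found, best_cfg, best_time = True, cfg, elapsed
    if not found:
        raise ValueError("no candidate configurations were supplied")
    return best_cfg, best_time
```

Under these assumptions, a call such as `autotune([32, 64, 128, 256], launch_kernel)` would be made once per target GPU, with `launch_kernel` standing in for whatever code path is being tuned. Static tuning would instead pick the configuration ahead of time, for instance at build time.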
Free energy estimation workflows for Binding Affinity Prediction (BAP) and Pose Selection (PS). These workflows, which required over 1M GPU and CPU hours, were the most computationally intensive activities of the project. They generated training data for machine learning models in LiGen's scoring function. This effort is one of the largest molecular dynamics campaigns ever performed and required fully automated calculation of binding free energies for protein-ligand complexes.
AI for drug discovery. The project developed an efficient mechanism for integrating Python-based machine learning frameworks, such as graph neural networks, with high-throughput C++ applications. This allowed various ML scoring functions to be used within LiGen's virtual screening pipeline with minimal effort.
Efficient workflow deployment. The HyperQueue (HQ) task runtime is designed to manage heterogeneous resources and simplify task graph execution on HPC clusters. Integrating HQ into the CADD workflow has streamlined execution and improved resource utilisation.
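To make the notion of task graph execution concrete, here is a minimal sketch using only the Python standard library. It does not use the HyperQueue API and ignores resource requirements; the stage names and the helper `run_task_graph` are hypothetical and serve only to show dependency-ordered execution.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_task_graph(
    deps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run every task after all of its dependencies; return the run order.

    `deps` maps a task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order: List[str] = []
    while sorter.is_active():
        # get_ready() yields tasks whose dependencies have all completed.
        # A real runtime would dispatch these to workers in parallel.
        for name in sorter.get_ready():
            actions[name]()
            order.append(name)
            sorter.done(name)
    return order


# Hypothetical four-stage screening workflow.
log: List[str] = []
stages = ["prepare_ligands", "dock", "score", "report"]
deps = {"dock": {"prepare_ligands"}, "score": {"dock"}, "report": {"score"}}
actions = {name: (lambda n=name: log.append(n)) for name in stages}
order = run_task_graph(deps, actions)
```

A runtime such as HQ additionally matches each ready task to workers that offer the resources it requests (CPU cores, GPUs), which this sketch leaves out.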
Methods and datasets for continuous validation. The project established methods and datasets for continuous and automated evaluation of LiGen and its workflows. These modular and automated workflows ensure reproducibility, easy integration of new methods and seamless addition of new data.
Validation on target pathogens. A set of experimentally unresolved protein targets from viral pathogens was selected for validation. This provided a real-world use case to demonstrate the capabilities of the platform and potentially identify new drug targets.
Urgent computing and virtual screening-as-a-service (VSaaS). A key objective was to use LIGATE for urgent computing during pandemics, such as COVID-19. By sharing resources across several European supercomputing centres (e.g. Leonardo, LUMI, Karolina), the project demonstrated federated drug discovery runs using the LEXIS platform. The combined computing power enabled screening of over 20 million ligands per second, so that 1 billion ligands could be processed in less than a minute. Federation not only allows virtual screening efforts to be distributed across multiple sites, but also provides virtual screening-as-a-service without sharing proprietary code.
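The two throughput figures quoted above are mutually consistent, as a short back-of-the-envelope calculation shows. The even split across three sites is purely an assumption for illustration; the report does not state how the library was partitioned between centres.

```python
from typing import List

LIGANDS_PER_SECOND = 20_000_000  # lower bound quoted in the report
LIBRARY_SIZE = 1_000_000_000     # 1 billion ligands


def screening_time_seconds(n_ligands: int, rate_per_second: float) -> float:
    """Time needed to screen n_ligands at a constant aggregate rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return n_ligands / rate_per_second


def split_evenly(n_ligands: int, n_sites: int) -> List[int]:
    """Hypothetical even partition of a ligand library across sites."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    base, extra = divmod(n_ligands, n_sites)
    # The first `extra` sites take one additional ligand each.
    return [base + (1 if i < extra else 0) for i in range(n_sites)]


elapsed = screening_time_seconds(LIBRARY_SIZE, LIGANDS_PER_SECOND)
shares = split_evenly(LIBRARY_SIZE, 3)
print(f"{elapsed:.0f} s at the quoted lower-bound rate")
```

At exactly 20 million ligands per second the full library takes 50 seconds, which is why any rate above that figure finishes in under a minute.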
Taken together, these achievements demonstrate the success of the LIGATE project in advancing drug discovery through innovative, scalable and efficient solutions.