CORDIS - EU research results

Proton structure for discovery at the Large Hadron Collider

Periodic Reporting for period 4 - NNNPDF (Proton structure for discovery at the Large Hadron Collider)

Reporting period: 2022-04-01 to 2023-03-31

This project addresses the determination of the structure of the proton, as probed in high-energy collisions such as those at the Large Hadron Collider (LHC) at CERN. Its main novel aspect is the systematic use of Machine Learning (ML) techniques. Subsidiary innovations concern the estimation of uncertainties in theoretical predictions and the improvement of their accuracy. The importance of this project for society is twofold.
First, this project impacts our understanding of the fundamental laws of nature, currently encoded in a theory, the standard model, which is tested experimentally at particle accelerators such as the LHC. No deviation between the theory and experiment has ever been observed, yet we know that the theory cannot be complete: for instance, it does not account for dark matter, which makes up about 85% of the matter in the universe. The structure of the proton is determined by the strong interaction, one of the four fundamental forces of nature. Understanding the structure of the proton enables the subtle tests of the standard model at the LHC that are currently our best way of going beyond the current theory. It also probes our understanding of the theory of strong interactions. Because the LHC is a proton accelerator, no discovery is possible without an understanding of proton structure. The techniques developed in this project make possible discoveries that will be the focus of experimentation at the LHC over the next two decades, and they set new standards in the understanding of proton structure. For instance, they made possible the discovery that the proton contains an "intrinsic" component of charm quarks, i.e. quarks whose mass is larger than that of the proton itself.
A second impact of this project comes from its methodology, namely the use of ML and the development of computational tools. ML tools are used to determine a true answer from fuzzy information. In the context of this project what is being determined is a statistical distribution of true answers, due to the quantum nature of the objects being studied, which can only be characterized in terms of probability distributions. These techniques are likely to be useful for situations in which there exists a distribution of possible answers, rather than a unique answer, such as medical diagnostics. Specifically, the ML methods developed in the project provide a methodology for the accurate determination of the uncertainty on the final answer. The project also involves the development of new computational tools, which have an impact on the wide community working on high-energy physics experiments and their interpretation. Several are already being used beyond the specific context of this project. Some have an impact on any application in which efficient numerical methods are used.
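As a loose illustration of what determining a distribution of answers, rather than a single answer, can look like in practice, the toy sketch below refits a simple model to many randomly fluctuated copies of the same data and reads the uncertainty off the spread of the fitted results. It is not the project's code: the straight-line model, the data values, the assumed noise level sigma and the function names fit_slope and replica_fits are all hypothetical choices made only for this example, and the resampling approach itself is an assumption of the illustration rather than something this summary specifies.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope for the toy model y = a * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def replica_fits(xs, ys, sigma, n_replicas, seed=0):
    """Fit the toy model to many fluctuated copies ("replicas") of the data.

    Each replica adds independent Gaussian noise of width sigma to every
    data point, so the set of fitted slopes approximates a distribution
    of answers compatible with the data and their uncertainties.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_replicas):
        replica = [y + rng.gauss(0.0, sigma) for y in ys]
        slopes.append(fit_slope(xs, replica))
    return slopes


# Hypothetical measurements with an assumed uncertainty of 0.2 on each point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 0.2

slopes = replica_fits(xs, ys, sigma, n_replicas=1000)
central = statistics.mean(slopes)       # best estimate of the slope
uncertainty = statistics.stdev(slopes)  # spread of the answers

print(f"slope = {central:.3f} +/- {uncertainty:.3f}")
```

In this toy case the result is a single number with an error bar; in a setting like the one described above, the fitted object would be a set of functions, and the ensemble of fits would play the role of the distribution of answers.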
The final goal that the project has achieved is the development of a suite of ML tools that make possible a full determination of the proton structure while optimizing automatically the way information is extracted from the underlying data. Secondary goals, also achieved, include the accurate estimate of the uncertainty in theoretical computations involved in studies of proton structure and the development of computational and analytic tools for the implementation and optimization of wide classes of theoretical predictions.
A first phase (12 months) was devoted to setting up the project. A project assistant was recruited starting on day 1. A team of five was recruited, including two postdocs, two Ph.D. students, and an assistant professor.
Major preliminary work was accomplished by the PI: a state-of-the-art determination of the parton distribution functions (PDFs) that encode proton structure, using the existing methodology, which was based on artificial intelligence but not yet on ML.
The second phase (12 months) started when the initial team of five joined the PI. The team has since met once a week, with two more weekly meetings held jointly with the NNPDF collaboration led by the PI (http://nnpdf.mi.infn.it/). In-person meetings of the team with NNPDF were held in Gargnano (Italy) in September 2018 and in Amsterdam in February 2019. Two major results were obtained: 1) the initial proposal of an ML-based methodology for PDF determination, based on the automatic optimization of the methodology itself; 2) the development of a systematic methodology for the inclusion of theoretical uncertainties in PDF determination.
In the third phase (12 months), the team reached its full size, with two postdocs and a PhD student joining at the beginning of year 3. Weekly meetings continued remotely after the beginning of the COVID-19 pandemic in February 2020. In-person meetings of the team and NNPDF took place in Varenna (Italy) at the end of August 2019 and in Amsterdam in February 2020. Two milestones were achieved: 1) the full implementation and validation of the aforementioned ML-based methodology for PDF determination; 2) the full implementation of a methodology for the simultaneous inclusion of QCD and electroweak corrections, i.e. a state-of-the-art theoretical description of the data. A wide variety of spillover results were also obtained, including public codes for data science tasks (such as hardware acceleration and numerical integration).
During the fourth phase (15 months) the main goal was achieved: an open-access code for PDF determination based on ML, and the construction of a first PDF set based on it. Because of the pandemic, tasks requiring more intensive face-to-face discussion were postponed and work was concentrated on tasks amenable to remote working, such as computer coding. The meeting foreseen in Gargnano in September 2020 was canceled and no winter meeting was held in 2021. Summer meetings were resumed in Gargnano in August 2021. A PhD student and postdoc left the project at the end of year 4. A six-month extension of the project was granted due to the pandemic.
During the final phase (15 months) a first showcase of the main result of the project was accomplished: the discovery of the first evidence for an intrinsic charm component of the proton. The two ancillary developments of the project were also accomplished, i.e. the development of codes that allow for the inclusion of theoretical uncertainties and electroweak corrections. An in-person meeting scheduled in Amsterdam for February 2022 was moved to April due to COVID-19; a summer meeting was held in Gargnano in September 2022, and a final winter meeting was held in Amsterdam in February 2023. A PhD student and a postdoc left six months before the end of the project, with one postdoc and one PhD student remaining until the end.
Besides publication in scientific papers, presentations at conferences and workshops, and accessibility in public repositories, all results of the project have been made public in the form of open-source computer code that has extensive documentation and complies with the FAIR principles. A first summer school on "advanced artificial intelligence for high-energy physics", based on the use of this code, will be held in July 2023 (https://aiep.lakecomoschool.org/).
The main achievement of the project is the first machine-learning-based methodology for PDF determination, its validation, and its actual use for the construction of a first PDF set. This methodology is optimized, fully automated, and implemented as an open-access code. It also includes tools for the determination of theoretical uncertainties and for the inclusion of electroweak corrections.
The neural network architecture
Hyperoptimization of the N3FIT code
The structure of the N3FIT code amenable to hyperoptimization