University of Malta

Episodes from the best NAF2 agent and the PI controller with the same initial states and with a varying additive Gaussian action noise with zero mean and standard deviation as a percentage of the half action space [0, 1]. (A) 0%, (B) 10%, (C) 25%, and (D) 50% Gaussian action noise.

Application of reinforcement learning in the LHC tune feedback

L. Grech1, G. Valentino1, D. Alves2 and Simon Hirlaender3 1University of Malta, 2CERN, 3University of Salzburg Frontiers in Physics Abstract The Beam-Based Feedback System (BBFS) was primarily responsible for correcting the beam energy, orbit and tune in the CERN Large Hadron Collider (LHC). A major code renovation of the BBFS was planned and carried out during the LHC Long Shutdown 2 (LS2). This work consists of an explorative study to solve a beam-based control problem, the tune feedback (QFB), utilising state-of-the-art Reinforcement Learning (RL). A simulation environment was created to mimic the operation of the QFB. A series of RL agents were trained, and the best-performing agents were then subjected to a set of well-designed tests. The original feedback controller used in the QFB was reimplemented to compare the performance of the classical approach to the performance of selected RL agents in the test scenarios. Results from the simulated environment show that the RL agent performance can exceed the controller-based paradigm. ...

Online training of NAF Agent of AWAKE electronline trajectory steering in the horizontal plane.

Test of Machine Learning at the CERN LINAC4

V. Kain1, N. Bruchon1, S. Hirlander1, N. Madysa1, I. Vojskovic1, P. Skowronski1, G. Valentino2 1CERN, 2University of Malta 61st ICFA ABDW on High-Intensity and High-Brightness Hadron Beams Abstract The CERN H−linear accelerator, LINAC4, served as atest bed for advanced algorithms during the CERN LongShutdown 2 in the years 2019/20. One of the main goals wasto show that reinforcement learning with all its benefits canbe used as a replacement for numerical optimization and asa complement to classical control in the accelerator controlcontext. Many of the algorithms used were prepared before-hand at the electron line of the AWAKE facility to makethe best use of the limited time available at LINAC4. Anoverview of the algorithms and concepts tested at LINAC4and AWAKE will be given and the results discussed. ...

Best PPO agent. Action is deterministic.

Renovation of the beam-based feedback systems in the LHC

L. Grech University of Malta PhD thesis Abstract The Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) is the largest synchrotron built to date, having a circumference of approx- imately 27km. The LHC is able to accelerate two counter-rotating proton and/or heavy-ion beams up to 7 TeV per charge. These highly energetic beams are contained inside a vacuum chamber with an inner diameter of 80 mm by means of strong mag- netic fields produced by superconducting magnets. A beam cleaning and machine protection system is in place to prevent high-energy halo particles from impacting and heating the superconducting magnets. ...

The RL paradigm as applied to particle accelerator control, showing the example of trajectory correction.

Sample-efficient reinforcement learning for CERN accelerator control

V. Kain1, S. Hirlander1, B. Goddard1, F. M. Velotti1, G. Z. Della Porta1, N. Bruchon2, G. Valentino3 1CERN, 2University of Trieste, 3University of Malta Physical Review Accelerators and Beams Abstract Numerical optimization algorithms are already established tools to increase and stabilize the performance of particle accelerators. These algorithms have many advantages, are available out of the box, and can be adapted to a wide range of optimization problems in accelerator operation. The next boost in efficiency is expected to come from reinforcement learning algorithms that learn the optimal policy for a certain control problem and hence, once trained, can do without the time-consuming exploration phase needed for numerical optimizers. To investigate this approach, continuous model-free reinforcement learning with up to 16 degrees of freedom was developed and successfully tested at various facilities at CERN. The approach and algorithms used are discussed and the results obtained for trajectory steering at the AWAKE electron line and LINAC4 are presented. The necessary next steps, such as uncertainty aware model-based approaches, and the potential for future applications at particle accelerators are addressed. ...