RL4AA Collaboration

AWAKE beamline showing location of the matching devices (actions) and the observation BTV.

Towards automatic setup of 18 MeV electron beamline using machine learning

F. M. Velotti1, B. Goddard1, V. Kain1, R. Ramjiawan1, G. Z. Della Porta1 and S. Hirlaender2 1CERN, 2University of Salzburg Machine Learning: Science and Technology Abstract To improve the performance-critical stability and brightness of the electron bunch at injection into the proton-driven plasma wakefield at the AWAKE CERN experiment, automation approaches based on unsupervised machine learning (ML) were developed and deployed. Numerical optimisers were tested together with different model-free reinforcement learning (RL) agents. In order to avoid any bias, RL agents have been trained also using a completely unsupervised state encoding using auto-encoders. To aid hyper-parameter selection, a full synthetic model of the beamline was constructed using a variational auto-encoder trained to generate surrogate data from equipment settings. This paper describes the novel approaches based on deep learning and RL to aid the automatic setup of a low energy line, as the one used to deliver beam to the AWAKE facility. The results obtained with the different ML approaches, including automatic unsupervised feature extraction from images using computer vision are presented. The prospects for operational deployment and wider applicability are discussed. ...

Overview of the orbit correction method.

Orbit Correction Based on Improved Reinforcement Learning Algorithm

X. Chen, Y. Jia, X. Qi, Z. Wang, Y. He Chinese Academy of Sciences Physical Review Accelerators and Beams Abstract Recently, reinforcement learning (RL) algorithms have been applied to a wide range of control problems in accelerator commissioning. In order to achieve efficient and fast control, these algorithms need to be highly efficient, so as to minimize the online training time. In this paper, we incorporated the beam position monitor trend into the observation space of the twin delayed deep deterministic policy gradient (TD3) algorithm and trained two different structure agents, one based on physical prior knowledge and the other using the original TD3 network architecture. Both of the agents exhibit strong robustness in the simulated environment. The effectiveness of the agent based on physical prior knowledge has been validated in a real accelerator. Results show that the agent can overcome the difference between simulated and real accelerator environments. Once the training is completed in the simulated environment, the agent can be directly applied to the real accelerator without any online training process. The RL agent is deployed to the medium energy beam transport section of China Accelerator Facility for Superheavy Elements. Fast and automatic orbit correction is being tested with up to ten degrees of freedom. The experimental results show that the agents can correct the orbit to within 1 mm. Moreover, due to the strong robustness of the agent, when a trained agent is applied to different lattices of different particles, the orbit correction can still be completed. Since there are no online data collection and training processes, all online corrections are done within 30 s. This paper shows that, as long as the robustness of the RL algorithm is sufficient, the offline learning agents can be directly applied to online correction, which will greatly improve the efficiency of orbit correction. 
Such an approach to RL may find promising applications in other areas of accelerator commissioning. ...
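As a rough illustration of what adding the beam position monitor (BPM) trend to the observation space could look like, the sketch below appends the step-to-step change in the BPM readings to the readings themselves. Interpreting "trend" as the difference between consecutive readings is an assumption, and `build_observation` and the example values are hypothetical, not taken from the paper.

```python
import numpy as np


def build_observation(bpm_now, bpm_prev):
    """Concatenate current BPM readings with their step-to-step change.

    'Trend' is taken here to mean the difference between consecutive
    readings; this is an assumption, not the paper's definition.
    """
    bpm_now = np.asarray(bpm_now, dtype=float)
    bpm_prev = np.asarray(bpm_prev, dtype=float)
    trend = bpm_now - bpm_prev
    return np.concatenate([bpm_now, trend])


# Hypothetical readings (mm) from three BPMs before and after a corrector step.
previous = [1.5, -0.5, 2.0]
current = [1.0, -0.25, 1.0]
obs = build_observation(current, previous)  # length 6: positions, then trends
```

An agent observing only the positions cannot tell whether its last action helped; including the change gives it that information within a single observation.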

RL4AA'23: 1st Collaboration Workshop on Reinforcement Learning for Autonomous Accelerators

Reinforcement learning is one of the most difficult learning paradigms to understand and to use efficiently, but it holds a lot of promise in the field of accelerator physics. Applications of reinforcement learning to accelerators are still few, but the interest of the community is growing considerably. This is how the 1st collaboration workshop on Reinforcement Learning for Autonomous Accelerators (RL4AA'23) came to be! The AI4Accelerators team organized and hosted the workshop at KIT, gathering colleagues involved in reinforcement learning. The workshop offered introductory lectures on reinforcement learning, a Python tutorial covering the deployment of such an algorithm on a real accelerator, and guided discussion sessions on the most pressing topics. The contents of the discussions will be published later in the form of proceedings. ...

Schema of the parameters' role within the learning loop.

Optimizing a superconducting radio-frequency gun using deep reinforcement learning

D. Meier1, L. V. Ramirez1, J. Völker1, J. Viefhaus1, B. Sick2, G. Hartmann1 1Helmholtz-Zentrum Berlin, 2University of Kassel Physical Review Accelerators and Beams Abstract Superconducting photoelectron injectors are promising for generating highly brilliant pulsed electron beams with high repetition rates and low emittances. Experiments such as ultrafast electron diffraction, experiments at the Terahertz scale, and energy recovery linac applications require such properties. However, optimizing the beam properties is challenging due to the high number of possible machine parameter combinations. This article shows the successful automated optimization of beam properties utilizing an already existing simulation model. To reduce the required computation time, we replace the costly simulation with a faster approximation with a neural network. For optimization, we propose a reinforcement learning approach leveraging the simple computation of the derivative of the approximation. We prove that our approach outperforms standard optimization methods for the required function evaluations given a defined minimum accuracy. ...
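To show why a neural-network approximation makes derivatives cheap, here is a minimal sketch with a stand-in surrogate: a small tanh network with random, untrained weights whose input gradient is written out analytically. The optimiser is plain gradient descent, not the reinforcement learning approach of the paper, and all names, sizes and bounds are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in surrogate: a fixed one-hidden-layer tanh network. The weights are
# random, not trained on any simulation data.
W1 = rng.normal(size=(8, 3))
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)


def surrogate(x):
    """Scalar surrogate output for machine settings x (shape (3,))."""
    return float(w2 @ np.tanh(W1 @ x + b1))


def surrogate_grad(x):
    """Analytic gradient of the surrogate with respect to its inputs."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h**2))


def minimise(x0, step=0.05, iterations=200):
    """Plain gradient descent on the surrogate output, keeping the best point."""
    x = np.asarray(x0, dtype=float)
    best_x, best_val = x.copy(), surrogate(x)
    for _ in range(iterations):
        # Settings are kept inside an assumed normalised range [-1, 1].
        x = np.clip(x - step * surrogate_grad(x), -1.0, 1.0)
        val = surrogate(x)
        if val < best_val:
            best_x, best_val = x.copy(), val
    return best_x, best_val


x_best, f_best = minimise(np.zeros(3))
```

Each gradient evaluation costs about as much as one forward pass through the network, which is the property the abstract relies on when it replaces the costly simulation.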

Episodes from the best NAF2 agent and the PI controller with the same initial states and with a varying additive Gaussian action noise with zero mean and standard deviation as a percentage of the half action space [0, 1]. (A) 0%, (B) 10%, (C) 25%, and (D) 50% Gaussian action noise.

Application of reinforcement learning in the LHC tune feedback

L. Grech1, G. Valentino1, D. Alves2 and Simon Hirlaender3 1University of Malta, 2CERN, 3University of Salzburg Frontiers in Physics Abstract The Beam-Based Feedback System (BBFS) was primarily responsible for correcting the beam energy, orbit and tune in the CERN Large Hadron Collider (LHC). A major code renovation of the BBFS was planned and carried out during the LHC Long Shutdown 2 (LS2). This work consists of an explorative study to solve a beam-based control problem, the tune feedback (QFB), utilising state-of-the-art Reinforcement Learning (RL). A simulation environment was created to mimic the operation of the QFB. A series of RL agents were trained, and the best-performing agents were then subjected to a set of well-designed tests. The original feedback controller used in the QFB was reimplemented to compare the performance of the classical approach to the performance of selected RL agents in the test scenarios. Results from the simulated environment show that the RL agent performance can exceed the controller-based paradigm. ...
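The action-noise test described in the figure caption above can be sketched as follows, assuming a normalised action space of [-1, 1] so that the half action space is [0, 1]. The function name, the bounds and the clipping step are assumptions, not details taken from the paper.

```python
import numpy as np


def add_action_noise(action, noise_fraction, low=-1.0, high=1.0, rng=None):
    """Add zero-mean Gaussian noise scaled to half the action range, then clip."""
    rng = np.random.default_rng() if rng is None else rng
    action = np.asarray(action, dtype=float)
    half_range = 0.5 * (high - low)
    sigma = noise_fraction * half_range  # e.g. 0.25 -> sigma = 0.25 for [-1, 1]
    noisy = action + rng.normal(0.0, sigma, size=action.shape)
    return np.clip(noisy, low, high)


# The four noise levels from the caption: 0%, 10%, 25% and 50%.
rng = np.random.default_rng(0)
action = np.array([0.2, -0.4])
for fraction in (0.0, 0.10, 0.25, 0.50):
    print(fraction, add_action_noise(action, fraction, rng=rng))
```

Applying the same noise level to both the RL agent and the PI controller, from the same initial states, gives a like-for-like robustness comparison.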

Reinforcement learning loop for the ARES experimental area.

Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World Training

J. Kaiser, O. Stein, A. Eichler Deutsches Elektronen-Synchrotron DESY 39th International Conference on Machine Learning Abstract In recent work, it has been shown that reinforcement learning (RL) is capable of solving a variety of problems at sometimes super-human performance levels. But despite continued advances in the field, applying RL to complex real-world control and optimisation problems has proven difficult. In this contribution, we demonstrate how to successfully apply RL to the optimisation of a highly complex real-world machine – specifically a linear particle accelerator – in an only partially observable setting and without requiring training on the real machine. Our method outperforms conventional optimisation algorithms in both the achieved result and time taken as well as already achieving close to human-level performance. We expect that such automation of machine optimisation will push the limits of operability, increase machine availability and lead to a paradigm shift in how such machines are operated, ultimately facilitating advances in a variety of fields, such as science and medicine among many others. ...

Success rate of the various algorithms over initial beam intensity.

Automated Intensity Optimisation Using Reinforcement Learning at LEIR

N. Madysa, V. Kain, R. Alemany Fernandez, N. Biancacci, B. Goddard, F. M. Velotti CERN 13th Particle Accelerator Conference Abstract High intensities in the Low Energy Ion Ring (LEIR) at CERN are achieved by stacking several multi-turn injections from the pre-accelerator Linac3. Up to seven consecutive 200 μs long, 200 ms spaced pulses are injected from Linac3 into LEIR. Two inclined septa, one magnetic and one electrostatic, combined with a collapsing horizontal orbit bump allow a 6-D phase space painting via a linearly ramped mean momentum along the Linac3 pulse and injection at high dispersion. The already circulating beam is cooled and dragged longitudinally via electron cooling (e-cooling) into a stacking momentum to free space for the following injections. For optimal intensity accumulation, the electron energy and trajectory need to match the ion energy and orbit at the e-cooler section. ...

Planned hardware implementation of the proposed RL feedback scheme.

Micro-Bunching Control at Electron Storage Rings with Reinforcement Learning

T. Boltz Karlsruhe Institute of Technology KIT PhD thesis Abstract At the time this thesis is written, the world finds itself amidst and partly in the process of recovering from the COVID-19 pandemic caused by the SARS-CoV-2 virus. One major contribution to the worldwide efforts of bringing this pandemic to an end are the vaccines developed by different research teams all around the globe. Produced in a remarkably short time frame, a crucial first step for the discovery of these vaccines was mapping out the atomic structure of the proteins making up the virus and their interactions. Due to the bright X-rays required in the process, synchrotron light sources play an active role in the ongoing efforts of accomplishing that goal. Synchrotron light sources are particle accelerators that are capable of providing intense electromagnetic radiation by accelerating packages of electrons, called bunches, and forcing them on curved trajectories. Besides the support of research on the SARS-CoV-2 virus, the remarkable properties of synchrotron radiation lead to a multitude of applications in a variety of scientific fields such as materials science, geology, biology and medicine. As a special form of synchrotron radiation, this thesis is concerned with the coherent synchrotron radiation (CSR) generated by short electron bunches in a storage ring. At wavelengths larger than the size of the emitting electron structure, the particles within a bunch radiate coherently. This coherent emission of synchrotron radiation scales with the number of involved particles and can thus enhance the intensity of the emitted radiation by several orders of magnitude. As a consequence, modern synchrotron light sources, such as the Karlsruhe Research Accelerator (KARA) at the Karlsruhe Institute of Technology (KIT), are deliberately operating with short bunch lengths to extend the radiated CSR spectrum to higher frequencies and to increase the intensity of the emitted radiation.
Yet, the continuous reduction of the bunch length at high beam intensities eventually leads to complex longitudinal dynamics caused by the self-interaction of the electron bunches with their own emitted CSR. This phenomenon, generally referred to as micro-bunching or micro-wave instability, can lead to the formation of dynamically changing micro-structures within the charge distribution of the electron bunches and thus to a fluctuating emission of CSR. Moreover, it can cause oscillations of the bunch length and the energy spread, which can be detrimental to the operation of a synchrotron light source. On the other hand, as electron structures smaller than the full electron bunch, the micro-structures created by the instability lead to an increased emission of CSR at frequencies up to the THz frequency range. The instability can thus also be beneficial for a variety of applications that rely on intense radiation in that particular frequency range. ...

Schematic view of the GMPS control environment.

Real-time artificial intelligence for accelerator control: A study at the Fermilab Booster

J. St. John1, C. Herwig1, D. Kafkes1, J. Mitrevski1, W. A. Pellico1, G. N. Perdue1, A. Quintero-Parra1, B. A. Schupbach1, K. Seiya1, N. Tran1, M. Schram2, J. M. Duarte3, Y. Huang4, R. Keller5 1Fermi National Accelerator Laboratory, 2Thomas Jefferson National Accelerator Laboratory, 3University of California San Diego, 4Pacific Northwest National Laboratory, 5Columbia University Physical Review Accelerators and Beams Abstract We describe a method for precisely regulating the gradient magnet power supply (GMPS) at the Fermilab Booster accelerator complex using a neural network trained via reinforcement learning. We demonstrate preliminary results by training a surrogate machine-learning model on real accelerator data to emulate the GMPS, and using this surrogate model in turn to train the neural network for its regulation task. We additionally show how the neural networks to be deployed for control purposes may be compiled to execute on field-programmable gate arrays (FPGAs), and show the first machine-learning based control algorithm implemented on an FPGA for controls at the Fermilab accelerator complex. As there are no surprise latencies on an FPGA, this capability is important for operational stability in complicated environments such as an accelerator facility. ...

Online training of NAF Agent of AWAKE electron line trajectory steering in the horizontal plane.

Test of Machine Learning at the CERN LINAC4

V. Kain1, N. Bruchon1, S. Hirlander1, N. Madysa1, I. Vojskovic1, P. Skowronski1, G. Valentino2 1CERN, 2University of Malta 61st ICFA ABDW on High-Intensity and High-Brightness Hadron Beams Abstract The CERN H− linear accelerator, LINAC4, served as a test bed for advanced algorithms during the CERN Long Shutdown 2 in the years 2019/20. One of the main goals was to show that reinforcement learning with all its benefits can be used as a replacement for numerical optimization and as a complement to classical control in the accelerator control context. Many of the algorithms used were prepared beforehand at the electron line of the AWAKE facility to make the best use of the limited time available at LINAC4. An overview of the algorithms and concepts tested at LINAC4 and AWAKE will be given and the results discussed. ...