Journal Article

AWAKE beamline showing location of the matching devices (actions) and the observation BTV.

Towards automatic setup of 18 MeV electron beamline using machine learning

F. M. Velotti1, B. Goddard1, V. Kain1, R. Ramjiawan1, G. Z. Della Porta1 and S. Hirlaender2 1CERN, 2University of Salzburg Machine Learning: Science and Technology Abstract To improve the performance-critical stability and brightness of the electron bunch at injection into the proton-driven plasma wakefield at the AWAKE CERN experiment, automation approaches based on unsupervised machine learning (ML) were developed and deployed. Numerical optimisers were tested together with different model-free reinforcement learning (RL) agents. In order to avoid any bias, RL agents were also trained using a completely unsupervised state encoding based on auto-encoders. To aid hyper-parameter selection, a full synthetic model of the beamline was constructed using a variational auto-encoder trained to generate surrogate data from equipment settings. This paper describes the novel approaches based on deep learning and RL to aid the automatic setup of a low-energy line, such as the one used to deliver beam to the AWAKE facility. The results obtained with the different ML approaches, including automatic unsupervised feature extraction from images using computer vision, are presented. The prospects for operational deployment and wider applicability are discussed. ...

Overview of the orbit correction method.

Orbit Correction Based on Improved Reinforcement Learning Algorithm

X. Chen, Y. Jia, X. Qi, Z. Wang, Y. He Chinese Academy of Sciences Physical Review Accelerators and Beams Abstract Recently, reinforcement learning (RL) algorithms have been applied to a wide range of control problems in accelerator commissioning. In order to achieve efficient and fast control, these algorithms need to be highly efficient, so as to minimize the online training time. In this paper, we incorporated the beam position monitor trend into the observation space of the twin delayed deep deterministic policy gradient (TD3) algorithm and trained two agents with different structures, one based on physical prior knowledge and the other using the original TD3 network architecture. Both of the agents exhibit strong robustness in the simulated environment. The effectiveness of the agent based on physical prior knowledge has been validated in a real accelerator. Results show that the agent can overcome the difference between simulated and real accelerator environments. Once the training is completed in the simulated environment, the agent can be directly applied to the real accelerator without any online training process. The RL agent is deployed to the medium energy beam transport section of China Accelerator Facility for Superheavy Elements. Fast and automatic orbit correction is being tested with up to ten degrees of freedom. The experimental results show that the agents can correct the orbit to within 1 mm. Moreover, due to the strong robustness of the agent, when a trained agent is applied to different lattices of different particles, the orbit correction can still be completed. Since there are no online data collection and training processes, all online corrections are done within 30 s. This paper shows that, as long as the robustness of the RL algorithm is sufficient, the offline learning agents can be directly applied to online correction, which will greatly improve the efficiency of orbit correction. Such an approach to RL may find promising applications in other areas of accelerator commissioning. ...

Schema of the parameters’ role within the learning loop.

Optimizing a superconducting radio-frequency gun using deep reinforcement learning

D. Meier1, L. V. Ramirez1, J. Völker1, J. Viefhaus1, B. Sick2, G. Hartmann1 1Helmholtz-Zentrum Berlin, 2University of Kassel Physical Review Accelerators and Beams Abstract Superconducting photoelectron injectors are promising for generating highly brilliant pulsed electron beams with high repetition rates and low emittances. Experiments such as ultrafast electron diffraction, experiments at the Terahertz scale, and energy recovery linac applications require such properties. However, optimizing the beam properties is challenging due to the high number of possible machine parameter combinations. This article shows the successful automated optimization of beam properties utilizing an already existing simulation model. To reduce the required computation time, we replace the costly simulation with a faster approximation with a neural network. For optimization, we propose a reinforcement learning approach leveraging the simple computation of the derivative of the approximation. We prove that our approach outperforms standard optimization methods for the required function evaluations given a defined minimum accuracy. ...

Episodes from the best NAF2 agent and the PI controller with the same initial states and with a varying additive Gaussian action noise with zero mean and standard deviation as a percentage of the half action space [0, 1]. (A) 0%, (B) 10%, (C) 25%, and (D) 50% Gaussian action noise.

Application of reinforcement learning in the LHC tune feedback

L. Grech1, G. Valentino1, D. Alves2 and Simon Hirlaender3 1University of Malta, 2CERN, 3University of Salzburg Frontiers in Physics Abstract The Beam-Based Feedback System (BBFS) was primarily responsible for correcting the beam energy, orbit and tune in the CERN Large Hadron Collider (LHC). A major code renovation of the BBFS was planned and carried out during the LHC Long Shutdown 2 (LS2). This work consists of an explorative study to solve a beam-based control problem, the tune feedback (QFB), utilising state-of-the-art Reinforcement Learning (RL). A simulation environment was created to mimic the operation of the QFB. A series of RL agents were trained, and the best-performing agents were then subjected to a set of well-designed tests. The original feedback controller used in the QFB was reimplemented to compare the performance of the classical approach to the performance of selected RL agents in the test scenarios. Results from the simulated environment show that the RL agent performance can exceed the controller-based paradigm. ...

Schematic view of the GMPS control environment.

Real-time artificial intelligence for accelerator control: A study at the Fermilab Booster

J. St. John1, C. Herwig1, D. Kafkes1, J. Mitrevski1, W. A. Pellico1, G. N. Perdue1, A. Quintero-Parra1, B. A. Schupbach1, K. Seiya1, N. Tran1, M. Schram2, J. M. Duarte3, Y. Huang4, R. Keller5 1Fermi National Accelerator Laboratory, 2Thomas Jefferson National Accelerator Laboratory, 3University of California San Diego, 4Pacific Northwest National Laboratory, 5Columbia University Physical Review Accelerators and Beams Abstract We describe a method for precisely regulating the gradient magnet power supply (GMPS) at the Fermilab Booster accelerator complex using a neural network trained via reinforcement learning. We demonstrate preliminary results by training a surrogate machine-learning model on real accelerator data to emulate the GMPS, and using this surrogate model in turn to train the neural network for its regulation task. We additionally show how the neural networks to be deployed for control purposes may be compiled to execute on field-programmable gate arrays (FPGAs), and show the first machine-learning based control algorithm implemented on an FPGA for controls at the Fermilab accelerator complex. As there are no surprise latencies on an FPGA, this capability is important for operational stability in complicated environments such as an accelerator facility. ...

Accelerated Deep Reinforcement Learning for Fast Feedback of Beam Dynamics at KARA

W. Wang1, M. Caselle1, T. Boltz1, E. Blomley1, M. Brosi1, T. Dritschler1, A. Ebersoldt1, A. Kopmann1, A. Santamaria Garcia1, P. Schreiber1, E. Bründermann1, M. Weber1, A.-S. Müller1, Y. Fang2 1Karlsruhe Institute of Technology KIT, 2Northwestern Polytechnical University IEEE Transactions on Nuclear Science Abstract Coherent synchrotron radiation (CSR) is generated when the electron bunch length is of the order of magnitude of the wavelength of the emitted radiation. The self-interaction of short electron bunches with their own electromagnetic fields changes the longitudinal beam dynamics significantly. Above a certain current threshold, the micro-bunching instability develops, characterized by the appearance of distinguishable substructures in the longitudinal phase space of the bunch. To stabilize the CSR emission, a real-time feedback control loop based on reinforcement learning (RL) is proposed. Informed by the available THz diagnostics, the feedback is designed to act on the radio frequency (RF) system of the storage ring to mitigate the micro-bunching dynamics. To satisfy low-latency requirements given by the longitudinal beam dynamics, the RL controller has been implemented on hardware (FPGA). In this article, a real-time feedback loop architecture and its performance are presented and compared with a software implementation using Keras-RL on CPU/GPU. The results obtained with the CSR simulation Inovesa demonstrate that the functionality of both platforms is equivalent. The training performance of the hardware implementation is similar to that of the software solution, while it outperforms the Keras-RL implementation by an order of magnitude. The presented RL hardware controller is considered as an essential platform for the development of intelligent CSR control systems. ...

Plot of the reward received by the agent versus step number.

Policy gradient methods for free-electron laser and terahertz source optimization and stabilization at the FERMI free-electron laser at Elettra

F. H. O’Shea1, N. Bruchon2, G. Gaio1 1Elettra Sincrotrone Trieste, 2University of Trieste Physical Review Accelerators and Beams Abstract In this article we report on the application of a model-free reinforcement learning method to the optimization of accelerator systems. We simplify a policy gradient algorithm to accelerator control from sophisticated algorithms that have recently been demonstrated to solve complex dynamic problems. After outlining a theoretical basis for the functioning of the algorithm, we explore the small hyperparameter space to develop intuition about said parameters using a simple number-guess environment. Finally, we demonstrate the algorithm optimizing both a free-electron laser and an accelerator-based terahertz source in-situ. The algorithm is applied to different accelerator control systems and optimizes the desired signals in a few hundred steps without any domain knowledge using up to five control parameters. In addition, the algorithm shows modest tolerance to accelerator fault conditions without any special preparation for such conditions. ...

The RL paradigm as applied to particle accelerator control, showing the example of trajectory correction.

Sample-efficient reinforcement learning for CERN accelerator control

V. Kain1, S. Hirlander1, B. Goddard1, F. M. Velotti1, G. Z. Della Porta1, N. Bruchon2, G. Valentino3 1CERN, 2University of Trieste, 3University of Malta Physical Review Accelerators and Beams Abstract Numerical optimization algorithms are already established tools to increase and stabilize the performance of particle accelerators. These algorithms have many advantages, are available out of the box, and can be adapted to a wide range of optimization problems in accelerator operation. The next boost in efficiency is expected to come from reinforcement learning algorithms that learn the optimal policy for a certain control problem and hence, once trained, can do without the time-consuming exploration phase needed for numerical optimizers. To investigate this approach, continuous model-free reinforcement learning with up to 16 degrees of freedom was developed and successfully tested at various facilities at CERN. The approach and algorithms used are discussed and the results obtained for trajectory steering at the AWAKE electron line and LINAC4 are presented. The necessary next steps, such as uncertainty aware model-based approaches, and the potential for future applications at particle accelerators are addressed. ...

Simple scheme of the FERMI FEL seed laser alignment set up.

Basic Reinforcement Learning Techniques to Control the Intensity of a Seeded Free-Electron Laser

N. Bruchon1, G. Fenu1, G. Gaio2, M. Lonza2, F. H. O’Shea2, F. A. Pellegrino1, E. Salvato1 1University of Trieste, 2Elettra Sincrotrone Trieste Electronics Abstract Optimal tuning of particle accelerators is a challenging task. Many different approaches have been proposed in the past to solve two main problems—attainment of an optimal working point and performance recovery after machine drifts. The most classical model-free techniques (e.g., Gradient Ascent or Extremum Seeking algorithms) have some intrinsic limitations. To overcome those limitations, Machine Learning tools, in particular Reinforcement Learning (RL), are attracting more and more attention in the particle accelerator community. We investigate the feasibility of RL model-free approaches to align the seed laser, as well as other service lasers, at FERMI, the free-electron laser facility at Elettra Sincrotrone Trieste. We apply two different techniques—the first, based on the episodic Q-learning with linear function approximation, for performance optimization; the second, based on the continuous Natural Policy Gradient REINFORCE algorithm, for performance recovery. Despite the simplicity of these approaches, we report satisfactory preliminary results, that represent the first step toward a new fully automatic procedure for the alignment of the seed laser to the electron beam. Such an alignment is, at present, performed manually. ...

Orbit Correction Studies Using Neural Networks

E. Meier, Y.-R. E. Tan, G. S. LeBlanc Australian Synchrotron 3rd International Particle Accelerator Conference Abstract This paper reports the use of neural networks for orbit correction at the Australian Synchrotron Storage Ring. The proposed system uses two neural networks in an actor-critic scheme to model a long term cost function and compute appropriate corrections. The system is entirely based on the history of the beam position and the actuators, i.e. the corrector magnets, in the storage ring. This makes the system auto-tuneable, which has the advantage of avoiding the measure of a response matrix. The controller will automatically maintain an updated BPM corrector response matrix. In future if coupled with some form of orbit response analysis, the system will have the potential to track drifts or changes to the lattice functions in "real time". As a generic and robust orbit correction program it can be used during commissioning and in slow orbit feedback. In this study, we present positive initial results of the simulations of the storage ring in Matlab. ...