1 Introduction

The increasingly large volume of data produced at the CERN Large Hadron Collider (LHC), together with the upcoming era of the High-Luminosity LHC, poses a significant computational challenge in high energy physics (HEP). To face this, machine learning (ML) and deep neural networks (DNNs) are becoming powerful and ubiquitous tools for the analysis of particle collisions and their products, such as jets – collimated sprays of particles [1] produced in high energy collisions.

DNNs have been explored extensively for many tasks, such as classification [2,3,4,5], regression [6, 7], track reconstruction [8,9,10], anomaly detection [11,12,13,14,15,16,17], and simulation [18,19,20,21,22,23]. In particular, there has been recent success using networks that incorporate key inductive biases of HEP data, such as infrared and collinear (IRC) safety via energy flow networks [28] or graph neural networks (GNNs) [29,30,31], and the permutation symmetry and sparsity of jet constituents via GNNs [5, 20, 32].

Embedding such inductive biases and symmetries into DNNs can not only improve performance, as demonstrated in the references above, but also improve interpretability and reduce the amount of required training data. Hence, in this paper, we explore another fundamental symmetry of our data: equivariance to Lorentz transformations. Lorentz symmetry has been successfully exploited recently in HEP for jet classification [33,34,35,36], with competitive and even state-of-the-art (SOTA) results. We expand this work to the tasks of data compression and anomaly detection by incorporating the Lorentz symmetry into an autoencoder.

Autoencoders learn to encode input data into a learned latent space and to decode it back, and thus have interesting applications in both data compression [37, 38] and anomaly detection [11, 13,14,15,16,17, 39, 40]. Both tasks are particularly relevant for HEP: the former to cope with the storage and processing of the ever-increasing data collected at the LHC, and the latter for model-independent searches for new physics. Incorporating Lorentz equivariance into an autoencoder has the potential not only to increase performance in both regards, but also to provide a more interpretable latent space and reduce training data requirements. To this end, in this paper, we develop a Lorentz-group-equivariant autoencoder (LGAE) and explore its performance and interpretability. We also train alternative architectures, including GNNs and convolutional neural networks (CNNs), with different inherent symmetries, and find that the LGAE outperforms them on reconstruction and anomaly detection tasks.

The principal results of this work demonstrate (i) that the advantage of incorporating Lorentz equivariance extends beyond whole jet classification to applications with particle-level outputs and (ii) the interpretability of Lorentz-equivariant models. The key challenges overcome in this work include: (i) training an equivariant autoencoder via particle-to-particle and permutation-invariant set-to-set losses (Sect. 4), (ii) defining a jet-level compression scheme for the latent space (Sect. 3), and (iii) optimizing the architecture for different tasks, such as reconstruction (Sect. 4.3) and anomaly detection (Sect. 4.4).

This paper is structured as follows. In Sect. 2, we discuss existing work, motivating the LGAE. We present the LGAE architecture in Sect. 3, and discuss experimental results on the reconstruction and anomaly detection of high energy jets in Sect. 4. We also demonstrate the interpretability of the model, by analyzing its latent space, and its data efficiency relative to baseline models. Finally, we conclude in Sect. 5.

2 Related work

In this section, we briefly review the large body of work on frameworks for equivariant neural networks in Sect. 2.1, recent progress in Lorentz-equivariant networks in Sect. 2.2, and finally, applications of autoencoders in HEP in Sect. 2.3.

2.1 Equivariant neural networks

A neural network \(\textrm{NN}: V \rightarrow W\) is said to be equivariant with respect to a group G if

$$\begin{aligned} \forall g \in G, v \in V :\textrm{NN} (\rho _V(g) \cdot v) = \rho _W(g) \cdot \textrm{NN}(v), \end{aligned}$$
(1)

where \(\rho _V:G \rightarrow \textrm{GL}(V)\) and \(\rho _W:G \rightarrow \textrm{GL}(W)\) are representations of G in spaces V and W respectively, where \(\textrm{GL}(X)\) is the general linear group of vector space X. The neural network is said to be invariant if \(\rho _W\) is a trivial representation, i.e. \(\rho _W(g) = \mathbbm {1}_W\) for all \(g \in G\).
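As a concrete numerical illustration of Eq. (1) (not part of the LGAE code), the following sketch checks equivariance of a toy map \(f(x) = |x|\,x\) acting on 3-dimensional points with respect to rotations; the map, group, batch size, and tolerance are chosen purely for illustration.

```python
import torch

# Toy rotation-equivariant map: f(x) = |x| x, since f(Rx) = |Rx| Rx = |x| Rx = R f(x).
def f(x: torch.Tensor) -> torch.Tensor:
    return x.norm(dim=-1, keepdim=True) * x

# Sample a random rotation R in SO(3) from the QR decomposition of a random matrix.
Q, _ = torch.linalg.qr(torch.randn(3, 3))
R = Q * torch.sign(torch.linalg.det(Q))  # flip the sign if needed so that det(R) = +1

x = torch.randn(128, 3)                     # a batch of 3-dimensional points
lhs = f(x @ R.T)                            # NN(rho_V(g) . v)
rhs = f(x) @ R.T                            # rho_W(g) . NN(v)
print(torch.allclose(lhs, rhs, atol=1e-4))  # True, up to numerical error
```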

Equivariance has long been built into a number of successful DNN architectures, such as translation equivariance in CNNs, and permutation equivariance in GNNs [41]. Recently, equivariance in DNNs has been extended to a broader set of symmetries, such as those corresponding to the 2-dimensional special orthogonal \(\textrm{SO}(2)\) [42], the Euclidean \(\textrm{E}(2)\) [43], the 3-dimensional special orthogonal \(\textrm{SO}(3)\) [44], the 3-dimensional Euclidean \(\textrm{E}(3)\) [45, 46] groups, and arbitrary matrix Lie groups [47].

Broadly, equivariance to a group G has been achieved either by extending the translation-equivariant convolutions in CNNs to more general symmetries with appropriately defined learnable filters [48,49,50,51], or by operating in the Fourier space of G, or a combination thereof. We employ the Fourier space approach, which uses the set of irreducible representations (irreps) of G as the basis for constructing equivariant maps [43, 52, 53].

2.2 Lorentz group equivariant neural networks

The Lorentz group \(\textrm{O}(3, 1)\) comprises the set of linear transformations between inertial frames with coincident origins. In this paper, we restrict ourselves to the special orthochronous Lorentz group \(\textrm{SO}^+(3, 1)\), which consists of all Lorentz transformations that preserve the orientation and direction of time. Lorentz symmetry, i.e. invariance under transformations of the Lorentz group, is a fundamental symmetry of the data produced in high-energy particle collisions.
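For concreteness, a minimal numerical illustration (independent of the LGAE implementation) of the defining property of the group: a boost along the z-axis preserves the Minkowski metric \(\eta = \textrm{diag}(1, -1, -1, -1)\), i.e. \(\Lambda ^\textsf{T} \eta \Lambda = \eta \), and hence all Minkowski inner products, such as the invariant mass of a 4-momentum.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, (+, -, -, -) convention

def boost_z(rapidity: float) -> np.ndarray:
    """Pure Lorentz boost along the z-axis with the given rapidity."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    return np.array([
        [ch, 0.0, 0.0, sh],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [sh, 0.0, 0.0, ch],
    ])

L = boost_z(0.5)
print(np.allclose(L.T @ eta @ L, eta))  # True: the metric is preserved

p = np.array([10.0, 1.0, 2.0, 3.0])     # an example 4-momentum (E, px, py, pz)
print(np.isclose(p @ eta @ p, (L @ p) @ eta @ (L @ p)))  # True: invariant mass unchanged
```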

Fig. 1

Individual Lorentz group equivariant message passing (LMP) layers are shown on the left, and the LGAE architecture is built out of LMPs on the right. Here, \(\textrm{MixRep}\) denotes the node-level operator that upsamples features in each (m, n) representation space to \(\tau _{(m, n)}\) channels; it appears as W in Eq. (5)

There have been some recent advances in incorporating this symmetry into NNs. The Lorentz group network (LGN) [33] was the first DNN architecture developed to be equivariant to the \(\textrm{SO}^+(3, 1)\) group, with an architecture similar to that of a GNN, but operating entirely in Fourier space on objects in irreps of the Lorentz group, and using tensor products between irreps and Clebsch–Gordan decompositions to introduce non-linearities in the network. More recently, LorentzNet [34, 35] uses a similar GNN framework for equivariance, with additional edge features – Minkowski inner products between node features – but restricts itself to only scalar and vector representations of the group. Both networks have been successful in jet classification, with LorentzNet achieving SOTA results in top quark and quark versus gluon classification, further demonstrating the benefit of incorporating physical inductive biases into network architectures. In this work, we build on the LGN framework not only to output scalars (e.g. jet class probabilities) but also to encode and reconstruct an input set of particles, under the constraint of Lorentz group equivariance, in an autoencoder-style architecture.

2.3 Autoencoders in HEP

An autoencoder is an NN architecture composed of an encoder, which maps the input into a (typically lower-dimensional) latent space, and a decoder, which attempts to reconstruct the original input from the latent features. By using a lower-dimensional latent space, an autoencoder can learn a smaller representation of data that captures salient properties [54], which can be valuable in HEP for compressing the significant volumes of data collected at the LHC [55].

This learned representation can also be exploited for downstream tasks such as anomaly detection, where an autoencoder is trained to reconstruct data considered “background” to our signal, with the expectation that it will reconstruct the signal poorly relative to the background. Thus, examining the reconstruction loss of a trained autoencoder may allow the identification of anomalous data. This can be an advantage in searches for new physics, since instead of having to specify a particular signal hypothesis, a broader search can be performed for data incompatible with the background. This approach has been successfully demonstrated in Refs. [12, 39, 40, 56,57,58,59,60,61].

Furthermore, there are many possible variations to the general autoencoder framework for alternative tasks [62, 63], such as variational autoencoders (VAEs) [64], which are popular generative models. To our knowledge, while there have been some recent efforts at GNN-based autoencoder models [16, 65], Lorentz equivariance has not yet been explored. In this work, we focus on data compression and anomaly detection but note that our model can be extended to further applications.

3 LGAE architecture

The LGAE is built out of Lorentz group-equivariant message passing (LMP) layers, which are identical to individual layers in the LGN [33]. We reinterpret them in the framework of message-passing neural networks [66] to highlight the connection to GNNs, and define them in Sect. 3.1. We then describe the encoder and decoder networks in Sects. 3.2 and 3.3, respectively. The LMP layers and LGAE architecture are depicted in Fig. 1. We provide the LGAE code, written in Python using the PyTorch ML framework [67], in Ref. [68].

3.1 LMP

LMP layers take as inputs fully-connected graphs with nodes representing particles and the Minkowski distance between respective node 4-vectors as edge features. Each node \(\mathcal {F}_i\) is defined by its features, all transforming under a corresponding irrep of the Lorentz group in the canonical basis [69], including at least one 4-vector (transforming under the (1/2, 1/2) representation) representing its 4-momentum. As in Ref. [33], we denote the number of features in each node transforming under the (m, n) irrep as \(\tau _{(m,n)}\), referred to as the multiplicity of the (m, n) representation.

The \((t+1)\)-th LMP layer operation consists of message-passing between each pair of nodes, with a message \(m_{i j}^{(t)}\) to node i from node j (where \(j \ne i\)) and a self-interaction term \(m_{ii}^{(t)}\) defined as

$$\begin{aligned} m_{i j}^{(t)}&= f\left( \left( p_{ij}^{(t)}\right) ^2 \right) p_{ij}^{(t)} \otimes \mathcal {F}_j^{(t)} \end{aligned}$$
(2)
$$\begin{aligned} m_{i i}^{(t)}&= \mathcal {F}_i^{(t)} \otimes \mathcal {F}_i^{(t)} \end{aligned}$$
(3)

where \(\mathcal {F}_{i}^{(t)}\) are the features of node i before the \((t+1)\)-th layer, \(p_{ij} = p_i - p_j\) is the difference between the node 4-vectors, \(p_{ij}^2\) is the squared Minkowski norm of \(p_{i j}\), and f is a learnable, differentiable function acting on Lorentz scalars. A Clebsch–Gordan (CG) decomposition, which reduces the features to direct sums of irreps of \(\textrm{SO}^+(3,1)\), is performed on both terms before concatenating them to produce the message \(m_i^{(t)}\) for node i:

$$\begin{aligned} m_i^{(t)} = \textrm{CG}\left[ m_{i i}^{(t)} \right] \oplus \textrm{CG}\left[ \sum _{j\ne i} m_{i j}^{(t)} \right] , \end{aligned}$$
(4)

where the summation over the sending nodes j ensures permutation symmetry because it treats all other nodes equally.

Finally, this aggregated message is used to update each node’s features, such that

$$\begin{aligned} \mathcal {F}_i^{(t+1)} = W^{(t+1)} \left( \mathcal {F}_i^{(t)} \oplus m_i^{(t)} \right) \end{aligned}$$
(5)

for all \(i \in \{1, \ldots , N_\textrm{particle}\}\), where \(W^{(t+1)}\) is a learnable node-wise operator which acts as separate fully-connected linear layers \(W^{(t+1)}_{(m, n)}\) on the set of components living within each separate (m, n) representation space, outputting a chosen number \(\tau _{(m,n)}^{(t+1)}\) of components per representation. In practice, we then truncate the irreps to a maximum dimension to make computations more tractable.
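The following is a deliberately simplified sketch of the idea behind Eqs. (2)–(5), restricted to a single real 4-vector channel per node, with messages weighted by a learnable function of the Lorentz invariant \(p_{ij}^2\); the actual LMP layers operate on arbitrary irreps of \(\textrm{SO}^+(3,1)\) via tensor products and CG decompositions, which are omitted here. Because node 4-vectors are only ever combined linearly, with coefficients that depend solely on Lorentz invariants, the update is equivariant by construction, and summing over all other nodes keeps it permutation-equivariant.

```python
import torch
import torch.nn as nn

ETA = torch.diag(torch.tensor([1.0, -1.0, -1.0, -1.0]))  # Minkowski metric

def minkowski_sq(x: torch.Tensor) -> torch.Tensor:
    """Squared Minkowski norm of 4-vectors stored in the last dimension."""
    return torch.einsum("...i,ij,...j->...", x, ETA, x)

class SimplifiedLMP(nn.Module):
    """Toy Lorentz-equivariant message passing on a single 4-vector channel."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        # f in Eq. (2): a learnable function acting on Lorentz scalars
        self.f = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        # p: (N, 4) node 4-momenta of a fully connected particle graph
        p_ij = p.unsqueeze(1) - p.unsqueeze(0)      # (N, N, 4) pairwise differences
        inv = minkowski_sq(p_ij).unsqueeze(-1)      # (N, N, 1) Lorentz invariants
        w = self.f(inv)                             # scalar weights from invariants only
        msg = (w * p_ij).sum(dim=1)                 # (N, 4) aggregated messages
        return p + msg                              # equivariant node update

layer = SimplifiedLMP()
p_out = layer(torch.randn(30, 4))   # if p -> p @ Lambda.T, then p_out -> p_out @ Lambda.T
```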

3.2 Encoder

The encoder takes as input an N-particle cloud, where each particle is associated with a 4-momentum vector and an arbitrary number of scalars representing physical features such as mass, charge, and spin. Each isotypic component is initially transformed to a chosen multiplicity of \(\left( \tau _{(m, n)}^{(0)} \right) _\textrm{E}\) via a node-wise operator \(W^{(0)}\), conceptually identical to \(W^{(t+1)}\) in Eq. (5). The resultant graph is then processed through \(N_{\textrm{MP}}^\textrm{E}\) LMP layers, specified by a sequence of multiplicities \(\left\{ \left( \tau _{(m, n)}^{(t)} \right) _\textrm{E} \right\} _{t=1}^{N_{\textrm{MP}}^\textrm{E}}\), where \(\left( \tau _{(m, n)}^{(t)} \right) _\textrm{E}\) is the multiplicity of the (m, n) representation at the t-th layer. Weights are shared across the nodes in a layer to ensure permutation equivariance.

After the final LMP layer, node features are aggregated to the latent space by a component-wise minimum (min), maximum (max), or mean. The min and max operations are performed on the respective Lorentz invariants. We also find empirically that strong performance is obtained by simply concatenating the isotypic components across the particles and linearly “mixing” them via a learned matrix as in Eq. (5). Crucially, unlike in Eq. (5), where this operation acts per particle, the concatenation across the particles imposes an ordering and, hence, breaks the permutation symmetry.
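As a rough sketch of the two aggregation schemes for the 4-vector channels only (handling of other irreps is omitted, and the exact form of the min/max selection below, by Minkowski norm, is an assumption of this sketch rather than the LGAE implementation):

```python
import torch
import torch.nn as nn

def minkowski_sq(x: torch.Tensor) -> torch.Tensor:
    eta = torch.tensor([1.0, -1.0, -1.0, -1.0])
    return (x * x * eta).sum(dim=-1)

class MixAggregation(nn.Module):
    """'Mix' aggregation: concatenate the 4-vector channels of all N particles and
    linearly mix them into tau_latent latent 4-vectors. Each latent 4-vector is a
    linear combination of input 4-vectors (no bias term), so Lorentz equivariance is
    preserved, but the fixed particle ordering breaks permutation symmetry."""

    def __init__(self, n_particles: int, tau_in: int, tau_latent: int):
        super().__init__()
        self.mix = nn.Linear(n_particles * tau_in, tau_latent, bias=False)

    def forward(self, vecs: torch.Tensor) -> torch.Tensor:
        # vecs: (N, tau_in, 4) per-particle 4-vector channels
        flat = vecs.reshape(-1, 4)                                # (N * tau_in, 4)
        return torch.einsum("lc,cv->lv", self.mix.weight, flat)  # (tau_latent, 4)

def minmax_aggregation(vecs: torch.Tensor) -> torch.Tensor:
    """Assumed min/max scheme: per channel, keep the 4-vectors of the particles with
    the smallest and largest Minkowski norm; the selection depends only on Lorentz
    invariants and is permutation-invariant, so the aggregation stays equivariant."""
    norms = minkowski_sq(vecs)                                    # (N, tau_in)
    chans = torch.arange(vecs.shape[1])
    return torch.cat([vecs[norms.argmin(dim=0), chans],
                      vecs[norms.argmax(dim=0), chans]], dim=0)   # (2 * tau_in, 4)
```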

3.3 Decoder

The decoder recovers the N-particle cloud by acting on the latent space with N independent, learned linear operators, which again mix components living in the same representations. This cloud passes through \(N_{\textrm{MP}}^\textrm{D}\) LMP layers, specified by a sequence of multiplicities \(\left\{ \left( \tau _{(m, n)}^{(t)} \right) _\textrm{D} \right\} _{t=1}^{N_{\textrm{MP}}^\textrm{D}}\), where \(\left( \tau _{(m, n)}^{(t)} \right) _\textrm{D}\) is the multiplicity of the (m, n) representation at the t-th LMP layer. After the LMP layers, node features are mixed back to the input representation space \(\left( D^{(0,0)} \right) ^{\oplus \tau _{(0,0)}^{(0)}} \oplus D^{(1/2, 1/2)}\) by applying a linear mixing layer and then truncating other isotypic components.

4 Experiments

We experiment with and evaluate the performance of the LGAE and baseline models on reconstruction and anomaly detection for simulated high-momentum jets. We describe the dataset in Sect. 4.1, the different models we consider in Sect. 4.2, the reconstruction and anomaly detection results in Sects. 4.3 and 4.4 respectively, an interpretation of the LGAE latent space in Sect. 4.5, and finally experiments of the data efficiency of the different models in Sect. 4.6.

Table 1 Summary of the relevant symmetries respected by each model discussed in Sect. 4

4.1 Dataset

The models are trained to reconstruct 30-particle high transverse momentum jets from the JetNet [70] dataset, obtained using the associated library [71]; jets with fewer than 30 particles are zero-padded. We use jets produced from gluons and light quarks, collectively referred to as quantum chromodynamics (QCD) jets.

Jets in JetNet are first produced at leading order using MadGraph5_aMC@NLO [72] and decayed and showered with Pythia 8.2 [73]. They are then discretized and smeared to account for detector spatial and energy resolution, respectively, with simulated tracking inefficiencies – emulating the effects of the CMS and ATLAS trackers and calorimeters – and finally clustered using the anti-\(k_{\textrm{T}} \) [74] algorithm with distance parameter \(R=0.8\). Further details on the generation and reconstruction process are available in Ref. [20]. The exact smearing parameters and calorimeter granularities used are reported in Table 2 of Ref. [75] and correspond to the “CMS-like” scenario.

We represent the jets as point clouds of particles, termed “particle clouds”, with the respective 3-momenta, in absolute coordinates, as particle features. In the processing step, each 3-momentum is converted to a 4-momentum, \(p^\mu = (|\textbf{p}|, \textbf{p})\), where we consider the mass of each particle to be negligible. We use a \(60\%/20\%/20\%\) training/testing/validation split for the total 177,000 jets. For evaluating performance in anomaly detection, we consider jets from JetNet produced by top quarks, W bosons, and Z bosons as our anomalous signals.
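A minimal sketch of this preprocessing, assuming the particle 3-momenta p3 have already been obtained from the JetNet library in Cartesian absolute coordinates and zero-padded to 30 particles per jet (the loading step itself, and the random seed, are illustrative assumptions):

```python
import numpy as np

def to_four_momenta(p3: np.ndarray) -> np.ndarray:
    """Convert (num_jets, 30, 3) particle 3-momenta to massless 4-momenta
    p^mu = (|p|, px, py, pz); zero-padded particles remain zero."""
    energy = np.linalg.norm(p3, axis=-1, keepdims=True)
    return np.concatenate([energy, p3], axis=-1)   # (num_jets, 30, 4)

p4 = to_four_momenta(p3)

# 60% / 20% / 20% training / testing / validation split of the ~177k jets
rng = np.random.default_rng(42)
idx = rng.permutation(len(p4))
n_train, n_test = int(0.6 * len(p4)), int(0.2 * len(p4))
train, test, val = np.split(p4[idx], [n_train, n_train + n_test])
```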

Finally, we note here that the detector and reconstruction effects in JetNet, and indeed in real data collected at the LHC, break the Lorentz symmetry; hence, Lorentz equivariance is generally an approximate rather than an exact symmetry of HEP data. We assume henceforth that the magnitude of the symmetry breaking is small enough that imposing exact Lorentz equivariance in the LGAE is still advantageous – and the high performance of the LGAE and classification models such as LorentzNet support this assumption. Nevertheless, important studies in future work may include quantifying this symmetry breaking and considering approximate, as well as exact, symmetries in neural networks.

4.2 Models

LGAE model results are presented using both the min-max (LGAE-Min-Max) and “mix” (LGAE-Mix) aggregation schemes for the latent space, which consists of varying numbers of complex Lorentz vectors – corresponding to different compression rates. We compare the LGAE to baseline GNN and CNN autoencoder models, referred to as “GNNAE” and “CNNAE” respectively.

The GNNAE model is composed of fully-connected MPNNs adapted from Ref. [20]. We experiment with two types of encodings: (1) particle-level (GNNAE-PL), as in the PGAE [16] model, which compresses the features per node in the graph but retains the graph structure in the latent space, and (2) jet-level (GNNAE-JL), which averages the features across nodes to form the latent space, as in the LGAE. Particle-level encodings produce better performance overall for the GNNAE, but the jet-level encoding provides a fairer comparison with the LGAE, which uses jet-level encoding to achieve a high level of compression of the features.

For the CNNAE, which is adapted from Ref. [76], the relative coordinates of each input jet's particle constituents are first discretized into a \(40 \times 40\) grid. The particles are then represented as pixels in an image, with intensities corresponding to \(p_{\textrm{T}} ^\textrm{rel} \). Multiple particles per jet may correspond to the same pixel, in which case their \(p_{\textrm{T}} ^\textrm{rel}\) ’s are summed. The CNNAE has neither Lorentz nor permutation symmetry; however, it does have in-built translation equivariance in \(\eta \)–\(\phi \) space.
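A minimal sketch of this discretization for a single jet; the \(\eta \)–\(\phi \) extent of the grid is an illustrative assumption, not taken from Ref. [76]:

```python
import numpy as np

def jet_image(eta_rel, phi_rel, pt_rel, bins: int = 40, extent: float = 0.4) -> np.ndarray:
    """Discretize a jet's constituents into a (bins x bins) image in relative
    (eta, phi) coordinates, summing the pT^rel of particles falling in the same pixel."""
    img, _, _ = np.histogram2d(
        eta_rel, phi_rel,
        bins=bins,
        range=[[-extent, extent], [-extent, extent]],
        weights=pt_rel,
    )
    return img
```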

Hyperparameter and training details for all models can be found in Appendix A and Appendix B respectively, and a summary of the relevant symmetries respected by each model is provided in Table 1. The LGAE models are verified to be equivariant to Lorentz boosts and rotations up to numerical error, with details provided in Appendix C.

Fig. 2

Jet image reconstructions by LGAE-Min-Max (\(\tau _{(1/2, 1/2)}=4\), \(56.67\%\) compression), LGAE-Mix (\(\tau _{(1/2, 1/2)}=9\), \(61.67\%\) compression), GNNAE-JL (\(\dim (L) = 55\), \(61.11\%\) compression), GNNAE-PL (\(\dim (L) = 2\times 30\), \(66.67\%\) compression), and CNNAE (\(\dim (L) = 55\), \(61.11\%\) compression)

4.3 Reconstruction

We evaluate the performance of the LGAE, GNNAE, and CNNAE models, with the different aggregation schemes discussed, on the reconstruction of the particle and jet features of QCD jets. We consider relative transverse momentum \(p_{\textrm{T}} ^\textrm{rel} = p_{\textrm{T}} ^\textrm{particle}/p_{\textrm{T}} ^\textrm{jet}\) and relative angular coordinates \(\eta ^\textrm{rel} =\eta ^\textrm{particle} - \eta ^\textrm{jet}\) and \(\phi ^\textrm{rel} =\phi ^\textrm{particle} - \phi ^\textrm{jet} \pmod {2\pi }\) as each particle’s features, and total jet mass, \(p_{\textrm{T}}\) and \(\eta \) as jet features. We define the compression rate as the ratio between the total dimension of the latent space and the number of features in the input space: \(30\ \textrm{particles} \times 3\ \mathrm {features\ per\ particle} = 90\).
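For example, the GNNAE-JL and CNNAE configurations shown in Fig. 2 use a latent space of dimension \(\dim (L) = 55\), corresponding to a compression rate of

$$\begin{aligned} 55 / 90 \approx 61.11\%. \end{aligned}$$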

Figure 2 shows random samples of jets, represented as discrete images in the angular-coordinate plane, reconstructed by the models with similar levels of compression, in comparison to the true jets. Figure 3 shows histograms of the reconstructed features compared to the true distributions. The differences between the two distributions are quantified in Table 2 by calculating the medians and interquartile ranges (IQR) of the relative errors between the reconstructed and true features. To calculate the relative errors of particle features for the permutation-invariant LGAE and GNNAE models, particles are matched between the input and output clouds using the Jonker–Volgenant algorithm [77, 78] based on the L2 distance between particle features. Due to the discretization of the inputs to the CNNAE, reconstructing individual particle features is not possible; instead, only jet features are shown.
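A minimal sketch of this matching step for a single jet, using SciPy's linear_sum_assignment (which implements a modified Jonker–Volgenant algorithm); the array names and shapes are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_particles(true_feats: np.ndarray, reco_feats: np.ndarray) -> np.ndarray:
    """Reorder reconstructed particles (N, F) to match the true particles (N, F) by
    minimizing the total pairwise L2 distance between their feature vectors."""
    cost = np.linalg.norm(true_feats[:, None, :] - reco_feats[None, :, :], axis=-1)  # (N, N)
    _, col = linear_sum_assignment(cost)
    return reco_feats[col]
```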

Visually, we observe in Fig. 2 that, of the two permutation-invariant models, neither is able to reconstruct the jet substructure perfectly, but the LGAE-Min-Max outperforms the GNNAE-JL. Perhaps surprisingly, the permutation-symmetry-breaking mix aggregation scheme improves the LGAE in this regard. Both visually in Fig. 3 and quantitatively from Tables 2 and 3, we conclude that the LGAE-Mix has the best performance overall, significantly outperforming the GNNAE and CNNAE models at similar compression rates. The LGAE-Min-Max model outperforms the GNNAE-JL in reconstructing all features and the GNNAE-PL in all but the IQR of the particle angular coordinates.

Fig. 3

Top: particle momenta \((p_{\textrm{T}} ^\textrm{rel}, \eta ^\textrm{rel}, \phi ^\textrm{rel})\) reconstruction by LGAE-Min-Max (\(\tau _{(1/2, 1/2)}=4\), resulting in \(56.67\%\) compression), LGAE-Mix (\(\tau _{(1/2, 1/2)}=9\), resulting in \(61.67\%\) compression), GNNAE-JL (\(\dim (L) = 55\), resulting in \(61.11\%\) compression), and GNNAE-PL (\(\dim (L) = 2\times 30\), resulting in \(66.67\%\) compression). The reconstructions by the CNNAE are not included due to the discrete values of \(\eta ^\textrm{rel} \) and \(\phi ^\textrm{rel} \), as discussed in the text. Bottom: jet feature \((M, p_{\textrm{T}}, \eta )\) reconstruction by the four models. For the jet feature reconstruction by the GNNAEs, the particle features in relative coordinates were transformed back to absolute coordinates before plotting. The jet \(\phi \) is not shown because it follows a uniform distribution in \((-\pi , \pi ]\) and is reconstructed well

Table 2 Median and IQR of relative errors in particle feature reconstruction of selected LGAE and GNNAE models. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold
Table 3 Median and IQR of relative errors in jet feature reconstruction by selected LGAE and GNNAE models, along with the CNNAE model. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold

4.4 Anomaly detection

We test the performance of all models as unsupervised anomaly detection algorithms by pre-training them solely on QCD jets and then using the reconstruction error for the QCD and new signal jets as the discriminating variable. We consider top quark, \(\textrm{W} \) boson, and \(\textrm{Z} \) boson jets as potential signals and QCD as the “background”. As reconstruction errors we test the Chamfer distance, the energy mover's distance [79] – the earth mover's distance applied to particle clouds – and the mean squared error (MSE) between input and output jets, and find the Chamfer distance most performant for all graph-based models. For the CNNAE, we use the MSE between the input and reconstructed image as the anomaly score.
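As an illustration, one common form of the Chamfer distance between an input and a reconstructed particle cloud is sketched below; the exact normalization and feature space used in our experiments may differ. The resulting value is used directly as the anomaly score: the larger the reconstruction error, the more signal-like the jet.

```python
import torch

def chamfer_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant Chamfer distance between particle clouds x: (N, F) and
    y: (M, F): each particle's squared distance to its nearest neighbour in the
    other cloud, summed over both clouds."""
    d = torch.cdist(x, y) ** 2                # (N, M) pairwise squared distances
    return d.min(dim=1).values.sum() + d.min(dim=0).values.sum()
```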

Fig. 4

Anomaly detection ROC curves for the top quark signal (upper left), W boson signal (upper right), Z boson signal (lower left), and the combined signal (lower right) by the selected LGAE-Min-Max (\(\tau _{(1/2, 1/2)} = 7\)), LGAE-Mix (\(\tau _{(1/2, 1/2)}=2\)), GNNAE-JL (\(\dim (L) = 30\)), GNNAE-PL (\(\dim (L) = 2 \times 30\)), and CNNAE (\(\dim (L) = 55\)) models

Receiver operating characteristic (ROC) curves showing the signal efficiencies (\(\varepsilon _s\)) versus background efficiencies (\(\varepsilon _b\)) for individual and combined signals are shown in Fig. 4, and \(\varepsilon _s\) values at particular background efficiencies are given in Table 4. We see that, in general, the permutation-equivariant LGAE and GNNAE models outperform the CNNAE, strengthening the case for considering equivariance in neural networks. Furthermore, the LGAE models have significantly higher signal efficiencies than the GNNAEs and CNNAE for all signals when rejecting \(>90\%\) of the background (the minimum level we typically require in HEP), and LGAE-Mix consistently performs better than LGAE-Min-Max.
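As a sketch of how such working points can be extracted, assuming arrays scores (reconstruction errors for a mixed evaluation sample) and labels (1 for signal, 0 for QCD background), neither of which is shown being computed here:

```python
import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, _ = roc_curve(labels, scores)   # fpr = background efficiency, tpr = signal efficiency
# signal efficiency at a background efficiency of 10%, i.e. 90% background rejection
eps_s_at_10 = np.interp(0.1, fpr, tpr)
```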

Table 4 Anomaly detection metrics for selected LGAE and GNNAE models, along with the CNNAE model. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold

4.5 Latent space interpretation

Fig. 5

The correlations between the total momentum of the imaginary components in the \(\tau _{(1/2, 1/2)} = 2\) LGAE-Mix model and the target jet momenta. The Pearson correlation coefficient r is listed above

Fig. 6

Top: distributions of the invariant mass squared of the latent 4-vectors and jet momenta of the LGAE-Mix with \(\tau _{(1/2, 1/2)} = 2\) latent 4-vectors. Bottom: distributions of the invariant mass squared of two latent 4-vectors and jet momenta of the LGAE-Min-Max with \(\tau _{(1/2, 1/2)} = 2\) latent 4-vectors

The outputs of the LGAE encoder are irreducible representations of the Lorentz group; they consist of a pre-specified number of Lorentz scalars, vectors, and potentially higher-order representations. This implies a significantly more interpretable latent representation of the jets than in traditional autoencoders, as the information distributed across the latent space is now disentangled between the different irreps of the Lorentz group. For example, scalar quantities like the jet mass will necessarily be encoded in the scalars of the latent space, and jet and particle 4-momenta in the vectors.

We demonstrate the latter empirically on the LGAE-Mix model (\(\tau _{(1/2, 1/2)} = 2\)) by looking at correlations between the jet 4-momenta and different combinations of latent vector components. Figure 5 shows that, in fact, the jet momentum is encoded in the imaginary component of the sum of the latent vectors.

We can also attempt to understand the anomaly detection performance by looking at the encodings of the training data compared to the anomalous signal. Figure 6 shows the individual and total invariant mass of the latent vectors of sample LGAE models for QCD and top quark, W boson, and Z boson inputs. We observe that despite the overall similar kinematic properties of the different jet classes, the distributions for the QCD background are significantly different from the signals, indicating that the LGAE learns and encodes the difference in jet substructure – despite substructure observables such as jet mass not being direct inputs to the network – explaining the high performance in anomaly detection.

Finally, while in this section we showcased simple “brute-force” techniques for interpretability by looking directly at the distributions and correlations of latent features, we hypothesize that such an equivariant latent space would also lend itself effectively to the vast array of existing explainable AI algorithms [80, 81], which generically evaluate the contribution of different input and intermediate neuron features to network outputs. We leave a detailed study of this to future work.

4.6 Data efficiency

Fig. 7

Median magnitude of the relative errors of jet mass reconstruction by LGAE and CNNAE models trained on different fractions of the training data

In principle, equivariant neural networks should require less training data for high performance, since critical biases of the data, which would otherwise have to be learnt by non-equivariant networks, are already built in. We test this claim by measuring the performances of the best-performing LGAE and CNNAE architectures from Sect. 4.3 trained on varying fractions of the training data.

The median magnitude of the relative errors between the reconstructed and true jet masses of the different models and training fractions is shown in Fig. 7. Each model is trained five times per training fraction, with different random seeds, and evaluated on the same-sized validation dataset; the median of the five models is plotted. We observe that, in agreement with our hypothesis, both LGAE models maintain their high performance all the way down to training on 1% of the data, while the CNNAE's performance degrades steadily down to the 2% training fraction and then drops sharply.

5 Conclusion

We develop the Lorentz group autoencoder (LGAE), an autoencoder model equivariant to Lorentz transformations. We argue that incorporating this key inductive bias of high energy physics (HEP) data can have a significant impact on the performance, efficiency, and interpretability of machine learning models in HEP. We apply the LGAE to the tasks of compressing and reconstructing input quantum chromodynamics (QCD) jets and of identifying anomalous top quark, W boson, and Z boson jets. We report excellent performance in comparison to baseline graph and convolutional neural network autoencoder models, with the LGAE outperforming them on several key metrics. We also demonstrate the LGAE's interpretability, by analyzing the latent spaces of LGAE models for both tasks, and its data efficiency relative to the baseline models. The LGAE opens many promising avenues in terms of both performance and model interpretability, with the exploration of new datasets, the magnitude of Lorentz and permutation symmetry breaking due to detector effects, higher-order Lorentz group representations, and the challenges of real-life compression and anomaly detection applications all being exciting directions for future work.