Abstract
In the noisy intermediate-scale quantum era, research on the combination of artificial intelligence and quantum computing has developed rapidly. Here we propose a quantum circuit-based algorithm to implement quantum residual neural networks, where the residual connection channels are constructed by introducing auxiliary qubits to the data-encoding and trainable blocks in quantum neural networks. We prove that when this particular network architecture is applied to an l-layer data-encoding, the number of frequency generation forms extends from one, namely the difference of the sum of generator eigenvalues, to \(\mathcal{O}(l^{2})\), and the flexibility in adjusting Fourier coefficients can also be improved. This indicates that residual encoding can achieve better spectral richness and enhance the expressivity of various parameterized quantum circuits. Extensive numerical demonstrations on regression tasks and image classification are offered. Our work lays a foundation for the complete quantum implementation of classical residual neural networks and offers a quantum feature map strategy for quantum machine learning.
Introduction
Quantum computing is a new computing paradigm based on quantum mechanics that utilizes qubits instead of classical bits to store and process information1. Since the theoretical concepts were proposed2,3,4, quantum computers have developed at an astonishing speed, gradually moving from milestone achievements like quantum supremacy in the laboratory5,6,7 to the stage of proof-of-principle application exploration8,9,10. Among its many applications, quantum machine learning is an emerging field that leverages the power of quantum computers to overcome the bottleneck of high computing-power requirements in machine learning11,12,13,14. On current noisy intermediate-scale quantum devices15, one popular strategy for constructing quantum machine learning algorithms is to use classical-quantum hybrid optimization loops to train parameterized quantum circuits for various learning tasks, such as pattern recognition16,17 and classification18,19,20,21,22.
Similar to classical neural networks, which consist of input layers, hidden layers and output layers, the fundamental structures of variational quantum neural networks include data-encoding circuits, a variational ansatz, and output layers realized by quantum measurement23,24. To be specific, the data-encoding or quantum feature map process \(\mathcal{U}(x)\) maps the classical data x ∈ χ to a quantum state in a Hilbert space \(\mathcal{H}\). It serves as one of the main sources of non-linearity for the networks, and there exist numerous encoding strategies such as amplitude encoding and angle encoding25. Moreover, different choices of architecture for the variational ansatz \(\mathcal{W}(\theta)\) containing trainable parameters θ lead to various quantum neural networks26,27,28,29,30,31,32,33,34,35,36 and greatly affect network properties such as generalization37,38 and trainability39,40. For example, general deep parameterized quantum circuits suffer from the barren plateau phenomenon, leading to vanishing gradients40,41,42,43,44. This can be avoided by networks with a hierarchical structure, proposed as a realization of quantum convolutional neural networks (QCNN)20,27,45, which have been proven to be free of barren plateaus46. Finally, the output of an n-qubit quantum neural network is the mean value of a measurable observable O as
where the initial state \(\left\vert {\psi }_{0}\right\rangle ={\left\vert 0\right\rangle }^{\otimes n}\) and Uθ(x) is the parameterized quantum circuit consisting of repeatable data-encoding and trainable blocks. Interestingly, the expressivity and universality of such variational quantum models can be guaranteed by the fact that one can naturally write the outputs as partial Fourier series in the network inputs47,48,49,50; the accessible frequencies are determined by the eigenvalues of the generator Hamiltonians in the data-encoding gates, while the coefficients are controlled by the design of the entire circuit50.
A great deal of research has subsequently been devoted to advancing quantum neural networks, with one intuitive approach being the quantization of classical networks31,32,33,34. In particular, inspired by classical residual neural networks, which were proposed for alleviating the vanishing gradient problem during the training of deep neural networks51, their quantum counterpart is promising for mitigating barren plateaus34. The key idea is to introduce residual connections into traditional neural networks, as shown in Fig. 1. Mathematically, the residual connections provide an additional cross-layer propagation channel for the input features, leading to a basic residual unit of the form \(\mathcal{H}(x)=\mathcal{F}(x)+x\), where the non-linear parameterized function \(\mathcal{F}(x)\) represents the traditional neural network. Although there exist some works on the quantum realization of residual neural networks, the residual channels are usually implemented using classical or hybrid methods34,52. Research on fully quantum implementations of residual connections and their effects on expressivity is still lacking.
In this work, we address these issues by proposing a quantum circuit-based algorithm to implement quantum residual neural networks (QResNets). The residual connection channel is constructed through one ancillary qubit, and the target evolution process is embedded in a subspace. Such structures are compatible with both the data-encoding and trainable blocks in variational quantum neural networks. We further parameterize the encoding gates on the auxiliary qubit and obtain generalized residual operators. Furthermore, we find that the Fourier spectrum of the output of parameterized quantum circuits can be enriched when the residual connections are used for the data-encoding blocks. The number of frequency combination forms can be extended from one, namely the difference of the sum of generator eigenvalues, to \(\mathcal{O}(l^{2})\) for l-layer residual encoding. Moreover, the diverse construction methods for frequencies in the residual outputs and the extra trainable parameters in the generalized residual operators can expand the Fourier coefficient space. These results suggest that the expressivity of quantum models can be enhanced by residual connections. We offer extensive numerical demonstrations of the quantum algorithm on regression tasks by fitting Fourier series, and also present the performance of binary classification on the standard MNIST dataset of handwritten digit images53, achieving an accuracy improvement of over 7% with residual encoding. Our results show that the residual connections proposed in classical deep learning for improving trainability can also be used to improve expressivity in quantum neural networks, making this a promising quantum learning model for real-life applications.
Results
Realization of quantum residual connection
In the QResNets, there are multiple layers of repeatable data-encoding blocks \(\mathcal{U}(x)\) and trainable parameterized ansatz \(\mathcal{W}(\theta)\), and residual connections can be adopted in some of the blocks, as shown in Fig. 1. The data-encoding block consists of quantum rotation gates of the form U(x) = eiHx, where H is a generator Hamiltonian, while the trainable circuits are composed of single- and two-qubit parameterized quantum gates W(θ) with optimization parameters θ. Some gates in the data-encoding and ansatz blocks can be sampled to add residual connections, forming quantum residual operators \(\mathcal{R}(x)\) and \(\mathcal{R}(\theta)\), which correspond to the residual evolution processes. We introduce a unified notation ♢, where ♢ = x for quantum gates in the data-encoding blocks and ♢ = θ in the trainable blocks. Then for an n-qubit quantum system with initial state \(\left\vert {\phi }_{0}\right\rangle\), the evolution under the residual operator can be expressed as
where σ0 is the identity matrix, \(\mathcal{L}(x)=U(x)\) in the quantum feature map block and \(\mathcal{L}(\theta)=W(\theta)\) in the optimization ansatz. Such an evolution operator can be realized within the framework of linear combination of unitaries with one ancillary qubit, and the target quantum states are obtained by post-processing54,55. Specifically, we first apply a Hadamard gate to the ancillary system, followed by a controlled-\(\mathcal{L}(\lozenge)\) operator. After adding another Hadamard gate, we can measure the ancillary qubit with results ma = 0/1 corresponding to quantum states \(\left\vert 0\right\rangle /\left\vert 1\right\rangle\). The evolution results under residual operators are then obtained in the \(\left\vert 0\right\rangle \left\langle 0\right\vert\) subspace. The introduction of an auxiliary qubit provides an additional channel that allows the unevolved quantum state to pass along and be added to the evolved quantum state.
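This construction can be checked numerically. The following sketch (with an illustrative Ry gate standing in for \(\mathcal{L}(\lozenge)\); names are our own) verifies that the Hadamard, controlled-U, Hadamard sandwich leaves (σ0 + U)/2 in the \(\left\vert 0\right\rangle \left\langle 0\right\vert\) block of the ancilla:

```python
import numpy as np

# Minimal sketch of the LCU construction: H on the ancilla, a
# controlled-U, then H again embeds (I + U)/2 in the |0><0| block.
def residual_block(U):
    n = U.shape[0]
    I = np.eye(n)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    cU = np.block([[I, np.zeros((n, n))], [np.zeros((n, n)), U]])
    full = np.kron(H, I) @ cU @ np.kron(H, I)   # circuit as a matrix
    return full[:n, :n]                         # the |0><0| subspace block

theta = 0.7
U = np.array([[np.cos(theta/2), -np.sin(theta/2)],
              [np.sin(theta/2),  np.cos(theta/2)]])   # Ry(theta), illustrative
assert np.allclose(residual_block(U), (np.eye(2) + U) / 2)
```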
More generally, the weight of the summation process can also be adjusted by replacing the first Hadamard gate on the ancillary qubit with an Ry(2α) rotation with trainable angle α. The corresponding residual operator is then generalized to a single optimization-angle residual operator
Such a construction does not require a post-selection process, but rather reconstructs the target operator from the measurement results. It reduces to \(\mathcal{R}(\lozenge)\) with α = π/4 and ma = 0. Similarly, a two optimization-angle residual operator \(\mathcal{R}_{2}(\lozenge)\) can be constructed by replacing both Hadamard gates with parameterized rotation gates; the details are given in the Methods section. In principle, the introduction of more trainable parameters in these two generalized residual operators provides additional degrees of freedom for optimization, which can further increase the expressivity of the parameterized quantum circuits.
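As a consistency check of the single optimization-angle construction (our own derivation, not code from the paper), replacing the first Hadamard with Ry(2α) makes the ma = 0 block proportional to cos(α)σ0 + sin(α)U, which reduces to (σ0 + U)/2 at α = π/4:

```python
import numpy as np

# Hedged check: Ry(2a) -- controlled-U -- H gives the m_a = 0 block
# (cos(a) I + sin(a) U)/sqrt(2); at a = pi/4 this is (I + U)/2.
def r1_block(U, alpha):
    n = U.shape[0]
    I = np.eye(n)
    ry = np.array([[np.cos(alpha), -np.sin(alpha)],
                   [np.sin(alpha),  np.cos(alpha)]])      # Ry(2*alpha)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    cU = np.block([[I, np.zeros((n, n))], [np.zeros((n, n)), U]])
    full = np.kron(H, I) @ cU @ np.kron(ry, I)
    return full[:n, :n]

U = np.diag([1.0, 1j])                 # illustrative single-qubit gate
assert np.allclose(r1_block(U, np.pi/4), (np.eye(2) + U) / 2)
```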
Therefore, we can conclude that a general residual connection in quantum neural networks can be realized in a complete quantum circuit framework. It is also worth noting that in some special network structures such as the QCNN27, by reusing discarded qubits, we can simulate the residual connections without additional qubits. Moreover, because the expressivity of quantum models is fundamentally limited by the data-encoding strategy, we prove below that residual connections applied to the data-encoding block, no matter what ansatz is used, lead to better spectral richness in the Fourier series of the quantum model output, resulting in an expressivity enhancement.
Frequency spectra enhancement
It has been pointed out that the output of a parameterized quantum circuit can be expressed as a finite-term Fourier series of the input features50
where the frequencies ω of the spectrum Ω = {wk − wj∣j, k ∈ [d]} depend on the d-dimensional generator of the one-layer data-encoding gate U(x) = eiHx with eigenequations \(H\left\vert {h}_{j}\right\rangle ={w}_{j}\left\vert {h}_{j}\right\rangle\) for j ∈ [d]. Here [d] ≔ {1, 2, ⋯ , d}. This means that the accessible frequencies of the quantum model are constructed from the differences between the generator eigenvalues. For example, a frequently used generator is the Pauli matrix H = σ/2 with two eigenvalues w1,2 = ± 1/2, where σ ∈ {σx, σy, σz}; such a one-layer data-encoding block produces a frequency spectrum Ω = {0, ± 1}. Moreover, the expansion coefficients cω(θ, O) are associated with the entire structure of the quantum circuit, including the trainable parameters θ and the observable O.
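For concreteness, the difference spectrum of the Pauli example above can be enumerated directly (a minimal sketch, assuming H = σy/2):

```python
import numpy as np

# Frequency spectrum of a one-layer encoding U(x) = exp(iHx) with
# generator H = sigma_y / 2: the set of eigenvalue differences.
sigma_y = np.array([[0, -1j], [1j, 0]])
w = np.linalg.eigvalsh(sigma_y / 2)          # eigenvalues w_j = ±1/2

# Omega = { w_k - w_j : j, k } -- differences of generator eigenvalues
Omega = sorted({round(wk - wj, 10) for wj in w for wk in w})
print(Omega)  # [-1.0, 0.0, 1.0]
```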
However, for a data-encoding block with residual connection, more frequency components can be involved, realizing an improvement in the circuit approximation ability. Assuming that the initial quantum state \(\left\vert {\phi }_{0}\right\rangle\) of the residual encoding block is related to the optimization parameters θ, the residual outputs can be expressed as
It is clear that the first term produces the same frequency components as the traditional encoding scheme, whereas the second term corresponds to the zero-frequency component, independent of input feature x. So the key lies in the third term. Because the eigenstates \(\vert {h}_{j}\rangle\) of the generator Hamiltonian form a complete basis, we can then expand the initial quantum state \(\left\vert {\phi }_{0}\right\rangle\) and the observable O as \(\vert {\phi }_{0}\rangle ={\sum }_{k}{\phi }_{k}\vert {h}_{k}\rangle\) and \(O={\sum }_{j,k}{o}_{jk}\vert {h}_{j}\rangle \langle {h}_{k}\vert\). By using the equation \(U(x)\vert {h}_{j}\rangle ={e}^{i{w}_{j}x}\vert {h}_{j}\rangle\), we can have
It can be seen that this part produces new frequency components for the quantum models, namely the eigenfrequencies of the generator themselves, ± wk for k ∈ [d], rather than the differences between them. Therefore, the new spectrum of the one-layer data-encoding block with residual connection is
which indicates that the frequency generation forms of quantum neural networks with residual encoding are more diverse, and the resulting Fourier spectrum in general can be more abundant. In this case, the toy model exemplified above produces the new spectrum {0, ± 1/2, ± 1}, which includes more frequency components and leads to an enhanced approximation ability for the parameterized quantum circuits.
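A minimal enumeration, again for the assumed generator H = σy/2, shows how adding the bare eigenfrequencies ±wk enlarges the one-layer spectrum from {0, ±1} to {0, ±1/2, ±1}:

```python
import numpy as np

# Residual encoding adds the eigenfrequencies ±w_k themselves to the
# usual difference spectrum (sketch, assuming H = sigma_y / 2).
sigma_y = np.array([[0, -1j], [1j, 0]])
w = np.linalg.eigvalsh(sigma_y / 2)                    # ±1/2

diff_spectrum = {round(wk - wj, 10) for wj in w for wk in w}
residual_extra = {round(s * wk, 10) for wk in w for s in (+1, -1)}
Omega_R = sorted(diff_spectrum | residual_extra)
print(Omega_R)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```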
A natural issue to be addressed is when the residual encoding strategy behaves better than the traditional method. For the one-layer data-encoding block in quantum neural networks, the condition is that there exists a frequency component wk ∉ Ω for some k ∈ [d], which implies
Such a constraint can be satisfied in many practical cases because we usually use Pauli operators as the generator Hamiltonian.
Furthermore, for a data-encoding strategy repeated l times either in sequence or in parallel, the traditional scheme leads to a frequency spectrum \({\Omega }_{l}=\{({w}_{{j}_{1}}+\cdots +{w}_{{j}_{l}})-({w}_{{k}_{1}}+\cdots +{w}_{{k}_{l}})\,| \,{j}_{1},\cdots \,,{j}_{l},{k}_{1},\cdots \,,{k}_{l}\in [d]\}\), which has only one frequency combination form, namely the difference between the sums of two sets of l frequencies50. However, for the residual encoding, there are more ways to construct the spectrum, and the combination forms of frequencies are more complex and diversified. Specifically, the frequency spectrum of a two-layer residual encoding is
which contains four kinds of frequency combination forms. More frequency generation forms in general result in a larger upper limit for the spectrum size. We can show by induction that for an l-layer residual encoding scheme, the number of frequency combination forms is
where ⌈ ⋅ ⌉ and ⌊ ⋅ ⌋ denote the ceiling and floor functions. This is a quadratic improvement over the traditional scheme; details are given in the Methods section.
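One way to see the two-layer enrichment concretely (our own illustration for sequential repetition with Pauli eigenvalues ±1/2) is to expand ((σ0 + U(x))/2)² = (σ0 + 2U(x) + U(x)²)/4, whose matrix elements carry phases e^{i(a·wk − b·wj)x} with a, b ∈ {0, 1, 2}, mixing sums, differences, and bare eigenfrequencies:

```python
# Hedged enumeration: two-layer residual frequencies a*wk - b*wj with
# a, b in {0, 1, 2}, versus the traditional difference-of-sums spectrum.
w = [0.5, -0.5]                         # Pauli generator eigenvalues
residual_2 = sorted({round(a*wk - b*wj, 10)
                     for a in (0, 1, 2) for b in (0, 1, 2)
                     for wj in w for wk in w})
traditional_2 = sorted({round((wj1 + wj2) - (wk1 + wk2), 10)
                        for wj1 in w for wj2 in w
                        for wk1 in w for wk2 in w})
print(traditional_2)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(residual_2)     # [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
```

The residual spectrum strictly contains the traditional one, consistent with the enrichment claimed above.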
In addition to enlarging the accessible frequency spectrum, residual encoding can also improve the flexibility of the corresponding Fourier coefficients; both together determine the expressivity of a quantum model. The enhancement comes from two aspects: one is the introduction of additional optimization degrees of freedom in the generalized residual operators \(\mathcal{R}_{1,2}(x)\); the other is the more diverse construction methods of frequencies and the corresponding recombination of Fourier coefficients, which means that a single frequency component can be generated from the recombination of different terms in the residual outputs. The latter is the reason why the residual operator \(\mathcal{R}(x)\) can behave better than the traditional encoding strategy in expanding the Fourier coefficient space without introducing additional optimization parameters. Furthermore, we may understand the frequency spectrum amplification in quantum residual models from the perspective that classical residual networks behave like ensembles of relatively shallow networks56. That is to say, the quantum residual connection channels can equivalently implement ensembles of small quantum models with different frequencies, thus leading to a richer spectrum and stronger expressivity. We show the expressivity improvement in detail in the numerical simulation section.
Measurement scheme
To obtain the expectation value of an observable O for the quantum state \(\mathcal{R}(x)\left\vert {\phi }_{0}\right\rangle\), which is embedded in the \(\left\vert 0\right\rangle \left\langle 0\right\vert\) subspace of the ancillary qubit, we can introduce another observation operator \(\bar{O}=\left\vert 0\right\rangle \left\langle 0\right\vert \otimes O\) on the system. Then the output observation values can be expressed as
where \(\left\vert {\phi }_{f}\right\rangle =\left\vert 0\right\rangle \mathcal{R}(x)\left\vert {\phi }_{0}\right\rangle +\left\vert \perp \right\rangle\) is the output quantum state of the whole system, and the second term \(\left\vert \perp \right\rangle\) is orthogonal to the first part. Furthermore, because we can expand the measurement operator as \(\bar{O}=({\sigma }_{0}+{\sigma }_{z})/2\otimes O\), we also have
This indicates that we can obtain the residual outputs fR(x, θ) by measuring the average expectation of the system output state \(\vert {\phi }_{f}\rangle\) with the two observables {σ0 ⊗ O, σz ⊗ O}, which is experimentally feasible and introduces little resource overhead. For l-layer residual encoding, we need at most l ancillary qubits, and the corresponding observation operators are \(\{{({\sigma }_{0}+{\sigma }_{z})}^{\otimes l}\otimes O\}\), whose size grows exponentially with the number of residual encoding layers. This exponential dependence is intrinsically related to the attenuation of the success probability in quantum algorithms with post-selection. Specifically, suppose that the output state with one residual connection on qubit i is \(\vert {\phi }_{f}^{(i)}\rangle =\vert 0\rangle \mathcal{R}(\lozenge)\vert {\phi }_{0}^{(i)}\rangle +\vert \perp \rangle\); then the probability of measuring the ancillary qubit in the \(\left\vert 0\right\rangle\) state is \({P}_{0}^{(i)}=|| \mathcal{R}(\lozenge)\vert {\phi }_{0}^{(i)}\rangle |{|}^{2}\), where ∣∣x∣∣ denotes the norm of the vector x. So the success probability of a quantum algorithm with l residual connection blocks is \({P}_{s}={\prod }_{i = 1}^{l}{P}_{0}^{(i)}\), which decays exponentially with l.
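The identity behind this measurement scheme can be verified in a small simulation (an illustrative sketch with O = σz and an Rx encoding gate of our choosing, not the paper's code):

```python
import numpy as np

# Check: measuring (sigma_0 + sigma_z)/2 (x) O on the full output state
# reproduces <phi_0| R^† O R |phi_0> with R = (I + U)/2.
I2 = np.eye(2)
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

x = 1.3
U = np.cos(x/2) * I2 - 1j * np.sin(x/2) * sx            # Rx(x), illustrative
cU = np.block([[I2, np.zeros((2, 2))], [np.zeros((2, 2)), U]])
circuit = np.kron(H, I2) @ cU @ np.kron(H, I2)

phi0 = np.array([1.0, 0.0])
phi_f = circuit @ np.kron(np.array([1.0, 0.0]), phi0)   # ancilla starts in |0>

O_bar = np.kron((I2 + sz) / 2, sz)                      # |0><0| (x) sigma_z
lhs = np.real(phi_f.conj() @ O_bar @ phi_f)

R = (I2 + U) / 2                                        # residual operator
rhs = np.real(phi0.conj() @ R.conj().T @ sz @ R @ phi0)
assert np.isclose(lhs, rhs)
```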
In practice, we do not need to use residual feature maps in every block; inserting residual connections into some sampled data-encoding blocks can give the networks better expressivity. In addition, the measurement schemes show that our algorithm is compatible with the existing methods for calculating the gradient of the expectation value of a quantum circuit with respect to the optimization parameters57,58,59. Using the parameter-shift rule57, the gradient of the residual outputs with respect to a parameter θj can be calculated as
where fR(x, θj ± π/2) are the expectation values when the target parameter θj is shifted by ± π/2 respectively.
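As a toy instance of this rule (our own example, not the paper's circuit), for a single Ry gate the expectation is f(θ) = cos θ, and the shifted difference reproduces the analytic gradient −sin θ exactly:

```python
import numpy as np

# Parameter-shift sketch: f(t) = <0|Ry(t)^† Z Ry(t)|0> = cos(t),
# so (f(t + pi/2) - f(t - pi/2)) / 2 = -sin(t).
def f(theta):
    Ry = np.array([[np.cos(theta/2), -np.sin(theta/2)],
                   [np.sin(theta/2),  np.cos(theta/2)]])
    psi = Ry @ np.array([1.0, 0.0])
    return psi @ np.diag([1.0, -1.0]) @ psi

theta = 0.4
grad = (f(theta + np.pi/2) - f(theta - np.pi/2)) / 2
assert np.isclose(grad, -np.sin(theta))
```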
Furthermore, it should be mentioned that the approximation improvement can be understood from the universal approximation property with polynomial basis functions60, which states that a linear combination of different observables can approximate any continuous function. Based on the above analysis of quantum models with the specific residual encoding structures, we can see that such a combination of measurement results can actually lead to a frequency richness improvement in the Fourier series, which enhances the expressive ability of quantum neural networks. Therefore, our work can serve as a specific case bridging the polynomial approximation60 and Fourier series approximation50, two perspectives for understanding the universal approximation property of quantum machine learning models.
Numerical demonstration
To demonstrate the improvement of the Fourier frequency spectrum by residual connections, we present a proof-of-principle numerical simulation with Pennylane61, which solves regression tasks of fitting quantum models to target Fourier series. We adopt the traditional qubit encoding strategy to map classical data x into a quantum state with a single-qubit Pauli-rotation operator \(U(x)={R}_{y}(x)={e}^{-ix{\sigma }_{y}/2}\), where the generator Hamiltonian G = − σy/2 has two eigenvalues e1,2 = ± 1/2. The optimization ansatz has two arbitrary single-qubit rotation gates \(U({\theta }_{i})={R}_{z}({\theta }_{i}^{1}){R}_{y}({\theta }_{i}^{2}){R}_{z}({\theta }_{i}^{3})\) for i = 1, 2 placed before and after the data-encoding block, resulting in a quantum model Uθ(x) = U(θ2)U(x)U(θ1). The observable is σz, and the output is \(f(x,\theta )=\left\langle 0\right\vert {U}_{\theta }^{\dagger }(x){\sigma }_{z}{U}_{\theta }(x)\left\vert 0\right\rangle\). The quantum models are trained in a supervised learning framework to search for the optimal parameters θ*, which minimize the mean squared error (MSE)
where D is the size of the data set and y( ⋅ ) is the target function. We use the Adam optimizer with at most 200 steps and set the learning rate to 0.3 with batch size 0.7D in the simulation. A termination condition for optimization convergence, namely that the variance of ten consecutive loss function values is less than 10−8, is also used.
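A minimal sketch of this single-qubit model (plain linear algebra rather than Pennylane; random parameters are our own choice) confirms that its output contains only the frequencies {0, ±1}: a least-squares fit onto the basis {1, cos x, sin x} reproduces the output exactly.

```python
import numpy as np

# Model f(x, theta) = <0|U^†(x) Z U(x)|0> with U = W(t2) Ry(x) W(t1);
# its output is a degree-1 Fourier series in x.
def rz(a): return np.diag([np.exp(-1j*a/2), np.exp(1j*a/2)])
def ry(a): return np.array([[np.cos(a/2), -np.sin(a/2)],
                            [np.sin(a/2),  np.cos(a/2)]], dtype=complex)
def w_block(t): return rz(t[0]) @ ry(t[1]) @ rz(t[2])

rng = np.random.default_rng(0)
t1, t2 = rng.uniform(0, 2*np.pi, 3), rng.uniform(0, 2*np.pi, 3)
Z = np.diag([1.0, -1.0])

xs = np.linspace(0, 2*np.pi, 50)
f = []
for x in xs:
    psi = w_block(t2) @ ry(x) @ w_block(t1) @ np.array([1, 0], dtype=complex)
    f.append(np.real(psi.conj() @ Z @ psi))

# Exact fit onto {1, cos x, sin x}: only frequencies {0, ±1} are present.
basis = np.column_stack([np.ones_like(xs), np.cos(xs), np.sin(xs)])
coef, *_ = np.linalg.lstsq(basis, np.array(f), rcond=None)
assert np.allclose(basis @ coef, f, atol=1e-8)
```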
As shown in Fig. 2, this quantum model can learn functions of the form \({y}_{1}(x)={\sum }_{{\omega }_{i}\in {\Omega }_{1}}(a{e}^{i{\omega }_{i}x}+{a}^{* }{e}^{-i{\omega }_{i}x})\) with an MSE value Δ = 6.0 × 10−5, where a is an amplitude parameter and the frequency spectrum is Ω1 = {ω0 = 0, ω1 = 2∣e1,2∣ = 1}, consistent with the results in ref. 50. However, a multi-frequency function with spectrum Ω2 = {ω0 = 0, ω1 = 1, ω2 = 0.5} cannot be well fitted, with error Δ = 5.1 × 10−2, due to the lack of frequencies in the parameterized quantum circuit caused by the data-encoding strategy. The frequency mismatch can be mitigated by inserting residual connections into the data-encoding block, with an output MSE value Δ = 5.1 × 10−5, because the resulting residual operator \(\mathcal{R}(x)\) brings richer frequency components that enhance the circuit expressivity. It is worth noting that the residual data-encoding scheme still works well for the spectrum Ω1 besides Ω2, and the optimization process converges quickly.
Furthermore, we turn to a more general case of fitting the function \({y}_{2}(x)={\sum }_{{\omega }_{i}\in {\Omega }_{2}}({a}_{{\omega }_{i}}{e}^{i{\omega }_{i}x}+{a}_{{\omega }_{i}}^{* }{e}^{-i{\omega }_{i}x})\), where the amplitudes can differ for each frequency component. Additional degrees of freedom can be obtained from the multiple combination methods of single-frequency components in the residual outputs and the parameterized gates on the auxiliary qubit in the generalized residual operators \(\mathcal{R}_{1,2}(x)\) in equations (3) and (17). We can conclude from the numerical results in Fig. 3 that the traditional encoding scheme still cannot fit the target function, with MSE value Δ = 0.09, while the residual feature map with the \(\mathcal{R}(x)\) operator works better, with error Δ = 2.1 × 10−3. When we use the generalized residual operators, the fitting results are further improved, converging to smaller MSE values, Δ = 1.1 × 10−4 for \(\mathcal{R}_{1}(x)\) and Δ = 1.7 × 10−4 for \(\mathcal{R}_{2}(x)\), in fewer optimization steps: 77 steps for \(\mathcal{R}_{1}(x)\) and 55 steps for \(\mathcal{R}_{2}(x)\). Moreover, the extra combination forms and trainable parameterized quantum gates bring more flexibility to the fitting, expanding the Fourier coefficient space. As shown in Fig. 4, we sample the quantum models 1000 times with different feature maps that produce Fourier series, and then obtain the distribution of Fourier coefficients. We can see that under the same ansatz, the residual feature map with the \(\mathcal{R}_{2}(x)\) operator has the widest Fourier coefficient distribution, and all three residual encodings are better than the traditional encoding scheme.
In addition, this enhancement can be quantitatively measured by a commonly used expressibility metric62. We first generate many pairs of parameters Θ1 and Θ2 randomly and calculate the distribution (PF) of the state fidelities \(F=| \left\langle 0\right\vert {U}_{{\Theta }_{1}}^{\dagger }(x){U}_{{\Theta }_{2}}(x)\left\vert 0\right\rangle {| }^{2}\), which measure the overlap of the quantum states generated by the quantum models. The Kullback-Leibler (KL) divergence63 is then used to quantify the circuit expressivity by comparing the sampled fidelity distributions with that of the Haar-distributed state ensemble (PHaar) as
where the analytical form of the fidelity distribution for the ensemble of Haar random states is pHaar(F) = (N − 1)(1−F)N−2 and N is the dimension of the Hilbert space64. A smaller KL divergence value corresponds to more favorable expressibility. We sample each quantum model in Fig. 4 1000 times and use 45 histogram bins to estimate the fidelity distribution, which is then compared with the sampled fidelity ensemble of the Haar random states. The computed KL divergences are \({D}_{KL}^{{\rm{trad}}}=0.0634\), \({D}_{KL}^{\mathcal{R}(x)}=0.0581\), \({D}_{KL}^{{\mathcal{R}}_{1}(x)}=0.0446\) and \({D}_{KL}^{{\mathcal{R}}_{2}(x)}=0.0429\), respectively. We can see that the residual operators indeed increase the circuit expressivity relative to the traditional encoding scheme, because they all introduce richer frequency components into the quantum models. However, it is worth mentioning that although the three residual models have the same frequency spectrum, the additional reasons for the expressivity enhancement are somewhat different for the \(\mathcal{R}(x)\) and \(\mathcal{R}_{1,2}(x)\) operators. The former is due to the diverse construction methods of frequencies in the residual outputs, while the latter also benefits from the additional optimization parameters. We prove in the Methods section that the generalized residual outputs can be seen as weighted versions of the residual outputs with trainable weights. Moreover, it is known that constructing frequencies only from the difference between the sums of the generator’s eigenvalues limits access to higher-order components, resulting in a reduction in coefficient variance50. Therefore, the residual encoding method, which offers more ways to construct frequencies, can broaden the distribution of Fourier coefficients, suggesting an enhanced expressivity of quantum models via residual connections.
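The metric itself is straightforward to reproduce. The sketch below (our own illustration, sampling Haar-random single-qubit states rather than the paper's circuits) estimates the discretized KL divergence against pHaar(F) = (N − 1)(1 − F)^{N−2} and returns a value close to zero, as expected for a maximally expressive sampler:

```python
import numpy as np

# Expressibility metric: histogram fidelities of sampled state pairs and
# compare with the Haar prediction via a discretized KL divergence.
def haar_state(n, rng):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
N, bins, samples = 2, 45, 5000
F = np.array([abs(haar_state(N, rng).conj() @ haar_state(N, rng))**2
              for _ in range(samples)])

edges = np.linspace(0, 1, bins + 1)
p_emp, _ = np.histogram(F, bins=edges)
p_emp = p_emp / samples
# Bin mass of p_Haar(F) = (N-1)(1-F)^(N-2), integrated over each bin:
p_haar = (1 - edges[:-1])**(N - 1) - (1 - edges[1:])**(N - 1)

mask = p_emp > 0
d_kl = np.sum(p_emp[mask] * np.log(p_emp[mask] / p_haar[mask]))
print(round(d_kl, 3))   # close to 0 for a Haar-expressive sampler
```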
Moreover, similar to the traditional encoding, we can extend the accessible frequency spectrum by repeating the residual encoding block multiple times in sequence or in parallel. To investigate the frequency extension by sequential and parallel repetitions of data-encoding, we fit the aforementioned target function y2(x) with a more complex spectrum Ω3 = {ω0 = 0, ω1 = 1, ω2 = 0.5, ω3 = 1.5, ω4 = 2} and amplitudes a0 = 0.1 and a1.5,2 = 5a1,0.5 = 0.15 + 0.15i. Two-layer repeating structures for the traditional encoding in sequence and the residual encoding with \(\mathcal{R}_{2}(x)\) operators in sequence and in parallel are used, as shown in Fig. 5. The single-qubit observable is O = σz in all cases. All the quantum models were trained for at most 200 steps using the Adam optimizer with batch size 16. We can see that both the sequential and parallel repetitions of residual encoding extend the Fourier spectrum and fit the target function well. The MSE values and optimization steps are Δ = 3.3 × 10−4 and 159 steps for the sequential repetitions, and Δ = 4.2 × 10−4 and 115 steps for the parallel repetitions. It should be clarified that the mixed use of residual and traditional encoding also brings enhanced expressivity. Therefore, replacing some of the encoding blocks in complex quantum models with residual blocks, but not all of them, can enrich the expressivity of the whole network.
Application in image classification
In this part, we discuss the performance of the QCNN algorithm with residual encoding for image classification using a real-world dataset, MNIST. MNIST includes 60,000 (10,000) images in the train (test) dataset with 10 classes of handwritten digits, and each image consists of 28 × 28 pixels. Here we focus on binary classification with the selected classes 0 and 1, and the sizes of the train and test datasets used are 12,665 and 2115. Constrained by current quantum hardware, high-dimensional data usually require classical pre-processing techniques for dimensionality reduction, and we adopt principal component analysis (PCA) to match the input data with the four-qubit data-encoding layer65. For comparison, we use qubit encoding and consider the case where no residual connection is added and the cases where the residual operator \(\mathcal{R}_{2}(x)\) is applied to the i-th qubit, denoted as the traditional and residual-Qi schemes, respectively.
The ansatz for the QCNN algorithm is composed of a series of alternating convolutional and pooling layers27, as shown in Fig. 6. Each convolutional layer includes several single- and two-qubit parameterized quantum gates, keeping a translationally invariant structure. We use Ising interactions between adjacent qubits with one parameter, \(ZZ(\phi )={e}^{-i{\sigma }_{z}\otimes {\sigma }_{z}\phi /2}\), and single-qubit U3 gates with three parameters as
The pooling layer is implemented by a parameterized controlled-U3 gate, and one qubit is traced out, reducing the quantum state from two qubits to a single qubit. We measure the expectation value \({\langle {\sigma }_{z}\rangle }_{i}\) on the output qubit for the i-th input data with label yi = 0/1. The cost function is \(C(\theta )={\sum }_{i = 1}^{D}{(| \langle {\sigma }_{z}\rangle {| }_{i}-{y}_{i})}^{2}/2D\) for a D-sample dataset, and it is optimized by the Adam optimizer with a learning rate of 0.2. The number of iterations in the training process is 100, and the process is repeated 20 times with random initialization of the optimization parameters to obtain mean values. Once the cost function converges and the optimal parameters \({\theta }^{* }=\arg {\min }_{\theta }C(\theta )\) are obtained, the measurement outputs can be mapped to binary values c0/1 via a boundary precision \(\epsilon \in \left(0,0.5\right]\). The classification result is c0/1 = 1 for ∣〈σz〉∣ > 1 − ϵ and c0/1 = 0 for ∣〈σz〉∣ < ϵ, while other values are marked as unclassifiable optimization results. A smaller value of ϵ represents higher optimization accuracy and a higher classification standard.
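The decision rule above can be summarized in a few lines (a sketch of the rule as stated; the function name is hypothetical):

```python
# Decision rule with boundary precision eps in (0, 0.5]:
# |<sigma_z>| > 1 - eps -> class 1, |<sigma_z>| < eps -> class 0,
# anything in between is left unclassified.
def classify(exp_z, eps=0.1):
    m = abs(exp_z)
    if m > 1 - eps:
        return 1
    if m < eps:
        return 0
    return None   # unclassifiable optimization result

assert classify(0.97) == 1
assert classify(-0.03) == 0
assert classify(0.5) is None
```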
The optimization results for the cost function and accuracy are shown in Fig. 7 and Table 1. We set ϵ = 0.1 in the simulation, and there are 20 free parameters in the ansatz. We can conclude that the residual encoding schemes obtain smaller convergence values of the loss than the traditional encoding method, which means that the models have better approximation ability. Such an enhancement can lead to better expressivity and higher accuracy for quantum models in complex learning tasks. In addition, the residual encoding produces high classification accuracy, reaching 92.85% and 92.47% on average for the train and test datasets respectively, which is about 7.74% and 7.57% higher than that of the traditional encoding strategy. Further, we provide more numerical simulations of larger QCNN models with up to 12 qubits in Fig. 8. We can see that as the number of qubits increases, the dimensionality reduction of the input image is mitigated, and more information can be incorporated into the quantum networks. The convergence value of the loss function is gradually reduced, and the learning accuracy is gradually improved. The average classification accuracy on the train and test datasets with the residual data-encoding algorithm improves to about 97.66% in the largest-scale quantum learning model.
Conclusion
In summary, we have proposed a complete quantum circuit-based architecture for implementing quantum residual neural networks, dubbed QResNets. The classical residual connection channel is quantized by adding an auxiliary qubit to the data-encoding and trainable blocks, and is then generalized with additional parameterized gates. We further prove mathematically that the Fourier spectrum of the quantum model output is enriched when the residual connections are applied to the data-encoding blocks: there is a squared improvement in the number of frequency generation forms of residual encoding over the traditional schemes. The l-layer residual encoding strategy can produce \({{{{{{{\mathcal{O}}}}}}}}({l}^{2})\) frequency combination methods, rather than only the difference of the sums of generator eigenvalues as in traditional methods. Moreover, the diverse spectrum construction methods in the residual outputs and the additional optimization degrees of freedom in the generalized residual operators make the Fourier coefficients more flexible, favoring access to higher-order components. This indicates that residual encoding can enrich the spectrum and broaden the Fourier coefficient distribution, that is, it can enhance the expressivity of various parameterized quantum circuits. Numerical simulations of fitting Fourier-series functions and a demonstration of binary classification of handwritten-digit images from the MNIST dataset are conducted to show the algorithm performance. Compared with the traditional encoding, the accuracy of residual encoding is improved by about 7%. Our work advances the design of quantum neural networks with specific structures, enables a full quantum realization of classical residual connections, and provides a quantum feature map strategy.
Methods
Generalized residual operators
We have discussed the form of the residual operator \({{{{{{{\mathcal{R}}}}}}}}(\lozenge)\) and its corresponding residual output fR(x, θ) above. In this part, we give a detailed introduction to the generalized residual operators \({{{{{{{{\mathcal{R}}}}}}}}}_{1,2}(\lozenge)\) and the corresponding generalized residual outputs \({f}_{{R}_{1,2}}(x,\theta )\), which exhibit stronger expressivity. As in equation (3), where one Hadamard gate is replaced by a parameterized gate, we further assume that both Hadamard gates on the ancillary qubit are replaced by gates Ry(2α) and Ry(2γ) with trainable angles α and γ; the \({{{{{{{{\mathcal{R}}}}}}}}}_{2}(\lozenge)\) operator can then be expressed as
with a relabeled angle η = πma/2 − γ. The residual operator \({{{{{{{{\mathcal{R}}}}}}}}}_{1}(\lozenge)\) can be seen as a special case with γ = − π/4, ignoring a global phase factor. When the generalized residual operator \({{{{{{{{\mathcal{R}}}}}}}}}_{1,2}(x)\) is used in the data-encoding block, the residual output is
where the trainable coefficients for the \({{{{{{{{\mathcal{R}}}}}}}}}_{1}(x)\) operator are \({A}_{1}^{{R}_{1}}(\alpha )={\sin }^{2}\alpha /2,{A}_{2}^{{R}_{1}}(\alpha )={\cos }^{2}\alpha /2\) and \({A}_{3}^{{R}_{1}}(\alpha )={(-1)}^{{m}_{a}}\sin 2\alpha /2\), while those for the \({{{{{{{{\mathcal{R}}}}}}}}}_{2}(x)\) operator are \({A}_{1}^{{R}_{2}}(\alpha ,\eta )={(\sin \alpha \sin \eta )}^{2},{A}_{2}^{{R}_{2}}(\alpha ,\eta )={(\cos \alpha \cos \eta )}^{2}\) and \({A}_{3}^{{R}_{2}}(\alpha ,\eta )=(\sin 2\alpha \sin 2\eta )/2\). This extension offers additional degrees of freedom for the optimization process and relaxes the range of the Fourier coefficient of the new frequency component wk in equation (6) to \({A}_{3}^{{R}_{1,2}}{\sum }_{j}{\phi }_{j}^{* }{o}_{jk}{\phi }_{k}\); a similar effect holds for the other frequency components. In fact, the generalized residual outputs \({f}_{{R}_{1,2}}(x,\theta )\) can be seen as weighted versions of the residual output fR(x, θ), where the weight of each term is trainable.
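The coefficient formulas can be collected into a short sketch (NumPy; the function names are ours). Note that setting η = π/4, which corresponds to γ = − π/4 with ma = 0, recovers the \({{{{{{{{\mathcal{R}}}}}}}}}_{1}\) coefficients, consistent with \({{{{{{{{\mathcal{R}}}}}}}}}_{1}\) being a special case of \({{{{{{{{\mathcal{R}}}}}}}}}_{2}\):

```python
import numpy as np

def coeffs_R1(alpha, m_a=0):
    """Trainable weights of the R1 operator:
    A1 = sin^2(alpha)/2, A2 = cos^2(alpha)/2, A3 = (-1)^m_a sin(2 alpha)/2.
    Note A1 + A2 = 1/2 for any alpha."""
    return (np.sin(alpha) ** 2 / 2,
            np.cos(alpha) ** 2 / 2,
            (-1) ** m_a * np.sin(2 * alpha) / 2)

def coeffs_R2(alpha, eta):
    """Trainable weights of the generalized R2 operator:
    A1 = (sin a sin e)^2, A2 = (cos a cos e)^2, A3 = sin(2a) sin(2e)/2."""
    return ((np.sin(alpha) * np.sin(eta)) ** 2,
            (np.cos(alpha) * np.cos(eta)) ** 2,
            np.sin(2 * alpha) * np.sin(2 * eta) / 2)

# Consistency check: eta = pi/4 reduces R2's weights to R1's (m_a = 0).
print(coeffs_R1(0.7, 0))
print(coeffs_R2(0.7, np.pi / 4))
```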
Proof of frequency combination forms
As mentioned above, there are four kinds of combination forms for frequency generation with a two-layer residual encoding. When another residual encoding layer is added, the spectrum \({\Omega }_{l = 1}^{R}=\{{w}_{k}-{w}_{j},\pm {w}_{k}| j,k\in [d]\}\) is combined with the spectrum \({\Omega }_{l = 2}^{R}\). We first consider the component given by the difference of the sums of generator eigenvalues, which brings new frequency components into the three-layer residual spectrum as
with indices j1, j2, j3, k1, k2, k3 ∈ [d]. If we further consider the effect of the eigenvalues \(\pm {w}_{k}\in {\Omega }_{l = 1}^{R}\), more frequency components are involved:
We can combine the above cases for frequency generation and denote the combination form \(\pm (\mathop{\sum }_{m = 1}^{{l}_{1}\ge 1}{w}_{{j}_{m}}-\mathop{\sum }_{n = 1}^{{l}_{2}\ge 1}{w}_{{k}_{n}})\) by \({\mathbb{DS}}({l}_{1},{l}_{2})\), the difference between the sums of two sets with l1 and l2 frequencies; the combination form \(\pm \mathop{\sum }_{m = 1}^{l\ge 1}{w}_{{j}_{m}}\) is denoted by \({\mathbb{DS}}(l,0)\). We then find six kinds of frequency combination forms for the three-layer residual encoding, summarized as \(\{{\mathbb{DS}}(3,3),{\mathbb{DS}}(3,2),{\mathbb{DS}}(3,1),{\mathbb{DS}}(3,0),{\mathbb{DS}}(2,2),{\mathbb{DS}}(2,1)\}\). Further, for the l-layer residual encoding, the spectrum with its various frequency generation forms can be formally expressed as
where ⌈ ⋅ ⌉ and ⌊ ⋅ ⌋ are the ceiling and floor functions. Based on the number of items in each row of equation (21), we can determine the number of components in the set as
Compared with the traditional encoding method, which generates frequencies only with \({\mathbb{DS}}(l,l)\)50, there is a squared improvement in the number of frequency generation methods for the residual encoding scheme, with \({{{{{{{\mathcal{N}}}}}}}}({\Omega }_{l}^{R})\propto {{{{{{{\mathcal{O}}}}}}}}({l}^{2})\). While different combinations may produce some identical frequency components, in general more frequency-generation methods mean that the upper bound on the size of the Fourier spectrum of quantum model outputs can be larger, allowing for more complex learning tasks. Moreover, the diverse construction methods for frequencies also improve the flexibility of the Fourier coefficients, favoring access to higher-order components and further improving the expressivity of quantum models.
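The spectral enrichment can be checked numerically. The sketch below enumerates the \({\mathbb{DS}}({l}_{1},{l}_{2})\) forms for l = 3 over an arbitrary example set of generator eigenvalues of our choosing, assuming repeated indices are allowed in each sum (as the index ranges jm ∈ [d] suggest), and confirms that the residual spectrum strictly contains the traditional \({\mathbb{DS}}(3,3)\) spectrum:

```python
from itertools import combinations_with_replacement

def partial_sums(freqs, l):
    """All sums of l base frequencies (indices may repeat); l = 0 gives {0}."""
    if l == 0:
        return {0}
    return {sum(c) for c in combinations_with_replacement(freqs, l)}

def DS(freqs, l1, l2):
    """Combination form DS(l1, l2):
    +/- (sum of l1 base frequencies - sum of l2 base frequencies)."""
    return {sign * (s1 - s2)
            for s1 in partial_sums(freqs, l1)
            for s2 in partial_sums(freqs, l2)
            for sign in (1, -1)}

base = [1, 2, 4]  # example generator eigenvalues (arbitrary choice)

# Traditional 3-layer encoding: only the DS(3, 3) form.
traditional = DS(base, 3, 3)
# Residual 3-layer encoding: the six forms listed in the text.
forms = [(3, 3), (3, 2), (3, 1), (3, 0), (2, 2), (2, 1)]
residual = set().union(*(DS(base, a, b) for a, b in forms))
print(len(traditional), len(residual))
```

For this eigenvalue set the residual spectrum strictly contains the traditional one; for instance the frequencies ±10, ±11, ±12 appear only through the \({\mathbb{DS}}(3,0)\) and \({\mathbb{DS}}(3,1)\) forms.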
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The code is available from the corresponding authors upon reasonable request.
References
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2010). https://doi.org/10.1017/CBO9780511976667.
Feynman, R. P. Simulating physics with computers. International Journal of Theoretical Physics 21, 467–488 (1982).
Benioff, P. The computer as a physical system: A microscopic quantum mechanical hamiltonian model of computers as represented by turing machines. Journal of Statistical Physics 22, 563–591 (1980).
Deutsch, D. Quantum theory, the Church–Turing principle and the universal quantum computer. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences 400, 97–117 (1985).
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
Zhong, H.-S. et al. Quantum computational advantage using photons. Science 370, 1460–1463 (2020).
Wu, Y. et al. Strong quantum computational advantage using a superconducting quantum processor. Physical Review Letters 127, 180501 (2021).
Cao, Y. et al. Quantum chemistry in the age of quantum computing. Chemical Reviews 119, 10856–10915 (2019).
Cumming, R. & Thomas, T. Using a quantum computer to solve a real-world problem–what can be achieved today? arXiv preprint arXiv:2211.13080 (2022). https://doi.org/10.48550/arXiv.2211.13080.
Herman, D. et al. A survey of quantum computing for finance. arXiv preprint arXiv:2201.02773 (2022). https://doi.org/10.48550/arXiv.2201.02773.
Schuld, M., Sinayskiy, I. & Petruccione, F. An introduction to quantum machine learning. Contemporary Physics 56, 172–185 (2015).
Biamonte, J. et al. Quantum machine learning. Nature 549, 195–202 (2017).
Cerezo, M., Verdon, G., Huang, H.-Y., Cincio, L. & Coles, P. J. Challenges and opportunities in quantum machine learning. Nature Computational Science 2, 567–576 (2022).
Zeguendry, A., Jarir, Z. & Quafafou, M. Quantum machine learning: A review and case studies. Entropy 25, 287 (2023).
Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018).
Li, Y., Zhou, R.-G., Xu, R., Luo, J. & Hu, W. A quantum deep convolutional neural network for image recognition. Quantum Science and Technology 5, 044003 (2020).
Henderson, M., Shakya, S., Pradhan, S. & Cook, T. Quanvolutional neural networks: powering image recognition with quantum circuits. Quantum Machine Intelligence 2, 2 (2020).
Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209–212 (2019).
Farhi, E. & Neven, H. Classification with quantum neural networks on near term processors. arXiv preprint arXiv:1802.06002 (2018). https://doi.org/10.48550/arXiv.1802.06002.
Hur, T., Kim, L. & Park, D. K. Quantum convolutional neural network for classical data classification. Quantum Machine Intelligence 4, 3 (2022).
Li, W. & Deng, D.-L. Recent advances for quantum classifiers. Science China Physics, Mechanics & Astronomy 65, 220301 (2022).
Ren, W. et al. Experimental quantum adversarial learning with programmable superconducting qubits. Nature Computational Science 2, 711–717 (2022).
Beer, K. et al. Training deep quantum neural networks. Nature Communications 11, 808 (2020).
Abbas, A. et al. The power of quantum neural networks. Nature Computational Science 1, 403–409 (2021).
Schuld, M. & Killoran, N. Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122, 040504 (2019).
Dallaire-Demers, P.-L. & Killoran, N. Quantum generative adversarial networks. Physical Review A 98, 012324 (2018).
Cong, I., Choi, S. & Lukin, M. D. Quantum convolutional neural networks. Nature Physics 15, 1273–1278 (2019).
Chalumuri, A., Kune, R. & Manoj, B. A hybrid classical-quantum approach for multi-class classification. Quantum Information Processing 20, 119 (2021).
Wu, S. L. et al. Application of quantum machine learning using the quantum kernel algorithm on high energy physics analysis at the LHC. Physical Review Research 3, 033221 (2021).
Wang, H., Zhao, J., Wang, B. & Tong, L. A quantum approximate optimization algorithm with metalearning for MaxCut problem and its simulation via TensorFlow Quantum. Mathematical Problems in Engineering 2021, 1–11 (2021).
Landman, J. et al. Quantum methods for neural networks and application to medical image classification. Quantum 6, 881 (2022).
Bausch, J. Recurrent quantum neural networks. Advances in Neural Information Processing Systems 33, 1368–1379 (2020).
Liu, Z., Shen, P.-X., Li, W., Duan, L.-M. & Deng, D.-L. Quantum capsule networks. Quantum Science and Technology 8, 015016 (2022).
Kashif, M. & Al-Kuwari, S. ResQNets: a residual approach for mitigating barren plateaus in quantum neural networks. EPJ Quantum Technology 11, 4 (2024).
Mangini, S., Tacchino, F., Gerace, D., Bajoni, D. & Macchiavello, C. Quantum computing models for artificial neural networks. Europhysics Letters 134, 10002 (2021).
Bowles, J., Ahmed, S. & Schuld, M. Better than classical? the subtle art of benchmarking quantum machine learning models. arXiv preprint arXiv:2403.07059 (2024). https://arxiv.org/abs/2403.07059.
Banchi, L., Pereira, J. & Pirandola, S. Generalization in quantum machine learning: A quantum information standpoint. PRX Quantum 2, 040321 (2021).
Friedrich, L. & Maziero, J. Quantum neural network cost function concentration dependency on the parametrization expressivity. Scientific Reports 13, 9978 (2023).
Anschuetz, E. R. & Kiani, B. T. Quantum variational algorithms are swamped with traps. Nature Communications 13, 7760 (2022).
McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nature Communications 9, 4812 (2018).
Cerezo, M., Sone, A., Volkoff, T., Cincio, L. & Coles, P. J. Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nature Communications 12, 1791 (2021).
Marrero, C. O., Kieferová, M. & Wiebe, N. Entanglement-induced barren plateaus. PRX Quantum 2, 040316 (2021).
Wang, S. et al. Noise-induced barren plateaus in variational quantum algorithms. Nature Communications 12, 6961 (2021).
Ballarin, M., Mangini, S., Montangero, S., Macchiavello, C. & Mengoni, R. Entanglement entropy production in quantum neural networks. Quantum 7, 1023 (2023).
Herrmann, J. et al. Realizing quantum convolutional neural networks on a superconducting quantum processor to recognize quantum phases. Nature Communications 13, 4144 (2022).
Pesah, A. et al. Absence of barren plateaus in quantum convolutional neural networks. Physical Review X 11, 041011 (2021).
Gil Vidal, F. J. & Theis, D. O. Input redundancy for parameterized quantum circuits. Frontiers in Physics 8, 297 (2020).
Pérez-Salinas, A., Cervera-Lierta, A., Gil-Fuster, E. & Latorre, J. I. Data re-uploading for a universal quantum classifier. Quantum 4, 226 (2020).
Caro, M. C., Gil-Fuster, E., Meyer, J. J., Eisert, J. & Sweke, R. Encoding-dependent generalization bounds for parametrized quantum circuits. Quantum 5, 582 (2021).
Schuld, M., Sweke, R. & Meyer, J. J. Effect of data encoding on the expressive power of variational quantum-machine-learning models. Physical Review A 103, 032430 (2021).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90.
Shi, S. et al. Hybrid quantum-classical convolutional neural network for phytoplankton classification. Frontiers in Marine Science 10, 1158548 (2023).
Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29, 141–142 (2012).
Gui-Lu, L. General quantum interference principle and duality computer. Communications in Theoretical Physics 45, 825 (2006).
Childs, A. M. & Wiebe, N. Hamiltonian simulation using linear combinations of unitary operations. arXiv preprint arXiv:1202.5822 (2012). https://doi.org/10.48550/arXiv.1202.5822.
Veit, A., Wilber, M. J. & Belongie, S. Residual networks behave like ensembles of relatively shallow networks. Advances in Neural Information Processing Systems 29 (2016). https://proceedings.neurips.cc/paper/2016/hash/37bc2f75bf1bcfe8450a1a41c200364c-Abstract.html.
Schuld, M., Bergholm, V., Gogolin, C., Izaac, J. & Killoran, N. Evaluating analytic gradients on quantum hardware. Physical Review A 99, 032331 (2019).
Mari, A., Bromley, T. R. & Killoran, N. Estimating the gradient and higher-order derivatives on quantum hardware. Physical Review A 103, 012405 (2021).
Wierichs, D., Izaac, J., Wang, C. & Lin, C. Y.-Y. General parameter-shift rules for quantum gradients. Quantum 6, 677 (2022).
Goto, T., Tran, Q. H. & Nakajima, K. Universal approximation property of quantum machine learning models in quantum-enhanced feature spaces. Physical Review Letters 127, 090506 (2021).
Bergholm, V. et al. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968 (2018). https://doi.org/10.48550/arXiv.1811.04968.
Sim, S., Johnson, P. D. & Aspuru-Guzik, A. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. Advanced Quantum Technologies 2, 1900070 (2019).
Kullback, S. & Leibler, R. A. On information and sufficiency. The Annals of Mathematical Statistics 22, 79–86 (1951).
Życzkowski, K. & Sommers, H.-J. Average fidelity between random quantum states. Physical Review A 71, 032313 (2005).
Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, 20150202 (2016).
Acknowledgements
We acknowledge the support from the National Key R&D Plan (2021YFB2801800).
Author information
Authors and Affiliations
Contributions
J.W. conceived the algorithm; J.W., Z.H., D.C., and L.Q. contributed to writing and revised the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Physics thanks Stefano Mangini and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wen, J., Huang, Z., Cai, D. et al. Enhancing the expressivity of quantum neural networks with residual connections. Commun Phys 7, 220 (2024). https://doi.org/10.1038/s42005-024-01719-1