
The use of digital rather than analog instrumentation offers important practical advantages in the transmission of signals over long baselines, the implementation of compensating time delays, and the measurement of cross-correlation of signals. In digital delay circuits, the accuracy of the delay depends on the accuracy of the timing pulses in the system, and long delays accurate to tens of picoseconds are more easily achieved digitally than by using analog delay lines. Furthermore, there is no distortion of the signal by the digital units other than the calculable effects of quantization. In contrast, with an analog system, it is difficult to keep the shape of the frequency response within tolerances while delay elements are switched into and out of the signal channels. Correlators with wide dynamic range are readily implemented digitally, including those with multichannel output, as needed for spectral line observations. Analog multichannel correlators employ filter banks to divide the signal passband into many narrow channels. Such filters, when subject to temperature variations, can be a source of phase instability. Finally, except at the highest bit rates (frequencies), digital circuits need less adjustment than analog ones and are better suited to replication in large numbers for large arrays.

Digitization of the signal waveforms requires sampling of the voltages at periodic intervals and quantizing the sampled values so that each can be represented by a finite number of bits. The number of bits per sample is usually not large, especially in cases in which the signal bandwidth is large, requiring high sampling rates. However, coarse quantization results in a loss in sensitivity, since modification of the signal levels to the quantized values effectively results in the addition of a component of “quantization noise.” In most cases, this loss is small and is outweighed by the other advantages. In designing digital correlators, there are compromises to be made between sensitivity and complexity, and the number of quantization levels to use is an important consideration.

There are two ways to determine the spectrum of a random noise signal, as shown in Fig. 8.1. The autocorrelation function of the signal can be measured and then Fourier transformed into a power spectrum after a specified integration period. Alternatively, the signal can be Fourier transformed first and the squared modulus taken. In the first case, the resolution of the spectral estimate is approximately the reciprocal of the number of lags of the autocorrelation function calculated. In the direct Fourier transform route, the data stream must be segmented to control the spectral resolution, i.e., the resolution is approximately the reciprocal of the data segment length. The power spectra from all of the segments are summed over the integration period. To compare results between these methods, the number of lags in the correlator is set equal to the number of samples per segment. For interferometry, the same two methods can be applied. The cross-correlation function can be calculated and Fourier transformed into a cross spectrum (called the XF technique), or the direct Fourier transform of one signal can be multiplied by the complex conjugate of that of the other to form the cross spectrum (the FX technique). These two methods are explored in detail in this chapter.
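
The equivalence of the two routes can be illustrated numerically. The following sketch is not from the text; the parameters are illustrative, and circular lags are used so that the two estimates match exactly, with the number of lags equal to the segment length:

```python
import numpy as np

rng = np.random.default_rng(1)
L, nseg = 64, 500                  # lags per spectrum = samples per segment
x = rng.standard_normal(L * nseg)

# FX route: FFT each segment, take the squared modulus, and average.
segs = x.reshape(nseg, L)
S_fx = np.mean(np.abs(np.fft.fft(segs, axis=1))**2, axis=0) / L

# XF route: average the circular autocorrelation over segments, then FFT.
R = np.zeros(L)
for s in segs:
    for q in range(L):
        R[q] += np.dot(s, np.roll(s, -q))
R /= L * nseg
S_xf = np.fft.fft(R).real

print(np.allclose(S_fx, S_xf))     # True: XF and FX agree
```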

Fig. 8.1

The relationship between two random processes, x(t) and y(t), of duration T; their cross-correlation function, \(R_{xy}(\tau )\); and the cross spectrum, \(S_{xy}(\nu )\). If x(t) and y(t) are the same, \(R_{xx}(\tau )\) is the autocorrelation function, and \(S_{xx}(\nu )\) is the power spectrum. The spatial counterpart to this diagram is shown in Fig. 5.5.

Digital signal processing in radio astronomy began in the early 1960s when Weinreb (1963) built a digital 64-channel autocorrelator that operated on the signal sampled at the Nyquist rate and quantized with one bit per sample. At that time, the modern fast Fourier transform (FFT) algorithm (Cooley and Tukey 1965) was not known, although there are historical precedents in the mathematical literature going back to Gauss in the early nineteenth century. For the next two decades, virtually all spectrometers for single-dish and interferometric applications were based on the auto- or cross-correlation approach. By the 1990s and the advent of very large spectral processing systems (in terms of frequency channels and baselines), the advantages of the FX approach became apparent. All modern interferometers have spectral analysis capabilities, not only for observations of spectral lines but also for mitigation of the effects of radio frequency interference (RFI) and of instrumental bandwidth smearing.

8.1 Bivariate Gaussian Probability Distribution

The bivariate normal probability function is central to all signal analysis. If x and y are joint Gaussian random variables with zero mean and variance \(\sigma ^{2}\), the probability that one variable is between x and x + dx and, simultaneously, the other is between y and y + dy is p(x, y) dx dy, where

$$\displaystyle{ p(x,y) = \frac{1} {2\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\exp \left [\frac{-(x^{2} + y^{2} - 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right ]\;, }$$
(8.1)

and ρ is the correlation coefficient equal to \(\langle xy\rangle /\sqrt{\langle x^{2 } \rangle \langle y^{2}\rangle }\), where 〈 〉 denotes the expectation, which, with the usual assumption of ergodicity, is approximated by the average over many samples. The form of this function is shown in Fig. 8.2. Note that −1 ≤ ρ ≤ 1. For |ρ| ≪ 1, the exponential can be expanded, giving

$$\displaystyle{ p(x,y) \simeq \left [ \frac{1} {\sigma \sqrt{2\pi }}\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )\right ]\left [ \frac{1} {\sigma \sqrt{2\pi }}\exp \left (\frac{-y^{2}} {2\sigma ^{2}} \right )\right ]\left (1 + \frac{\rho xy} {\sigma ^{2}} \right )\;, }$$
(8.2)

which for ρ = 0 is simply the product of two Gaussian functions. Equation  (8.1) can also be written as

$$\displaystyle{ p(x,y) = \frac{1} {\sigma \sqrt{2\pi }}\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right ) \frac{1} {\sigma \sqrt{2\pi (1 -\rho ^{2 } )}}\exp \left [\frac{-(y -\rho x)^{2}} {2\sigma ^{2}(1 -\rho ^{2})} \right ]\;. }$$
(8.3)

If this expression is integrated with respect to y from −∞ to +∞, it reduces to a Gaussian function in x. As ρ approaches unity, Eq. (8.3) becomes the product of a Gaussian in x and a Gaussian in (y − x); the latter has a standard deviation \(\sigma \sqrt{1 -\rho ^{2}}\), which tends to zero as ρ approaches 1. Equations (8.1) and (8.2) will be used in examining the response of various types of samplers and correlators. For autocorrelators used with single antennas, the quantity to be measured is the autocorrelation function R(τ) = 〈v(t)v(t − τ)〉, where v is the received signal. This case can be treated with x = v(t) and y = v(t − τ).
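
As an aside, the factorization in Eq. (8.3) gives a direct recipe for generating sample pairs with a prescribed correlation coefficient, which is convenient for testing the correlators discussed below. A minimal sketch (the parameters are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, rho, N = 1.0, 0.3, 1_000_000

# Eq. (8.3): x is Gaussian; y given x is Gaussian with mean rho*x and
# standard deviation sigma*sqrt(1 - rho^2).
x = sigma * rng.standard_normal(N)
y = rho * x + sigma * np.sqrt(1 - rho**2) * rng.standard_normal(N)

rho_hat = np.mean(x * y) / np.sqrt(np.mean(x**2) * np.mean(y**2))
print(rho_hat)                     # ~0.300, to within ~1/sqrt(N)
```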

Fig. 8.2

Contours of equal probability density from the bivariate Gaussian distribution in Eq. (8.1). The contours are given by \(x^{2} + y^{2} - 2\rho xy = \text{const}\). For ρ = 0, they become circles; for ρ = 1, they merge into the line x = y; and for ρ = −1, they merge into x = −y.

8.2 Periodic Sampling

8.2.1 Nyquist Rate

If the signal is bandlimited, that is, its power spectrum is nonzero only within a finite band of frequencies, no information is lost in the sampling process as long as the sampling rate is high enough. This follows from the sampling theorem discussed in Sect. 5.2.1. Here, we sample a function of time and must avoid aliasing in the frequency domain. For a baseband (lowpass) rectangular spectrum with an upper cutoff frequency Δν, the width of the frequency spectrum, including negative frequencies, is 2Δν. The function is fully specified by samples spaced in time with an interval no greater than 1/(2Δν), that is, a sampling frequency of 2Δν or greater. This critical sampling frequency, 2Δν, is called the Nyquist rate for the waveform. For further discussion, see, for example, Bracewell (2000) or Oppenheim and Schafer (2009). In some digital systems in radio astronomy, the waveform that is digitized has a baseband spectrum and is sampled at the Nyquist rate. For a rectangular passband of this type, the autocorrelation function, which by the Wiener–Khinchin relation is the Fourier transform of the power spectrum, is

$$\displaystyle{ R_{\infty }(\tau ) = \frac{\sin (2\pi \varDelta \nu \,\tau )} {2\pi \varDelta \nu \,\tau } \;, }$$
(8.4)

where the subscript indicates unquantized sampling (that is, the accuracy is not limited by a finite number of quantization levels). Nyquist sampling can also be applied to bandpass spectra, and if the spectrum is nonzero only within a range of nΔν to (n + 1)Δν, where n is an integer, the Nyquist rate is again 2Δν. Thus, for sampling at the Nyquist rate, the lower and upper bounds of the spectral band must be integral multiples of the bandwidth. The autocorrelation function of a signal that has a flat spectrum over such a band is

$$\displaystyle{ R_{\infty }(\tau ) = \frac{\sin (\pi \varDelta \nu \,\tau )} {\pi \varDelta \nu \,\tau } \cos \left [2\pi \left (n + \frac{1} {2}\right )\varDelta \nu \,\tau \right ]\;. }$$
(8.5)

Zeros in this function occur at time intervals τ that are integral multiples of 1/(2Δν). Therefore, for a rectangular passband, successive samples at the Nyquist rate are uncorrelated. Sampling at frequencies greater or less than the Nyquist rate is referred to as oversampling or undersampling, respectively. For any signal, adjusting the center frequency so that the spectrum conforms to the bandpass sampling requirement described above minimizes the sampling rate required to avoid aliasing.
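
This decorrelation is easy to demonstrate numerically. The sketch below, under the stated assumptions (Gaussian noise with a rectangular baseband spectrum; grid sizes and seed are arbitrary), synthesizes bandlimited noise on a fine grid, decimates to the Nyquist rate, and checks that successive samples are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(2)
oversamp, N = 8, 1 << 18           # fine grid runs at 8x the Nyquist rate
v = rng.standard_normal(N)

# Rectangular baseband spectrum: keep |f| < delta_nu = 0.5/oversamp
# (in cycles per fine-grid sample).
V = np.fft.rfft(v)
f = np.fft.rfftfreq(N)
V[f >= 0.5 / oversamp] = 0.0
v_bl = np.fft.irfft(V, n=N)

s = v_bl[::oversamp]               # decimate to the Nyquist rate, 2*delta_nu
r1 = np.mean(s[:-1] * s[1:]) / np.mean(s * s)
print(r1)                          # ~0: successive samples are uncorrelated
```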

8.2.2 Correlation of Sampled but Unquantized Waveforms

We now investigate the response of a hypothetical correlator for which the input signals are sampled at the Nyquist rate but are not quantized. It is necessary to consider only single-multiplier correlators since complex correlators can be implemented as combinations of them, as indicated in Fig. 6.3. The system under discussion can be visualized as one in which the samples either remain as analog voltages or are encoded with a sufficiently large number of bits that quantization errors are negligible. Since no information is lost in sampling, the signal-to-noise ratio of the correlation measurement may be expected to be the same as would be obtained by applying the waveforms without sampling to an analog correlator. There is probably no reason, in practice, to build a correlator for inputs with unquantized sampling. However, by comparing the results with those for quantized sampling, which we discuss later, the effects of quantization are more easily understood.

Two bandlimited waveforms, x(t) and y(t), are sampled at the Nyquist rate, and for each pair of samples, the multiplier within the correlator produces an output proportional to the product of the input amplitudes. The integrator allows the output to be averaged for any required time interval. Now the (normalized) cross-correlation coefficient of x(t) and y(t) for zero time delay between the two waveforms is

$$\displaystyle{ \rho = \frac{\langle x(t)y(t)\rangle } {\sqrt{\left \langle \left [x(t) \right ] ^{2 } \right \rangle \left \langle \left [y(t) \right ] ^{2}\right \rangle }}\;. }$$
(8.6)

(The cross-correlation coefficient ρ should not be confused with the autocorrelation function \(R_{\infty }\) of x or y.) Since x and y have equal variance \(\sigma ^{2}\),

$$\displaystyle{ \langle x(t)y(t)\rangle =\rho \sigma ^{2}\;. }$$
(8.7)

The left side is the averaged product of the two waveforms and thus represents the correlator output. The output of the digital correlator after \(N_{N}\) samples is

$$\displaystyle{ r_{\infty } = N_{N}^{-1}\sum _{ i=1}^{N_{N} }x_{i}y_{i}\;, }$$
(8.8)

where the subscript N denotes the Nyquist rate. Since the samples \(x_{i}\) and \(y_{i}\) obey the same Gaussian statistics as the continuous waveforms x(t) and y(t), we can clearly write

$$\displaystyle{ \langle r_{\infty }\rangle =\rho \sigma ^{2}\;. }$$
(8.9)

Thus, the output of the correlator is a linear measure of the correlation ρ. The variance of the correlator output is

$$\displaystyle{ \sigma _{\infty }^{2} =\langle r_{ \infty }^{2}\rangle -\langle r_{ \infty }\rangle ^{2}\;, }$$
(8.10)

and

$$\displaystyle\begin{array}{rcl} \langle r_{\infty }^{2}\rangle & =& N_{ N}^{-2}\sum _{ i=1}^{N_{N} }\sum _{k=1}^{N_{N} }\langle x_{i}y_{i}x_{k}y_{k}\rangle \\ & =& N_{N}^{-2}\sum _{ i=1}^{N_{N} }\langle x_{i}y_{i}\rangle ^{2} + N_{ N}^{-2}\sum _{ i=1}^{N_{N} }\sum _{k\neq i}\langle x_{i}y_{i}x_{k}y_{k}\rangle \;,{}\end{array}$$
(8.11)

where we have separated the terms for which i = k and i ≠ k. The first summation on the right side of Eq. (8.11) has a value of \(\sigma ^{4}(1 + 2\rho ^{2})N_{N}^{-1}\): from Eq. (8.3), it can be shown that

$$\displaystyle{ \int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }x^{2}y^{2}p(x,y)dx\,dy =\sigma ^{4}(1 + 2\rho ^{2})\;. }$$
(8.12)

The second summation term in Eq. (8.11) is readily evaluated by using the fourth-order moment relation in Eq. (6.36). Because successive samples of each signal are uncorrelated (a rectangular passband is assumed), \(\langle x_{i}y_{i}x_{k}y_{k}\rangle =\langle x_{i}y_{i}\rangle \langle x_{k}y_{k}\rangle\), and the second summation term has a value of \((1 - N_{N}^{-1})\rho ^{2}\sigma ^{4}\). Returning to Eq. (8.10), we can write

$$\displaystyle\begin{array}{rcl} \sigma _{\infty }^{2}& =& (1 + 2\rho ^{2})\sigma ^{4}N_{ N}^{-1} + (1 - N_{ N}^{-1})\rho ^{2}\sigma ^{4} -\rho ^{2}\sigma ^{4} \\ & =& \sigma ^{4}N_{ N}^{-1}(1 +\rho ^{2})\;. {}\end{array}$$
(8.13)

The signal-to-noise ratio with unquantized sampling is

$$\displaystyle{ \mathcal{R}_{\text{sn}\infty } = \frac{\langle r_{\infty }\rangle } {\sigma _{\infty }} = \frac{\rho \sqrt{N_{N}}} {\sqrt{(1 +\rho ^{2 } )}} \simeq \rho \sqrt{N_{N}}\;, }$$
(8.14)

where the approximation applies for ρ ≪ 1. Note that the condition ρ ≪ 1 is satisfied in many practical circumstances. For the case in which ρ ≳ 0.2, see Sect. 8.3.6. (The signal-to-noise ratio at the correlator output, which we are calculating here, is of interest mainly for weak signals.) For a measurement period τ, \(N_{N} = 2\varDelta \nu \,\tau\), which is commonly \(10^{6}\)–\(10^{12}\). From Eq. (8.14), the threshold of detectability of a signal is given by \(\rho \sqrt{ N_{N}} \simeq 1\), that is, ρ ≃ \(10^{-3}\)–\(10^{-6}\). In terms of the signal bandwidth and measurement duration, \(\mathcal{R}_{\text{sn}\infty } =\rho \sqrt{2\varDelta \nu \,\tau }\). Now for observations of a point source with identical antennas and receivers, ρ is equal to the ratio of the resulting antenna temperature to the system temperature, \(T_{A}/T_{S}\). Thus, the present result is equal to that given by Eq. (6.45) for an analog correlator with continuous unsampled inputs and \(T_{A} \ll T_{S}\).
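
A Monte Carlo check of Eq. (8.14) for the weak-signal case is straightforward; the parameters below are illustrative. Each trial forms the correlator output of Eq. (8.8) from \(N_{N}\) independent sample pairs with correlation ρ:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, N_N, trials = 0.01, 10_000, 500

x = rng.standard_normal((trials, N_N))
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal((trials, N_N))
r = np.mean(x * y, axis=1)         # correlator outputs, Eq. (8.8)

print(np.mean(r) / np.std(r))      # measured signal-to-noise ratio, ~1.0
print(rho * np.sqrt(N_N))          # Eq. (8.14) prediction: 1.0
```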

Before leaving the subject of unquantized sampling, we should consider the effect of sampling at rates other than the Nyquist rate. Successive sample values from any one signal are then no longer independent. We consider a sampling frequency that is β times the Nyquist rate and a number of samples \(N =\beta N_{N}\). The sample interval is \(\tau _{s} = (2\beta \varDelta \nu )^{-1}\). Samples spaced by \(q\tau _{s}\), where q is an integer, have a correlation coefficient that, from Eq. (8.4), is equal to

$$\displaystyle{ R_{\infty }(q\tau _{s}) = \frac{\sin (\pi q/\beta )} {\pi q/\beta } }$$
(8.15)

for a rectangular baseband response. Since the samples are not independent, we must reconsider the evaluation of the second summation term on the right side of Eq. (8.11). For those terms for which q = |i − k| is small enough that \(R_{\infty }(q\tau _{s})\) is significant, there will be an additional contribution given by

$$\displaystyle{ \left [\sigma ^{2}R_{ \infty }(q\tau _{s})\right ]^{2}\;. }$$
(8.16)

Now \(R_{\infty }^{2}\) is very small for all but a very small fraction of the N(N − 1) terms in the second summation in Eq. (8.11). From Eq. (8.15), \(R_{\infty }^{2}\), at its maxima, is equal to \((\beta /\pi q)^{2}\) and for q = \(10^{3}\) is of order \(10^{-6}\). However, as shown above, N is likely to be as high as \(10^{6}\)–\(10^{12}\). Thus, in the second summation in Eq. (8.11), the contribution made by the terms for which the i and k samples are effectively independent remains essentially unchanged. The products for which \(R_{\infty }^{2}\) is significant make an additional contribution equal to

$$\displaystyle{ 2\sigma ^{4}N^{-2}\sum _{ q=1}^{N-1}(N - q)R_{ \infty }^{2}(q\tau _{ s}) \simeq 2\sigma ^{4}N^{-1}\sum _{ q=1}^{\infty }R_{ \infty }^{2}(q\tau _{ s})\;. }$$
(8.17)

The variance of the correlator output now becomes

$$\displaystyle{ \sigma _{\infty }^{2} =\sigma ^{4}N^{-1}\left [1 + 2\sum _{ q=1}^{\infty }R_{ \infty }^{2}(q\tau _{ s})\right ]\;, }$$
(8.18)

and the signal-to-noise ratio of the correlation measurement is (see Appendix 8.1)

$$\displaystyle{ \mathcal{R}_{\text{sn}\infty } = \frac{\rho \sqrt{\beta N_{N}}} {\sqrt{1 + 2\sum _{q=1 }^{\infty }R_{\infty }^{2 }(q\tau _{s } )}}\;. }$$
(8.19)

Compare this result with Eq. (8.14) for Nyquist sampling. For values of β of \(\frac{1} {2}\), \(\frac{1} {3}\), \(\frac{1} {4}\), and so on, which correspond to undersampling, \(R_{\infty } = 0\), and the denominator in Eq. (8.19) is unity. The sensitivity thus drops as one would expect from the decreased number of samples. For oversampling, β > 1, and the summation of \(R_{\infty }^{2}(q\tau _{s})\) in Eq. (8.19) is shown in Appendix 8.1 to be equal to (β − 1)/2. The denominator in Eq. (8.19) is then equal to \(\sqrt{\beta }\), so the sensitivity is the same as that for sampling at the Nyquist rate. This is as expected, since in Nyquist sampling, no information is lost, and thus there is none to be gained by increased sampling. The result is different for quantized sampling, as will appear in the following sections.
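
The Appendix 8.1 result invoked here can be verified numerically: for the sinc autocorrelation of Eq. (8.15), twice the sum over positive lags approaches β − 1 for β ≥ 1. A sketch (the truncation point of the sum is arbitrary):

```python
import numpy as np

for beta in (1.0, 2.0, 4.0):
    q = np.arange(1, 200_000)
    R = np.sinc(q / beta)          # np.sinc(t) = sin(pi*t)/(pi*t), Eq. (8.15)
    print(beta, 2 * np.sum(R**2))  # approaches beta - 1
```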

8.3 Sampling with Quantization

In some sampling schemes, the signal is first quantized and then sampled, and in others, it is sampled and then quantized. Ideally, the end result is the same in either case, and in analyzing the process, we can choose the order that is most convenient. Suppose that a bandlimited signal is first quantized and then sampled. Quantization generates new frequency components in the signal waveform, so it is no longer bandlimited. If it is sampled at the Nyquist rate corresponding to the unquantized waveform, as is the usual practice, some information will be lost, and the sensitivity will be less than for unquantized sampling. Also, because quantization is a nonlinear operation, we cannot assume that the measured correlation of the quantized waveforms will be a linear function of ρ, which is what we want to measure. Thus, to utilize digital signal processing, there are three main points that should be investigated: (1) the relation between ρ and the measured correlation, (2) the loss in sensitivity, and (3) the extent to which oversampling can restore the lost sensitivity. Investigations of these points can be found in the work of Weinreb (1963), Cole (1968), Burns and Yao (1969), Cooper (1970), Hagen and Farley (1973), Bowers and Klingler (1974), Jenet and Anderson (1998), and Gwinn (2004).

Note that in discussing sampling with quantization, it is common practice to refer to Nyquist sampling when what is meant is sampling at the Nyquist rate for the unquantized waveform. We also follow this usage.

8.3.1 Two-Level Quantization

Sampling with two-level (one-bit) quantization provided the earliest digital representation of radio astronomy signals (Weinreb 1963). Although larger numbers of levels are now routinely used, this subsection is included as an introduction to the subject. The quantization characteristic for two-level sampling is shown in Fig. 8.3. The quantizing action senses only the sign of the instantaneous signal voltage. In many samplers, the signal voltage is first amplified and strongly clipped. The zero crossings are more sharply defined in the resulting waveform, and errors that might occur if the sampling time coincides with a sign reversal are thereby minimized.

Fig. 8.3

Characteristic curve for two-level quantization. The abscissa is the input voltage x and the ordinate is the quantized output \(\hat{x}\).

The correlator for two-level signals consists of a multiplying circuit followed by a counter that sums the products of the input samples. The input signals are assigned values of +1 or −1 to indicate positive or negative signal voltages, and the products at the multiplier output thus take values of +1 or −1 for identical or different input values, respectively. We consider sampling both at the Nyquist rate and at multiples of it and represent by N the number of sample pairs fed to the correlator. The two-level correlation coefficient is

$$\displaystyle{ \rho _{2} = \frac{(N_{11} + N_{\bar{1}\bar{1}}) - (N_{\bar{1}1} + N_{1\bar{1}})} {N} \;, }$$
(8.20)

where \(N_{11}\) is the number of products for which both samples have the value +1, \(N_{1\bar{1}}\) is the number of products in which the x sample has the value +1 and the y sample −1, and so on. The denominator in Eq. (8.20) is equal to the output that would occur if, for each sample pair, the signs of the signals were identical. \(\rho _{2}\) can be related to the correlation coefficient ρ of the unquantized signals through the bivariate probability distribution Eq. (8.1), from which

$$\displaystyle{ P_{11} = \frac{N_{11}} {N} = \frac{1} {2\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\int _{0}^{\infty }\int _{ 0}^{\infty }\exp \left [\frac{-(x^{2} + y^{2} - 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right ]dx\,dy\;, }$$
(8.21)

where \(P_{11}\) is the probability of the two unquantized signals being simultaneously greater than zero. The other required probabilities are obtained by changing the limits of the integrals in Eq. (8.21) as follows: −∞ to 0 for both variables for \(P_{\bar{1}\bar{1}}\); −∞ to 0 for x and 0 to ∞ for y for \(P_{\bar{1}1}\); and 0 to ∞ for x and −∞ to 0 for y for \(P_{1\bar{1}}\). Note that \(P_{11} = P_{\bar{1}\bar{1}}\) and \(P_{1\bar{1}} = P_{\bar{1}1}\). Thus,

$$\displaystyle{ \rho _{2} = 2(P_{11} - P_{1\bar{1}})\;. }$$
(8.22)

The integral in Eq. (8.21) is evaluated in Appendix 8.2, from which we obtain

$$\displaystyle{ P_{11} = \frac{1} {4} + \frac{1} {2\pi }\sin ^{-1}\rho \;. }$$
(8.23)

Similarly,

$$\displaystyle{ P_{1\bar{1}} = \frac{1} {4} -\frac{1} {2\pi }\sin ^{-1}\rho \;, }$$
(8.24)

so

$$\displaystyle{ \rho _{2} = \frac{2} {\pi } \sin ^{-1}\rho \;. }$$
(8.25)

Equation (8.25), known as the Van Vleck relationship, allows ρ to be obtained from the measured correlation \(\rho _{2}\). For small values, ρ is proportional to \(\rho _{2}\).
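
The correction is easily demonstrated with simulated signals. The sketch below (illustrative parameters) quantizes correlated Gaussian pairs to ±1 as in Fig. 8.3, measures \(\rho _{2}\), and inverts Eq. (8.25):

```python
import numpy as np

rng = np.random.default_rng(4)
rho, N = 0.5, 1_000_000
x = rng.standard_normal(N)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(N)

rho2 = np.mean(np.sign(x) * np.sign(y))    # two-level correlator, Eq. (8.20)
rho_rec = np.sin(np.pi * rho2 / 2)         # inversion of Eq. (8.25)
print(rho2, 2 / np.pi * np.arcsin(rho))    # measured vs. predicted rho_2
print(rho_rec)                             # ~0.5, the analog correlation
```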

To determine the signal-to-noise ratio of the correlation measurement, we now calculate \(\sigma _{2}^{2}\), the variance of the correlator output \(r_{2}\):

$$\displaystyle{ \sigma _{2}^{2} =\langle r_{ 2}^{2}\rangle -\langle r_{ 2}\rangle ^{2}\;, }$$
(8.26)

where

$$\displaystyle{ r_{2} = N^{-1}\sum _{ i=1}^{N}\hat{x}_{ i}\hat{y}_{i}\;. }$$
(8.27)

In this chapter, the circumflex (\(\,\hat{\ }\,\)) is used to denote quantized signal waveforms. Since \(\rho _{2} =\langle \hat{ x}\hat{y}\rangle\), it follows from Eq. (8.27) that \(\langle r_{2}\rangle =\rho _{2}\). Thus, \(r_{2}\) is an unbiased estimator of \(\rho _{2}\). The expression for \(\langle r_{2}^{2}\rangle\) is equivalent to Eq. (8.11) for unquantized waveforms:

$$\displaystyle{ \langle r_{2}^{2}\rangle = N^{-2}\sum _{ i=1}^{N}\langle \hat{x}_{ i}^{2}\hat{y}_{ i}^{2}\rangle + N^{-2}\sum _{ i=1}^{N}\sum _{ k\neq i}\langle \hat{x}_{i}\hat{y}_{i}\hat{x}_{k}\hat{y}_{k}\rangle \;. }$$
(8.28)

The first summation term on the right side of Eq. (8.28) is equal to \(N^{-1}\) since the products \(\hat{x}_{i}\hat{y}_{i}\) take values of ±1 for two-level sampling. In evaluating the second summation term, the situation is similar to that for unquantized sampling. The factor \(\sigma ^{4}\) in Eq. (8.17) is here replaced by the square of the variance of the quantized waveform, which is unity for two-level quantization. For all except a small fraction of the terms, q = |i − k| is large enough that samples i and k from the same waveform are uncorrelated. These terms make a total contribution closely equal to \(\rho _{2}^{2}\). Those terms for which samples i and k are correlated make an additional contribution closely equal to

$$\displaystyle{ 2N^{-1}\sum _{ q=1}^{\infty }R_{ 2}^{2}(q\tau _{ s})\;, }$$
(8.29)

where \(R_{2}(\tau )\) is the autocorrelation coefficient for a signal after two-level quantization. Thus,

$$\displaystyle\begin{array}{rcl} \sigma _{2}^{2} = N^{-1} + (1 - N^{-1})\rho _{ 2}^{2} + 2N^{-1}\sum _{ q=1}^{\infty }R_{ 2}^{2}(q\tau _{ s}) -\rho _{2}^{2}& &{}\end{array}$$
(8.30a)
$$\displaystyle\begin{array}{rcl} \simeq N^{-1}\left [1 + 2\sum _{ q=1}^{\infty }R_{ 2}^{2}(q\tau _{ s})\right ]\;,& &{}\end{array}$$
(8.30b)

where we have assumed that \(\rho _{2} \ll 1\) and also that the term \(-N^{-1}\rho _{2}^{2}\) can be neglected, since here we are mostly interested in signals near the threshold of detectability. Then the signal-to-noise ratio is

$$\displaystyle{ \mathcal{R}_{\mathrm{sn2}} = \frac{\langle r_{2}\rangle } {\sigma _{2}} = \frac{2\rho \sqrt{N}} {\pi \sqrt{1 + 2\sum _{q=1 }^{\infty }R_{2 }^{2 }(q\tau _{s } )}}\;. }$$
(8.31)

This ratio, relative to that for unquantized sampling at the Nyquist rate given by Eq. (8.14), defines an efficiency factor for the quantized correlation process:

$$\displaystyle{ \eta _{2} = \frac{\mathcal{R}_{\mathrm{sn2}}} {\mathcal{R}_{\text{sn}\infty }} = \frac{2\sqrt{\beta }} {\pi \sqrt{1 + 2\sum _{q=1 }^{\infty }R_{2 }^{2 }(q\tau _{s } )}}\;. }$$
(8.32)

Here, we have used \(N =\beta N_{N}\), so we are considering the same observing time as in the Nyquist-sampled case but sampling β times as rapidly. Note that \(\tau _{s}\) is correspondingly reduced. \(\eta _{2}\) is one case of the general quantization efficiency factor, \(\eta _{Q}\) (introduced in Sect. 6.2), where Q is the number of quantization levels.

Equation (8.25) gives the relationship between the correlation coefficients for a pair of signals before and after two-level quantization. This result includes the case of autocorrelation in which the two signals differ only because of a delay. Thus, we may write

$$\displaystyle{ R_{2}(q\tau _{s}) = \frac{2} {\pi } \sin ^{-1}\left [R_{ \infty }(q\tau _{s})\right ]\;. }$$
(8.33)

Equation (8.15) gives \(R_{\infty }(q\tau _{s})\) for a rectangular baseband signal spectrum sampled at β times the Nyquist rate, and Eq. (8.33) becomes

$$\displaystyle{ R_{2}(q\tau _{s}) = \frac{2} {\pi } \sin ^{-1}\left [\frac{\beta \sin (\pi q/\beta )} {\pi q} \right ]\;. }$$
(8.34)

\(R_{2}(q\tau _{s})\) thus has zeros at the same values of \(q\tau _{s}\) that \(R_{\infty }(q\tau _{s})\) does (the principal value is taken for the inverse sine function), and for β = 1, \(\frac{1} {2}\), \(\frac{1} {3}\), and so on, we obtain

$$\displaystyle{ \sum _{q=1}^{\infty }R_{ 2}^{2}(q\tau _{ s}) = 0\;. }$$
(8.35)

In these cases, the signal-to-noise ratio is a factor of 2/π (= 0.637) times that for unquantized sampling at the same rate given in Eq. (8.19). For oversampling with β = 2 and β = 3, the corresponding signal-to-noise factors from Eqs. (8.32) and (8.34) are 0.744 and 0.773, respectively. Note, however, that the increased bit rate used in oversampling could produce a bigger increase in the signal-to-noise ratio if used to increase the number of quantization levels. Doubling the bit rate could be used to increase the number of levels to four, for which the signal-to-noise factor is 0.881 (as derived in Sect. 8.3.2). For a bit rate increase of three, the number of levels could be increased to eight, for which the signal-to-noise factor is 0.963. Note also that in the calculations given above, there is an implicit dependence on the bandpass shape of the signal through the assumption that \(\rho _{2} \ll 1\) for samples for which i is not equal to k in Eq. (8.28). For β ≥ 2, a further dependence on the bandpass shape enters through the autocorrelation function \(R_{2}(q\tau _{s})\).
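
The oversampled two-level efficiencies quoted above follow from Eqs. (8.32) and (8.34) for a rectangular baseband spectrum. A sketch of the computation (the truncation of the lag sum is arbitrary):

```python
import numpy as np

def eta2(beta, qmax=100_000):
    q = np.arange(1, qmax)
    R_inf = np.sinc(q / beta)                  # Eq. (8.15)
    R2 = (2 / np.pi) * np.arcsin(R_inf)        # Eq. (8.34)
    return (2 / np.pi) * np.sqrt(beta) / np.sqrt(1 + 2 * np.sum(R2**2))

for beta in (1, 2, 3):
    print(beta, eta2(beta))                    # 0.637, 0.744, 0.773
```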

It has been mentioned that quantization generates additional spectral components. We can compare the power spectra of a signal before and after quantization since these spectra are the Fourier transforms of autocorrelation functions that are related by Eq. (8.25). Figure 8.4 shows the spectrum, after two-level quantization, of noise with an originally rectangular spectrum. A fraction of the original bandlimited spectrum is converted into a broad, low-level skirt that dies away very slowly with frequency.

Fig. 8.4

Spectra of noise with a rectangular baseband (lowpass) spectrum before and after two-level quantization. The unquantized spectrum is shown by the broken line, and the spectrum after quantization by the solid curve. The power levels of the two waveforms (represented by the areas under the curves) are equal, and their autocorrelation functions (the Fourier transforms of the spectra) are related by Eq. (8.25).

8.3.2 Four-Level Quantization

The use of two digital bits to represent the amplitude of each sample results in less degradation of the signal-to-noise ratio than is obtained with one-bit quantization. Consideration of two-bit sampling leads naturally to four-level quantization, the performance of which has been investigated by several authors, notably Cooper (1970) and Hagen and Farley (1973). The quantization characteristic is shown in Fig. 8.5, where the quantization thresholds are \(-v_{0}\), 0, and \(v_{0}\). The four quantization states have designated values −n, −1, +1, and +n, where n, which is not necessarily an integer, can be chosen to optimize the performance. Products of two samples can take the values ±1, ±n, or \(\pm n^{2}\). The four-level correlation coefficient \(\rho _{4}\) can be specified by an expression similar to Eq. (8.20) for the two-level case, that is,

Fig. 8.5

Characteristic curve for four-level quantization, with weighting factor n for the outer levels. The abscissa is the unquantized voltage x, and the ordinate is the quantized output \(\hat{x}\). \(v_{0}\) is the threshold voltage.

$$\displaystyle{ \rho _{4} = \frac{2n^{2}N_{nn} - 2n^{2}N_{n\bar{n}} + 4nN_{1n} - 4nN_{1\bar{n}} + 2N_{11} - 2N_{1\bar{1}}} {(2n^{2}N_{nn} + 2N_{11})_{\rho =1}} \;, }$$
(8.36)

where a bar on the subscript indicates a negative sign. The numerator is proportional to the correlator output and reduces to the form in the denominator for ρ = 1, that is, when the two input waveforms are identical. The numbers of the various level combinations can be derived from the corresponding joint probabilities. Thus, for example,

$$\displaystyle\begin{array}{rcl} N_{nn}& =& \,NP_{nn} \\ & =& \, \frac{N} {2\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\int _{v_{0}}^{\infty }\int _{ v_{0}}^{\infty }\exp \left [\frac{-(x^{2} + y^{2} - 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right ]dx\,dy\;,{}\end{array}$$
(8.37)

and, as in the two-level case, the other probabilities are obtained by using the appropriate limits for the integrals. For the case of ρ ≪ 1, the approximate form of the probability distribution in Eq. (8.2) simplifies the calculation.

Although \(\rho _{4}\) can be evaluated from Eq. (8.36) in the above manner, an alternative derivation that provides a more rapid approach to the desired result is used here. This approach follows the treatment of Hagen and Farley (1973) and is based on a theorem by Price (1958). The form of the theorem that we require is

$$\displaystyle{ \frac{d\langle r_{4}\rangle } {d\rho } =\sigma ^{2}\left \langle \frac{\partial \hat{x}} {\partial x} \frac{\partial \hat{y}} {\partial y}\right \rangle \;, }$$
(8.38)

where \(r_{4}\) is the unnormalized correlator output, and \(\hat{x}\) and \(\hat{y}\) are again the quantized versions of the input signals. For four-level sampling,

$$\displaystyle{ \frac{\partial \hat{x}} {\partial x} = (n - 1)\delta (x + v_{0}) + 2\delta (x) + (n - 1)\delta (x - v_{0})\;, }$$
(8.39)

where δ is the delta function, and a similar expression can be written for \(\partial \hat{y}/\partial y\). Equation (8.39) is the derivative of the function in Fig. 8.5. To determine the expectation of the product of the two derivatives on the right side of Eq. (8.38), the magnitude of each of the nine terms in the product of the derivatives must be multiplied by the probability of occurrence. Thus, for example, the term \((n - 1)^{2}\delta (x + v_{0})\delta (y + v_{0})\) has a magnitude of \((n - 1)^{2}\) and probability

$$\displaystyle{ \frac{1} {2\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\exp \left [ \frac{-2v_{0}^{2}} {2\sigma ^{2}(1+\rho )}\right ]\;. }$$
(8.40)

By consolidating terms with equal probabilities, we obtain

$$\displaystyle\begin{array}{rcl} & & \frac{d\langle r_{4}\rangle } {d\rho } = \frac{1} {\pi \sqrt{1 -\rho ^{2}}}\Biggl \{(n - 1)^{2}\left [\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1+\rho )}\right ) +\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1-\rho )}\right )\right ] \\ & & \phantom{\frac{d\langle r_{4}\rangle } {d\rho } = \frac{1} {\pi \sqrt{1 -\rho ^{2}}}(n - 1)^{2}} + \left.4(n - 1)\exp \left ( \frac{-v_{0}^{2}} {2\sigma ^{2}(1 -\rho ^{2})}\right ) + 2\right \}\;,{}\end{array}$$
(8.41)

and

$$\displaystyle\begin{array}{rcl} & & \langle r_{4}\rangle = \frac{1} {\pi } \int _{0}^{\rho } \frac{1} {\sqrt{1 -\xi ^{2}}}\Biggl \{(n - 1)^{2}\left [\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1+\xi )}\right ) +\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1-\xi )}\right )\right ] \\ & & \phantom{\langle r_{4}\rangle = \frac{1} {\pi } \int _{0}^{\rho } \frac{1} {\sqrt{1 -\xi ^{2}}}(n - 1)^{2}} + \left.4(n - 1)\exp \left ( \frac{-v_{0}^{2}} {2\sigma ^{2}(1 -\xi ^{2})}\right ) + 2\right \}d\xi \;,{}\end{array}$$
(8.42)

where ξ is a dummy variable of integration. To obtain the correlation coefficient \(\rho _{4}\), \(\langle r_{4}\rangle\) must be divided by the expectation of the correlator output when the inputs are identical four-level waveforms, as in Eq. (8.36):

$$\displaystyle{ \rho _{4} = \frac{\langle r_{4}\rangle } {\varPhi +n^{2}(1-\varPhi )}\;, }$$
(8.43)

where Φ is the probability that the unquantized level lies between \(\pm v_{0}\), that is,

$$\displaystyle{ \varPhi = \frac{1} {\sigma \sqrt{2\pi }}\int _{-v_{0}}^{v_{0} }\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx = \text{erf}\left ( \frac{v_{0}} {\sigma \sqrt{2}}\right )\;. }$$
(8.44)

Equations (8.42)–(8.44) provide a relationship between \(\rho _{4}\) and ρ that is equivalent to the Van Vleck relationship for two-level quantization.
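
The relationship can be evaluated by direct numerical quadrature of Eq. (8.42). A minimal sketch for n = 3 and \(v_{0} =\sigma\), the combination plotted as curve 1 of Fig. 8.7 (σ = 1; the grid size is arbitrary):

```python
import numpy as np
from math import erf, sqrt, pi

def rho4(rho, n=3, v0=1.0, npts=20001):
    """Four-level correlation rho_4 vs. rho, Eqs. (8.42)-(8.44), sigma = 1."""
    xi = np.linspace(0.0, rho, npts)
    f = (1 / np.sqrt(1 - xi**2)) * (
        (n - 1)**2 * (np.exp(-v0**2 / (1 + xi)) + np.exp(-v0**2 / (1 - xi)))
        + 4 * (n - 1) * np.exp(-v0**2 / (2 * (1 - xi**2))) + 2)
    r4 = np.sum((f[1:] + f[:-1]) / 2) * (xi[1] - xi[0]) / pi   # Eq. (8.42)
    Phi = erf(v0 / sqrt(2))                                    # Eq. (8.44)
    return r4 / (Phi + n**2 * (1 - Phi))                       # Eq. (8.43)

for rho in (0.1, 0.5, 0.8):
    print(rho, rho4(rho))   # near-linear at first; ~3% deviation by rho = 0.8
```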

The choice of values for n and \(v_{0}\) is usually made to maximize the signal-to-noise ratio for weak signals, which we now derive. For ρ ≪ 1, Eqs. (8.42) and (8.43) reduce to

$$\displaystyle{ (\rho _{4})_{\rho \ll 1} =\rho \frac{2[(n - 1)E + 1]^{2}} {\pi [\varPhi +n^{2}(1-\varPhi )]} \;, }$$
(8.45)

where \(E =\exp (-v_{0}^{2}/2\sigma ^{2})\). The variance in the measurement of \(r_{4}\) is

$$\displaystyle{ \sigma _{4}^{2} =\langle r_{ 4}^{2}\rangle -\langle r_{ 4}\rangle ^{2} =\langle r_{ 4}^{2}\rangle -\rho _{ 4}^{2}\left [\varPhi +n^{2}(1-\varPhi )\right ]^{2}\;. }$$
(8.46)

The factor \([\varPhi +n^{2}(1-\varPhi )]\) is the variance of the quantized waveform and here takes the place of \(\sigma ^{2}\) in the corresponding equations for unquantized sampling. Again, we follow the procedure explained for the unquantized case and write

$$\displaystyle{ \langle r_{4}^{2}\rangle = N^{-2}\sum _{ i=1}^{N}\langle \hat{x}_{ i}^{2}\hat{y}_{ i}^{2}\rangle + N^{-2}\sum _{ i=1}^{N}\sum _{ i\neq k}\langle \hat{x}_{i}\hat{y}_{i}\hat{x}_{k}\hat{y}_{k}\rangle \;. }$$
(8.47)

To evaluate the first summation, note that \((\hat{x}_{i}\hat{y}_{i})^{2}\) can take values of 1, \(n^{2}\), or \(n^{4}\), and the sum of these values multiplied by their probabilities is equal to \([\varPhi +n^{2}(1-\varPhi )]^{2}\). The contribution of the second summation is

$$\displaystyle{ (1 - N^{-1})\rho _{ 4}^{2}\left [\varPhi +n^{2}(1-\varPhi )\right ]^{2} + 2N^{-1}[\varPhi +n^{2}(1-\varPhi )]^{2}\sum _{ q=1}^{\infty }R_{ 4}^{2}(q\tau _{ s})\;, }$$
(8.48)

where the second term represents the effect of oversampling and is similar to Eq. (8.17), and \(R_{4}\) is the autocorrelation function after four-level quantization. Thus, from Eq. (8.46), we have

$$\displaystyle{ \sigma _{4}^{2} = N^{-1}\left [\varPhi +n^{2}(1-\varPhi )\right ]^{2}\left [1 + 2\sum _{ q=1}^{\infty }R_{ 4}^{2}(q\tau _{ s}) -\rho _{4}^{2}\right ]\;. }$$
(8.49)

Since we have assumed ρ ≪ 1, the \(\rho _{4}^{2}\) term can be neglected, and the signal-to-noise ratio for the four-level correlation measurement is

$$\displaystyle{ \mathcal{R}_{\mathrm{sn4}} = \frac{\langle r_{4}\rangle } {\sigma _{4}} = \frac{2\rho [(n - 1)E + 1]^{2}\sqrt{N}} {\pi \left [\varPhi +n^{2}(1-\varPhi )\right ]\sqrt{1 + 2\sum _{q=1 }^{\infty }R_{4 }^{2 }(q\tau _{s } )}}\;. }$$
(8.50)

The signal-to-noise ratio relative to that for unquantized Nyquist sampling is obtained from Eq. (8.14) for \(N =\beta N_{N}\) and is

$$\displaystyle{ \eta _{4} = \frac{\mathcal{R}_{\mathrm{sn4}}} {\mathcal{R}_{\text{sn}\infty }} = \frac{2[(n - 1)E + 1]^{2}\sqrt{\beta }} {\pi [\varPhi +n^{2}(1-\varPhi )]\sqrt{1 + 2\sum _{q=1 }^{\infty }R_{4 }^{2 }(q\tau _{s } )}}\;. }$$
(8.51)

For sampling at the Nyquist rate, β = 1 and

$$\displaystyle{ \eta _{4} = \frac{\mathcal{R}_{\mathrm{sn4}}} {\mathcal{R}_{\text{sn}\infty }} = \frac{2[(n - 1)E + 1]^{2}} {\pi [\varPhi +n^{2}(1-\varPhi )]} \;. }$$
(8.52)

Values of \(\eta _{4}\) very close to optimum sensitivity are obtained for n = 3 with \(v_{0} = 0.996\sigma\) and for n = 4 with \(v_{0} = 0.942\sigma\): see Table A8.1 in Appendix 8.3. Note that the choice of an integer for the value of n simplifies the correlator. For these two cases, \(\eta _{4}\), the signal-to-noise ratio relative to that for unquantized sampling, is equal to 0.881 and 0.880, respectively. Curves of the relative sensitivity as a function of \(v_{0}/\sigma\) for n = 2, 3, and 4 are shown in Fig. 8.6. Similar conclusions are derived by Hagen and Farley (1973) and Bowers and Klingler (1974).
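
These values follow directly from Eq. (8.52). A sketch of the evaluation (σ = 1; the two cases shown are those quoted above):

```python
from math import erf, exp, pi, sqrt

def eta4(n, v0):                   # v0 in units of sigma
    E = exp(-v0**2 / 2)            # E as defined after Eq. (8.45)
    Phi = erf(v0 / sqrt(2))        # Eq. (8.44)
    return 2 * ((n - 1) * E + 1)**2 / (pi * (Phi + n**2 * (1 - Phi)))

print(eta4(3, 0.996))              # ~0.881
print(eta4(4, 0.942))              # ~0.880
```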

Fig. 8.6

Signal-to-noise ratio relative to that for unquantized correlation for the four-level system and several modifications of it. The abscissa is the quantization threshold \(v_{0}\) in units of the rms level of the waveforms at the quantizer input. The ordinate is sensitivity (signal-to-noise ratio) relative to an unquantized system. The curves are for: (1) full four-level system with n = 2; (2) full four-level system with n = 3; (3) full four-level system with n = 4; (4) four-level system with n = 3 and low-level products omitted; (5) three-level system. From Cooper (1970). © CSIRO 1970. Published by CSIRO Publishing, Melbourne, Victoria, Australia. Reproduced with permission.

Having chosen values for n and \(v_{0}\), we can now return to Eqs. (8.42) and (8.43) to examine the relationship of ρ and \(\rho _{4}\). Curve 1 of Fig. 8.7 shows a plot of ρ versus \(\rho _{4}\). Extrapolation of a linear relationship with slope chosen to fit low values of ρ results in errors of 1% at ρ = 0.5, 2% at ρ = 0.7, and 2.8% at ρ = 0.8, where the error is a percentage of the true value of ρ. Thus, for many purposes, a linear approximation is satisfactory for values of ρ up to ∼0.6. This linearity assumption simplifies the final step that we require in discussing four-level sampling, namely, calculation of the improvement in sensitivity resulting from oversampling.

Fig. 8.7

Correlation coefficient ρ for unquantized signals plotted as a function of the correlation that would be measured after quantization. The curves are for: (1) full four-level system with n = 3 and \(v_{0} =\sigma\), or n = 4 and \(v_{0} = 0.95\sigma\); (2) four-level system with low-level products omitted, n = 4 and \(v_{0} = 0.9\sigma\); (3) three-level system with \(v_{0} = 0.6\sigma\). From Cooper (1970). © CSIRO 1970. Published by CSIRO Publishing, Melbourne, Victoria, Australia. Reproduced with permission.

The relationship between the autocorrelation function \(R_{\infty }\) for unquantized noise and that for the same waveform after four-level quantization is the same as for the corresponding cross-correlation functions in Eq. (8.45), so we can write

$$\displaystyle{ R_{4} = \frac{2[(n - 1)E + 1]^{2}R_{\infty }} {\pi [\varPhi +n^{2}(1-\varPhi )]} \;, }$$
(8.53)

provided that \(R_{\infty } \lesssim 0.6\). Now \(R_{\infty }\) as given by Eq. (8.15) fulfills this condition for q = 1 with an oversampling factor β = 2. For n = 3 and the corresponding optimum value of \(v_{0}\), E = 0.6091, Φ = 0.6806, and \(R_{4} = 0.881R_{\infty }\). For β = 2, we use Eqs. (8.15) and (8.53) and Eq. (A8.5) of Appendix 8.1 to evaluate the summation in the denominator of Eq. (8.51) and obtain \(\eta _{4} = 0.935\), which is a factor of 1.06 greater than for β = 1. Bowers and Klingler (1974) have pointed out that the optimum value of the quantization threshold \(v_{0}\) changes slightly with the oversampling factor. However, the optimum values are rather broad (see Fig. 8.6), and the effect on the sensitivity is very small.

In a discussion of two-bit quantization, Cooper (1970) considered the effect of omitting certain products in the multiplication process. For example, if all products of the two low-level bits are counted as zero instead of ±1, the loss in signal-to-noise ratio is approximately 1%, as shown in curve 4 of Fig. 8.6. The products to be accumulated are then only those counted as ±n and \(\pm n^{2}\) in the full four-level system described above, and in the modified system, they can be assigned values of ±1 and ±n, respectively, thereby simplifying the counter circuitry of the integrator. An even greater simplification can be accomplished by omitting the intermediate-level products also and assigning values ±1 to the high-level products. This last type of modification yields 92% of the sensitivity of a full four-level correlator. We shall not analyze the case where only the low-level products are omitted, but we note that to derive the correlation coefficient as a function of ρ, one can express the action of the correlator in terms of two different quantization characteristics (Hagen and Farley 1973) or else return to Eq. (8.36) and omit the appropriate terms. If both the low- and intermediate-level products are omitted, however, the action can be described more simply in terms of a new quantization characteristic, known as three-level quantization, without arbitrary omission of product terms.

8.3.3 Three-Level Quantization

Three-level quantization has proved to be an important practical technique, and the quantization characteristic is shown in Fig. 8.8. In this case, the approach using Price’s theorem will again be followed.

Fig. 8.8

Characteristic curve for three-level quantization. The abscissa is the unquantized voltage x, and the ordinate is the quantized output \(\hat{x}\). \(v_{0}\) is the threshold voltage. Since the magnitude of \(\hat{x}\) takes only one nonzero value, it is perfectly general to set this value to unity.

The expressions for the operating characteristics of a three-level correlator can be obtained from those in the preceding section by omitting the terms that refer to low- and intermediate-level products and adjusting the weighting factors as appropriate. Thus, the equivalent derivative needed in Price's theorem is

$$\displaystyle{ \frac{\partial \hat{x}} {\partial x} =\delta (x - v_{0}) +\delta (x + v_{0})\;, }$$
(8.54)

and the expectation of the correlator output \(\langle r_{3}\rangle\) is, from Price's theorem,

$$\displaystyle{ \langle r_{3}\rangle = \frac{1} {\pi } \int _{0}^{\rho } \frac{1} {\sqrt{1 -\xi ^{2}}}\left [\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1+\xi )}\right ) +\exp \left ( \frac{-v_{0}^{2}} {\sigma ^{2}(1-\xi )}\right )\right ]d\xi \;, }$$
(8.55)

where ξ is a dummy variable of integration. The normalized correlation coefficient is

$$\displaystyle{ \rho _{3} = \frac{\langle r_{3}\rangle } {1-\varPhi }\;, }$$
(8.56)

where Φ is given by Eq. (8.44). For ρ ≪ 1, Eqs. (8.55) and (8.56) yield

$$\displaystyle{ (\rho _{3})_{\rho \ll 1} =\rho \frac{2E^{2}} {\pi (1-\varPhi )}\;, }$$
(8.57)

where E is defined following Eq. (8.45). The variance of \(r_{3}\) is

$$\displaystyle{ \sigma _{3}^{2} =\langle r_{ 3}^{2}\rangle -\langle r_{ 3}\rangle ^{2} = N^{-1}(1-\varPhi )^{2}\left [1 + 2\sum _{ q=1}^{\infty }R_{ 3}^{2}(q\tau _{ s}) -\rho _{3}^{2}\right ]\;, }$$
(8.58)

where \(R_{3}\) is the autocorrelation coefficient after three-level quantization. If \(\rho _{3}^{2}\) in Eq. (8.58) can be neglected, the signal-to-noise ratio relative to a nonquantizing correlator is

$$\displaystyle{ \eta _{3} = \frac{\mathcal{R}_{\mathrm{sn3}}} {\mathcal{R}_{\text{sn}\infty }} = \frac{\langle r_{3}\rangle } {\sigma _{3}\mathcal{R}_{\text{sn}\infty }} = \frac{2\sqrt{\beta }E^{2}} {\pi (1-\varPhi )\sqrt{1 + 2\sum _{q=1 }^{\infty }R_{3 }^{2 }(q\tau _{s } )}}\;. }$$
(8.59)

For Nyquist sampling, the maximum sensitivity relative to the nonquantizing case is obtained with \(v_{0} = 0.6120\sigma\), for which \(\eta _{3}\) is equal to 0.810 (see curve 5 of Fig. 8.6). With this optimized threshold value, Φ = 0.4595, E = 0.8292, and we can write \(R_{3}(q\tau _{s}) = 0.810R_{\infty }(q\tau _{s})\), assuming that ρ is an approximately linear function of \(r_{3}\). Then from Eqs. (8.15), (8.59), and (A8.5), we find that for a rectangular baseband spectrum with the oversampling factor β = 2, \(\eta _{3}\) becomes 0.890, which is a factor of 1.10 greater than for β = 1.
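
Both the optimum threshold and the efficiency can be reproduced from Eq. (8.59) with β = 1, the summation term vanishing for Nyquist sampling of a rectangular band. A sketch that scans the threshold (grid spacing arbitrary):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def eta3(v0):                      # v0 in units of sigma; Eq. (8.59), beta = 1
    E = exp(-v0**2 / 2)
    Phi = erf(v0 / sqrt(2))        # Eq. (8.44)
    return 2 * E**2 / (pi * (1 - Phi))

v = np.linspace(0.4, 0.9, 5001)
vals = [eta3(u) for u in v]
i = int(np.argmax(vals))
print(v[i], vals[i])               # ~0.612, ~0.810
```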

8.3.4 Quantization Efficiency: Simplified Analysis for Four or More Levels

For quantization into two, three, or four levels, the quantization efficiency, \(\eta _{Q}\), is 0.637, 0.810, and 0.881, respectively. For more quantization levels, the loss in efficiency resulting from the quantization decreases further, and an approximate method of calculating the loss (Thompson 1998) can be used, as follows. This is simpler than the more accurate method given in Sect. 8.3.3. In either case, the principle is to calculate the fractional increase in the variance of a signal that results from the quantization. The signal-to-noise ratio at the correlator output is inversely proportional to this variance.

Figure 8.9 shows a piecewise linear approximation of the Gaussian probability distribution of a signal from one antenna. This approximation simplifies the analysis. The intersections with the vertical lines indicate exact values of the Gaussian. For eight-level sampling, the quantization thresholds are indicated by the positions of the vertical lines between the numbers ±3.5 on the abscissa. The horizontal spacing between adjacent levels is represented by ε, in units of the (unquantized) rms voltage σ, i.e., εσ is the spacing between the levels in volts. We consider first the case in which the number of levels is even, as in Fig. 8.9. Any one sample that falls between the two consecutive thresholds at mεσ and (m + 1)εσ will be assigned a value \((m + \frac{1} {2})\epsilon \sigma\). The normalized trapezoidal probability distribution for the voltage in this segment of the overall probability distribution in Fig. 8.9 can be written as

$$\displaystyle{ p(v) = \frac{1} {\epsilon \sigma } + \left [v -\left (m + \frac{1} {2}\right )\epsilon \sigma \right ]\,\varDelta _{m}\qquad \qquad m\epsilon \sigma < v < (m + 1)\epsilon \sigma \;, }$$
(8.60)

where \(\varDelta _{m}\) is the slope of the piecewise linear probability density over the voltage range mεσ to (m + 1)εσ. The extra variance that is incurred by quantizing the voltage is

Fig. 8.9

Piecewise linear representation of the Gaussian probability distribution of the amplitude of a signal within the receiver. The intersections of the curve with the vertical lines denote exact values of the Gaussian. The abscissa is the signal amplitude (voltage) in units of εσ, and the numbers indicate the values assigned to the levels after quantization. For eight-level sampling, the quantization thresholds are indicated by the seven vertical lines that lie between −3.5εσ and 3.5εσ on the abscissa. For signal levels outside the range ±4εσ, indicated by the shaded areas, the assigned values are ±3.5εσ.

$$\displaystyle{ \left \langle \left [v - (m + \frac{1} {2})\epsilon \sigma \right ]^{2}\right \rangle =\int _{ m\epsilon \sigma }^{(m+1)\epsilon \sigma }\left [v -\left (m + \frac{1} {2}\right )\epsilon \sigma \right ]^{2}p(v)\,dv\;. }$$
(8.61)

If we make the substitution \(x = v - (m + \frac{1} {2})\epsilon \sigma\), the excess variance becomes

$$\displaystyle{ \int _{-\epsilon \sigma /2}^{\epsilon \sigma /2}x^{2}\left [\frac{1} {\epsilon \sigma } + x\varDelta _{m}\right ]\,dx\;, }$$
(8.62)

or

$$\displaystyle{ \frac{2} {\epsilon \sigma } \int _{0}^{\epsilon \sigma /2}x^{2}dx = \frac{1} {3}\left ( \frac{\epsilon \sigma } {2}\right )^{2}\;. }$$
(8.63)

Note that the \(\varDelta _{m}\) factor does not appear in Eq. (8.63), since the odd term \(x^{3}\varDelta _{m}\) in (8.62) integrates to zero over the symmetric interval. Hence, the excess variance is the same for all voltage bins from −4εσ to 4εσ. The fraction of the area under the Gaussian probability curve that lies between these levels is

$$\displaystyle{ \frac{1} {\sqrt{2\pi }\sigma }\int _{-4\epsilon \sigma }^{4\epsilon \sigma }e^{-x^{2}/2\sigma ^{2} }dx = \text{erf}\left ( \frac{4\epsilon } {\sqrt{2}}\right )\;. }$$
(8.64)

Thus, the variance resulting from quantization of the signal samples with amplitudes in the range ±4εσ is

$$\displaystyle{ \frac{1} {3}\left ( \frac{\epsilon \sigma } {2}\right )^{2}\text{ erf}\left ( \frac{4\epsilon } {\sqrt{2}}\right )\;. }$$
(8.65)

We shall assume that the quantization error is essentially uncorrelated with the unquantized signal. In the extreme case of two-level sampling, the quantization error is highly correlated with the unquantized signal, so the treatment used here would not apply. Consider, however, the case of multilevel quantization, as in Fig. 8.10. If the signal voltage is increased steadily, the quantization error decreases from a maximum at each quantization threshold to zero when the voltage is equal to the midpoint of two thresholds. At each threshold, the quantization error changes sign, and the cycle repeats. This behavior greatly reduces any correlation between the quantization error and the signal waveform.

Fig. 8.10

Examples of quantization characteristics for (left diagram) an even number of levels (eight), and (right diagram) an odd number of levels (nine). Units on both axes are equal to ε. The abscissa is the analog (unquantized) voltage, and the ordinate is the quantized output. The dotted curves show the analog level minus the quantized level. Note that for even numbers of levels, the thresholds occur at integral values on the abscissa, whereas for odd numbers of levels, they occur at values that are an integer plus one-half.

It is also necessary to take account of the effect of counting all signals below −4εσ as level −3.5εσ, and those above +4εσ as +3.5εσ. To make an approximate estimate of this effect, we divide the range of signal level outside of ±4εσ into intervals of width εσ. Consider, for example, the interval centered on 6.5εσ. The probability of the signal falling within this level is equal to the corresponding area under the curve, which for the piecewise linear approximation is

$$\displaystyle{ \frac{1} {2} \frac{\epsilon } {\sqrt{2\pi }}\left [e^{-(6\epsilon )^{2} /2} + e^{-(7\epsilon )^{2} /2}\right ]\;. }$$
(8.66)

The variance resulting from quantization of the signal within this range is closely approximated by \([(6.5 - 3.5)\epsilon \sigma ]^{2}\), so the total variance of the quantization error for signals outside the range ±4εσ is

$$\displaystyle{ \frac{\epsilon ^{3}\sigma ^{2}} {\sqrt{2\pi }}\sum _{m=4}^{\infty }(m - 3)^{2}\left [e^{-m^{2}\epsilon ^{2}/2 } + e^{-(m+1)^{2}\epsilon ^{2}/2 }\right ]\;. }$$
(8.67)

In practice, the summation in (8.67) converges rapidly, and only a few terms are needed (i.e., those for mε ≲ 3). The quantization error resulting from the truncation of the signal values outside the range ±4εσ clearly has some degree of correlation with the unquantized signal level. However, this is a small effect because the fraction of samples for which the signal lies outside ±4εσ is less than 1.6% for eight-level quantization, with ε optimized for sensitivity. The percentage decreases as the number of quantization levels increases. We shall therefore treat the quantization error resulting from the truncation of the signal peaks as uncorrelated with the signal, but bear in mind that this assumption may introduce a small uncertainty into the calculation.

The variance of the quantized signal is equal to the variance of the unquantized signal (\(\sigma ^{2}\)) plus the variance of the quantization errors in (8.65) and (8.67), that is,

$$\displaystyle\begin{array}{rcl} \sigma ^{2} + \frac{1} {3}\left ( \frac{\epsilon \sigma } {2}\right )^{2}\text{erf}\left ( \frac{4\epsilon } {\sqrt{2}}\right ) + \frac{\epsilon ^{3}\sigma ^{2}} {\sqrt{2\pi }}\sum _{m=4}^{\infty }(m - 3)^{2}\left [e^{-m^{2}\epsilon ^{2}/2 } + e^{-(m+1)^{2}\epsilon ^{2} /2}\right ]\;.& &{}\end{array}$$
(8.68)

If the variance is the same for both signals at the correlator input, and if the correlation of the signals is small (i.e., ρ ≪ 1), then the signal-to-noise ratio at the correlator output is inversely proportional to the variance. Thus, the quantization efficiency is

$$ \displaystyle\begin{array}{rcl} \eta _{(2\mathcal{N})}& =& \left \{1 + \frac{1} {3}\left ( \frac{\epsilon } {2}\right )^{2}\text{erf}\left ( \frac{\mathcal{N}\epsilon } {\sqrt{2}}\right )\right. \\ & +& \left. \frac{\epsilon ^{3}} {\sqrt{2\pi }}\sum _{m=\mathcal{N}}^{\infty }(m -\mathcal{N} + 1)^{2}\left [e^{-m^{2}\epsilon ^{2}/2 } + e^{-(m+1)^{2}\epsilon ^{2}/2 }\right ]\right \}^{-1}\;.{}\end{array}$$
(8.69)

Here, the equation has been generalized for \(2\mathcal{N}\) levels. For an odd number of levels, \(2\mathcal{N} + 1\), one of which is centered on zero signal level, the equivalent equation for the quantization efficiency is

$$\displaystyle\begin{array}{rcl} \eta _{(2\mathcal{N}+1)}& =& \left \{1 + \frac{1} {3}\left ( \frac{\epsilon } {2}\right )^{2}\text{erf}\left (\frac{(\mathcal{N} + \frac{1} {2})\epsilon } {\sqrt{2}} \right )\right. \\ & +& \left. \frac{\epsilon ^{3}} {\sqrt{2\pi }}\sum _{m=\mathcal{N}+1}^{\infty }(m -\mathcal{N})^{2}\left [e^{-\left (m-\frac{1} {2} \right )^{2}\epsilon ^{2}/2 } + e^{-\left (m+\frac{1} {2} \right )^{2}\epsilon ^{2}/2 }\right ]\right \}^{-1}\;.{}\end{array}$$
(8.70)

Results from Eqs. (8.69) and (8.70) are given in Table 8.1. The values of ε are those that maximize \(\eta _{Q}\). The fourth column of the table gives P, which is the fraction of samples for which the signal amplitude is greater than \(\pm \mathcal{N}\epsilon \sigma\) for an even number of levels or greater than \(\pm \left (\mathcal{N} + \frac{1} {2}\right )\epsilon \sigma\) for an odd number of levels. For eight levels, P is the fraction of signal samples that contribute to the variance in (8.67). The values of \(\eta _{Q}\) calculated here are accurate to about 2% for Q = 4 and to 0.1% for Q = 8 and higher.
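
As an illustration, the even-level formula of Eq. (8.69) can be evaluated and scanned for the optimum spacing; for \(\mathcal{N} = 4\) (eight levels), the result should land near the value 0.963 quoted in Sect. 8.3.1, to within the stated accuracy. A sketch (the truncation of the sum and the grid spacing are arbitrary):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def eta_even(Nhalf, eps, mmax=60):
    """Approximate efficiency for 2*Nhalf equally spaced levels, Eq. (8.69)."""
    t = (eps / 2)**2 / 3 * erf(Nhalf * eps / sqrt(2))   # in-range term, (8.65)
    for m in range(Nhalf, mmax):                        # truncation term, (8.67)
        t += eps**3 / sqrt(2 * pi) * (m - Nhalf + 1)**2 * (
            exp(-m**2 * eps**2 / 2) + exp(-(m + 1)**2 * eps**2 / 2))
    return 1 / (1 + t)

eps = np.linspace(0.3, 1.2, 901)
vals = [eta_even(4, e) for e in eps]
i = int(np.argmax(vals))
print(eps[i], vals[i])             # eight levels: optimum eps ~0.6, eta ~0.96
```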

Table 8.1 Quantization efficiency and other factors for four or more levels

8.3.5 Quantization Efficiency: Full Analysis, Three or More Levels

This section presents a general analysis of quantized systems for three or more levels [see, e.g., Thompson et al. (2007)]. Let x represent the voltage of the unquantized signal samples, which have a Gaussian probability distribution with variance \(\sigma ^{2}\). Let \(\hat{x}\) represent the quantized values of x. The difference \(x -\hat{ x}\) represents an error introduced by the quantization. The error contains a component that is correlated with x and an uncorrelated component that behaves much like random noise. Consider the correlation coefficient between x and \(x' = x -\alpha \hat{ x}\), where α is a scaling factor. The correlation coefficient is

$$\displaystyle{ \frac{\langle xx'\rangle } {x_{\mathrm{rms}}x'_{\mathrm{rms}}} = \frac{\langle x^{2}\rangle -\alpha \langle x\hat{x}\rangle } {x_{\mathrm{rms}}x'_{\mathrm{rms}}}\;. }$$
(8.71)

Here, the angle brackets 〈 〉 indicate the mean value. If \(\alpha =\langle x^{2}\rangle /\langle x\hat{x}\rangle\), then the correlation coefficient is zero, and x′ represents purely random noise. We refer to this random component as the quantization noise, equal to \(x -\alpha _{1}\hat{x}\), where \(\alpha _{1} =\langle x^{2}\rangle /\langle x\hat{x}\rangle\). Without loss of generality, we take \(\sigma ^{2} =\langle x^{2}\rangle = 1\) in this analysis and use \(\alpha _{1} = 1/\langle x\hat{x}\rangle\). The variance of the quantization noise is

$$\displaystyle{ \langle q^{2}\rangle =\langle (x -\alpha _{ 1}\hat{x})^{2}\rangle =\langle x^{2}\rangle - 2\alpha _{ 1}\langle x\hat{x}\rangle +\alpha _{ 1}^{2}\langle \hat{x}^{2}\rangle =\alpha _{ 1}^{2}\langle \hat{x}^{2}\rangle - 1\;. }$$
(8.72)

The total variance of the digitized signal is \(1 +\langle q^{2}\rangle\), and the quantization efficiency \(\eta _{Q}\) is equal to the variance of the unquantized signal expressed as a fraction of the total variance. Thus,

$$\displaystyle{ \eta _{Q} = \frac{1} {(1 +\langle q^{2}\rangle )} = \frac{1} {\alpha _{1}^{2}\langle \hat{x}^{2}\rangle } = \frac{\langle x\hat{x}\rangle ^{2}} {\langle \hat{x}^{2}\rangle } \;. }$$
(8.73)

Consider the case for an even number of equally spaced levels, as in the eight-level case in Fig. 8.10. When the number of levels is even, it is convenient to define \(\mathcal{N}\) as half the number of levels. We first determine \(\langle x\hat{x}\rangle\). Note that for each sample value, x and \(\hat{x}\) have the same sign, so \(x\hat{x}\) is always positive. Let ε represent the spacing between adjacent quantization levels. The values of x that fall within the quantization level between mε and (m + 1)ε are assigned values \(\hat{x} = (m + \frac{1} {2})\epsilon\), and their contribution to \(\langle x\hat{x}\rangle\) is

$$\displaystyle{ \frac{1} {\sqrt{2\pi }}\int _{m\epsilon }^{(m+1)\epsilon }(m + \frac{1} {2})\epsilon \,x\,e^{-x^{2} /2}\,dx\,. }$$
(8.74)

The contribution from the level between − m ε and − (m + 1)ε is the same as the expression above, so to obtain \(\langle x\hat{x}\rangle\), we sum the integrals for the positive levels and include a factor of two:

$$\displaystyle\begin{array}{rcl} \langle x\hat{x}\rangle & =& \sqrt{\frac{2} {\pi }} \;\left [\left (\sum _{m=0}^{\mathcal{N}-2}\int _{ m\epsilon }^{(m+1)\epsilon }\left (m + \frac{1} {2}\right )\epsilon \,x\,e^{-x^{2} /2}\,dx\right )\right. \\ & & \left.+\int _{(\mathcal{N}-1)\epsilon }^{\infty }\left (\mathcal{N}-\frac{1} {2}\right )\epsilon \,x\,e^{-x^{2} /2}\,dx\right ]\;. {}\end{array}$$
(8.75)

The summation term contains one integral for each positive quantization level except the highest one. The integral on the lower line covers the highest level and the range of x above it, for both of which the assigned value is \(\hat{x} = (\mathcal{N}-\frac{1} {2})\epsilon\). Then, performing the integration, Eq. (8.75) reduces to

$$\displaystyle{ \langle x\hat{x}\rangle = \sqrt{\frac{2} {\pi }} \,\epsilon \;\left (\frac{1} {2} +\sum _{ m=1}^{\mathcal{N}-1}e^{-m^{2}\epsilon ^{2}/2 }\right )\;. }$$
(8.76)

To evaluate the variance of \(\hat{x}\), again consider first the contribution from values of x that fall between m ε and (m + 1)ε. For this level, the quantized data \(\hat{x}\) all have the value \((m + \frac{1} {2})\epsilon\). The variance of \(\hat{x}\) for all values of x within this level is

$$\displaystyle{ \left (m + \frac{1} {2}\right )^{2}\,\epsilon ^{2} \frac{1} {\sqrt{2\pi }}\int _{m\epsilon }^{(m+1)\epsilon }e^{-x^{2}/2 }\,dx\;. }$$
(8.77)

For negative x, we again include a factor of 2, sum over all positive quantization levels except the highest, and add a term for the highest level and the range of x above it. Thus, the total variance of \(\hat{x}\) is:

$$\displaystyle\begin{array}{rcl} \langle \hat{x}^{2}\rangle & =& \sqrt{\frac{2} {\pi }} \left [\left (\sum _{m=0}^{\mathcal{N}-2}(m + \frac{1} {2})^{2}\,\epsilon ^{2}\int _{ m\epsilon }^{(m+1)\epsilon }e^{-x^{2}/2 }\,dx\right )\right. \\ & +& \left.\left (\mathcal{N}-\frac{1} {2}\right )^{2}\,\epsilon ^{2}\int _{ (\mathcal{N}-1)\,\epsilon }^{\infty }e^{-x^{2}/2 }\,dx\right ]\;. {}\end{array}$$
(8.78)

The integrals in Eq. (8.78) can be represented by error functions. Then, using Eqs. (8.73), (8.76), and (8.78), we obtain

$$\displaystyle{ \eta _{(2\mathcal{N})} = \frac{\frac{2} {\pi } \left (\frac{1} {2} +\sum _{ m=1}^{\mathcal{N}-1}e^{-m^{2}\epsilon ^{2}/2 }\right )^{2}} {\left (\mathcal{N}-\frac{1} {2}\right )^{2} - 2\,\sum _{ m=1}^{\mathcal{N}-1}m\ \text{erf}\left ( \frac{m\epsilon } {\sqrt{2}}\right )}\;. }$$
(8.79)

For the cases in which the number of levels is odd, the thresholds of the levels occur at integer-plus-one-half multiples of ε, as in the nine-level case in Fig. 8.10. We represent the odd level number by \(2\mathcal{N} + 1\). Consider the values of x that fall within the quantization level between \((m -\frac{1} {2})\epsilon\) and \((m + \frac{1} {2})\epsilon\). These are assigned the value m ε, i.e., zero for the level centered on x = 0. For this level, the contribution to \(\langle x\hat{x}\rangle\) is

$$\displaystyle{ \frac{1} {\sqrt{2\pi }}\int _{(m-\frac{1} {2} )\epsilon }^{(m+\frac{1} {2} )\epsilon }m\epsilon \,x\,e^{-x^{2} /2}\,dx\;. }$$
(8.80)

Summing over all levels, as in Eq. (8.75), we obtain

$$\displaystyle\begin{array}{rcl} \langle x\hat{x}\rangle = \sqrt{\frac{2} {\pi }} \,\left [\,\left (\sum _{m=1}^{\mathcal{N}-1}\int _{ (m-\frac{1} {2} )\epsilon }^{(m+\frac{1} {2} )\epsilon }\!\!m\,\epsilon \,x\,e^{-x^{2} /2}\,dx\right ) +\int _{ (\mathcal{N}-\frac{1} {2} )\epsilon }^{\infty }\!\!\mathcal{N}\epsilon \,x\,e^{-x^{2} /2}\,dx\,\right ]\;.& &{}\end{array}$$
(8.81)

Then, as in Eq. (8.78), we determine \(\langle \hat{x}^{2}\rangle\):

$$\displaystyle\begin{array}{rcl} \langle \hat{x}^{2}\rangle = \sqrt{\frac{2} {\pi }} \,\left [\,\sum _{m=1}^{(\mathcal{N}-1)}\left (\int _{ (m-\frac{1} {2} )\epsilon }^{(m+\frac{1} {2} )\epsilon }(m\,\epsilon )^{2}\,e^{-x^{2}/2 }\,dx\right ) +\int _{ (\mathcal{N}-\frac{1} {2} )\epsilon }^{\infty }\!\!(\mathcal{N}\,\epsilon )^{2}\,e^{-x^{2}/2 }\,dx\,\right ]\;.& &{}\end{array}$$
(8.82)

Performing the integrations in Eqs. (8.81) and (8.82) and using Eq. (8.73), we obtain

$$\displaystyle{ \eta _{(2\mathcal{N}+1)} = \frac{\frac{2} {\pi } \left (\sum _{m=1}^{\mathcal{N}}e^{-\left (m-\frac{1} {2} \right )^{2}\epsilon ^{2}/2 }\right )^{2}} {\mathcal{N}^{2} - 2\sum _{ m=1}^{\mathcal{N}}\left (m -\frac{1} {2}\right )\,\text{erf}\left (\frac{\left (m -\frac{1} {2}\right )\epsilon } {\sqrt{2}} \right )}\;. }$$
(8.83)

Equations (8.79) and (8.83) can easily be evaluated numerically and provide values of quantization efficiency for any number of equally spaced levels.
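
As an illustration (ours, not from the original text), the following Python sketch evaluates Eqs. (8.79) and (8.83) directly and searches for the threshold spacing ε that maximizes η Q ; it assumes NumPy and SciPy are available, and the function names are chosen for this example only. For \(\mathcal{N} = 1\), eta_even reduces to the two-level value 2∕π.

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import minimize_scalar

def eta_even(N, eps):
    """Quantization efficiency for 2N equally spaced levels, Eq. (8.79)."""
    m = np.arange(1, N)                     # m = 1 ... N-1
    num = (2.0 / np.pi) * (0.5 + np.sum(np.exp(-m**2 * eps**2 / 2)))**2
    den = (N - 0.5)**2 - 2.0 * np.sum(m * erf(m * eps / np.sqrt(2)))
    return num / den

def eta_odd(N, eps):
    """Quantization efficiency for 2N+1 equally spaced levels, Eq. (8.83)."""
    m = np.arange(1, N + 1)                 # m = 1 ... N
    num = (2.0 / np.pi) * np.sum(np.exp(-(m - 0.5)**2 * eps**2 / 2))**2
    den = N**2 - 2.0 * np.sum((m - 0.5) * erf((m - 0.5) * eps / np.sqrt(2)))
    return num / den

def optimal_eps(eta, N):
    """Threshold spacing that maximizes eta_Q for a given level count."""
    res = minimize_scalar(lambda e: -eta(N, e), bounds=(0.01, 2.0), method="bounded")
    return res.x, -res.fun

print(eta_odd(1, 1.224))                  # three levels: ~0.810, cf. Eq. (8.86)
print(optimal_eps(eta_even, 2))           # four levels: eps ~ 1.0, eta ~ 0.881
print(eta_even(128, 0.5))                 # 256 levels with eps = 0.5: ~0.9796
```

The printed values reproduce the three-level efficiency derived below in Eq. (8.86) and the 8-bit (256-level) example quoted at the end of this section.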

Since no significant approximations were made, the same method can be used for cases in which the number of quantization levels is small and, consequently, the quantization noise is relatively large. Values of η Q for two, three, and four levels can be obtained by considering the effect of the quantization noise at a correlator input, following the method used above. In cases such as that in Appendix 8.3, for which the assigned values for the levels are chosen to optimize η Q , or for which the spacing between the level thresholds is not uniform, the formulas derived here cannot be applied directly. However, the same general approach of considering the spacings between levels can be used. For three-level quantization, the levels for maximum quantization efficiency are ± 0.612σ (ε = 1.224). Then we have

$$\displaystyle{ \langle x\hat{x}\rangle = \sqrt{\frac{2} {\pi }} \,\epsilon \int _{\epsilon /2}^{\infty }x\,e^{-x^{2}/2 }\,dx\;, }$$
(8.84)
$$\displaystyle{ \langle \hat{x}^{2}\rangle = \sqrt{\frac{2} {\pi }} \,\epsilon ^{2}\int _{ \epsilon /2}^{\infty }e^{-x^{2}/2 }dx\;, }$$
(8.85)

and

$$\displaystyle{ \eta _{3} = \frac{\langle x\hat{x}\rangle ^{2}} {\langle \hat{x}^{2}\rangle } = \frac{\frac{2} {\pi } \,e^{-0.612^{2} }} {1 -\text{erf}\left (\frac{0.612} {\sqrt{2}} \right )} \;. }$$
(8.86)

Examples of results derived using Eqs. (8.79), (8.83), and (8.86) are shown in Table 8.2. In each case, the value of ε is chosen to maximize η Q . The values of η Q are given to five decimal places to show how they approach 1.0 as the number of levels increases. Note, however, that these values apply to an ideal rectangular passband, which a practical receiving system can only approximate. Figure 8.11 shows the quantization efficiency η Q as a function of the threshold spacing ε.

Fig. 8.11
figure 11

Quantization efficiency as a function of the threshold spacing, ε, in units equal to the rms amplitude, σ. The curves are for 64-level (solid line), 16-level (long-dashed line), 9-level (short-dashed line), and 4-level (long-and-short-dashed line). As ε becomes very small, the output of the quantizer depends mainly on the sign of the input, so the curves meet the ordinate axis at the two-level value of η Q  = 2∕π. As ε increases, more of the higher (positive and negative) levels contain only values in the extended tails of the Gaussian distribution, so the number of levels that make a significant contribution to the output decreases, and the curves merge together. The curves for even-level numbers move asymptotically to the two-level value, and curves for odd-level numbers move toward zero. The working point in each case is chosen to be near the maximum of the curve.

Table 8.2 Examples of quantization efficiency, η Q , for sampling at the Nyquist rate

If the constraint of constant voltage spacing between adjacent thresholds, for both input and output values, is relaxed, the individual levels can sometimes be adjusted to obtain an improvement in η Q of a few tenths of a percent, decreasing with increasing number of levels. The values of η Q in Table 8.2 are in agreement with results by Jenet and Anderson (1998), who give detailed calculations of performance for two- to eight-bit quantization, for both uniform and nonuniform threshold spacing. See also Appendix 8.3 for optimization in the case of four-level quantization.

In recent designs of radio telescopes, the level increment ε is frequently chosen so that signals at levels much higher than the rms system noise can be accommodated within the range of levels of the quantizer. This preserves an essentially linear response to interfering signals so that they can be eliminated or mitigated by further processing. For example, with 256 levels (8-bit representation) and ε = 0.5, we find that η Q  = 0.9796. The range of ± 128 levels then corresponds to ± 64σ, i.e., ± 36 dB above the system noise, for a ∼ 2% sacrifice in signal-to-noise ratio.

8.3.6 Correlation Estimates for Strong Sources

The efficiency calculations of the previous sections are based on estimates of the correlation from the averaged signal products before or after quantization, 〈x i y i 〉 or \(\langle \hat{x}_{i}\hat{y}_{i}\rangle\), in the limit of small correlation, | ρ | ≪ 1. Johnson et al. (2013) show that when the correlation is small and the signal variances are known (as is assumed when setting sampler thresholds), averaged products \(\langle \hat{x}_{i}\hat{y}_{i}\rangle\) do provide optimal estimates of the correlation. That is, when the correlation is small, no weighted combination of the quantized signals will produce an unbiased estimate of correlation that has smaller variance than that of the correlator output. This result arises from the form of the bivariate Gaussian distribution in this limit, which can be written such that the factor including the correlation coefficient ρ includes only terms of the form x y [see, e.g., Eq. (8.2)].

However, when the correlation is large, alternative estimates of the correlation will have lower noise. Thus, in the high correlation regime, it is necessary to revise our expression for quantization efficiency. For instance, when the signals are not quantized, the optimal estimate of correlation for two zero-mean signals is Pearson’s correlation coefficient [e.g., Wall and Jenkins (2012)],

$$\displaystyle{ r_{p} = \frac{\sum _{i=1}^{N_{N} }(x_{i} -\bar{ x})(y_{i} -\bar{ y})} {\sqrt{\sum _{i=1 }^{N_{N } }(x_{i } -\bar{ x})^{2}}\sqrt{\sum _{i=1 }^{N_{N } }(y_{i } -\bar{ y})^{2}}}\;, }$$
(8.87)

where \(\bar{x} = \frac{1} {N_{N}}\sum _{i=1}^{N_{N}}x_{i}\) is the sample mean, and the sums in the denominator are proportional to the sample variances. The standard error in the estimate of r p , i.e., σ p , is

$$\displaystyle{ \sigma _{p} = N_{N}^{-1/2}(1 -\rho ^{2})\;. }$$
(8.88)

As ρ approaches unity, σ p goes to zero. In this limit, the probability function p(x, y) given in Eq. (8.1) collapses to a one-dimensional Gaussian distribution along the line x = y (see Fig. 8.2). When ρ = 1, that line is perfectly defined by a set of measurements of x i and y i , i.e., there are no deviations from the line, and the uncertainty in the estimate of ρ is zero. Perhaps surprisingly, the estimate of correlation made without the sample means and variances, as in Eq. (8.8), i.e.,

$$\displaystyle{ r_{\infty } = \frac{1} {N_{N}}\sum _{i=1}^{N_{N} }x_{i}y_{i}\;, }$$
(8.89)

has an error when normalized by σ 2 of

$$\displaystyle{ \sigma _{\infty } = N_{N}^{-1/2}(1 +\rho ^{2})^{1/2} }$$
(8.90)

[see Eq. (8.13)], which equals σ p only for ρ = 0. For two-level quantization, for a rectangular passband and β = 1, the error on the correlation estimate is [see Eq. (8.30a) and Eq. (8.35)]

$$\displaystyle{ \sigma _{2} = N_{N}^{-1/2}(1 -\rho ^{2})^{1/2}\;. }$$
(8.91)

In the case of large correlation, the Van Vleck relation [Eq. (8.25)]

$$\displaystyle{ \rho =\sin \left ( \frac{\pi } {2}\rho _{2}\right )\;, }$$
(8.92)

will require a nonlinear scaling of the error in ρ 2, denoted σ 2V , which can be written

$$\displaystyle{ \sigma _{2V } = N_{N}^{-1/2}\left [\left ( \frac{\pi } {2}\right )^{2} - (\sin ^{-1}\rho )^{2}\right ]^{1/2}(1 -\rho ^{2})^{1/2}\;. }$$
(8.93)

These errors in correlation for the various cases described above [σ p , σ ∞ , and σ 2V , as well as the error for the case of four-level sampling, derived from formulas by Gwinn (2004)] are shown in Fig. 8.12. The interesting result is that the performance of the two-level correlator is better than that of the unquantized correlator for ρ > 0.6 and approaches that of the Pearson estimator as ρ approaches unity. This peculiarity of the two-level scenario was noted by Cole (1968) and is related to the fact that the sample variance is irrelevant in the two-level quantization estimate.
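
The crossover between the two-level and unquantized estimators is easy to verify by simulation. The following Monte Carlo sketch (ours, assuming NumPy) draws independent Gaussian sample pairs, corresponding to Nyquist sampling of a rectangular passband with β = 1, and compares the scatter of the three estimators discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_factors(rho, n=4096, trials=400):
    """Return sqrt(N) times the scatter of three correlation estimators."""
    r_inf, r_p, r_2v = [], [], []
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        r_inf.append(np.mean(x * y))              # Eq. (8.89), known variance
        r_p.append(np.corrcoef(x, y)[0, 1])       # Pearson's r, Eq. (8.87)
        r2 = np.mean(np.sign(x) * np.sign(y))     # two-level correlator output
        r_2v.append(np.sin(np.pi * r2 / 2))       # Van Vleck correction, Eq. (8.92)
    return [np.sqrt(n) * np.std(r) for r in (r_inf, r_p, r_2v)]

for rho in (0.0, 0.6, 0.9):
    print(rho, noise_factors(rho))
# At rho = 0, the factors approach 1, 1, and pi/2; near rho ~ 0.6, the
# two-level (Van Vleck) scatter drops below that of the unquantized estimate.
```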

Fig. 8.12
figure 12

The correlation noise factor, N N 1∕2 times the standard deviation, vs. correlation ρ. The correlation factors for ρ = 0 are equal to η Q −1 , the reciprocal of the quantization efficiency. The curves labeled unquantized and two-level quantization give correlation factors based on the standard signal-product estimate of ρ [see Eqs. (8.90) and (8.93) for unquantized and two-level quantization, respectively]. The factor for the Pearson’s r curve, given in Eq. (8.88), is based on the estimator r p , which involves the sample mean and variance. Adapted from Johnson et al. (2013).

Johnson et al. (2013) derive maximum likelihood estimators (MLEs) for the unquantized case in which the signal variance is known. The standard deviation of this estimator, σ q , falls slightly below σ p . The authors also derive MLEs for various quantization levels and show that their performance σ q (Q) approaches σ q for large values of Q.

8.4 Further Effects of Quantization

Various forms of analysis in radio astronomy involve cross-correlation of signals from different antennas or autocorrelation of a signal as a function of time. The values of the correlation of quantized signals deviate from the true correlation of the unquantized signals to an extent that is most serious for two-level sampling, and the deviation decreases as the number of levels is increased. Correction for this effect requires determination of how the cross-correlation of the quantized data, here designated R, is related to the true cross-correlation, ρ. To examine the effect of quantization, we consider two Gaussian waveforms that are identical except for a time offset τ. In the case of two-level sampling, the required relationship is given by the Van Vleck equation [Eq. (8.25)] and is

$$\displaystyle{ R_{2}(\tau ) = \frac{2} {\pi } \sin ^{-1}\rho (\tau )\;. }$$
(8.94)

For more than two quantization levels, the relationship is more complicated, and although the nonlinearity of the quantized correlation becomes less serious with an increasing number of levels, correction may still be necessary. As very large instruments come into operation, it becomes increasingly important to remove the responses to strong radio sources in order to study the fainter emission from the most distant regions of the Universe. This requires very accurate calibration of the received signal strengths.

8.4.1 Correlation Coefficient for Quantized Data

Let x and y represent two Gaussianly distributed data streams that differ only by a time offset, τ. The correlation coefficient, ρ(τ), is equal to 〈x y〉∕〈x 2〉. The quantized values of x and y are identified by circumflex accents, i.e., \(\hat{x}\) and \(\hat{y}\). The correlation coefficient of the quantized variables is

$$\displaystyle{ R(\tau ) =\langle \hat{ x}\hat{y}\rangle /\langle \hat{x}^{2}\rangle \;. }$$
(8.95)

To determine R(τ) as a function of the true correlation coefficient ρ, we need to consider the probabilities of occurrence of the unquantized variables x and y within each quantization interval. First, consider the case in which the number of quantization intervals is even and equal to \(2\mathcal{N}\). Thus, there are \(\mathcal{N}\) positive intervals plus \(\mathcal{N}\) negative ones. The mean value of the products of pairs of the quantized values, \(\langle \hat{x}\hat{y}\rangle\), is obtained by considering each of the \(2\mathcal{N}\times 2\mathcal{N} = 4\mathcal{N}^{2}\) possible pairings of the levels of \(\hat{x}\) and \(\hat{y}\). Only half of these need be calculated, since if the x and y values are interchanged, the probability remains the same. The probability of the unquantized variables x and y falling within any pair of intervals is given by integration of the Gaussian bivariate probability distribution, Eq. (8.1), over the corresponding range of x and y. In Eq. (8.1), x and y have variance σ 2 and cross-correlation coefficient ρ. Here, we are concerned with samples of x and y taken at the Nyquist interval τ s , and n is the number of Nyquist intervals between the pairs of samples considered. For a rectangular passband of width Δ ν, the correlation coefficient is given by

$$\displaystyle{ \rho (n\tau _{s}) = \frac{\sin (2\pi \varDelta \nu \,n\tau _{s})} {2\pi \varDelta \nu \,n\tau _{s}} \;. }$$
(8.96)

To calculate \(\langle \hat{x}\hat{y}\rangle\) for each combination of two quantization intervals, the joint probability of the required unquantized variables falling within these intervals is multiplied by the product of the corresponding values assigned in the quantization process. These results are then summed for all the pairs of intervals. Since the probability distributions of \(\hat{x}\) and \(\hat{y}\) are both symmetrical about zero, first consider the case in which both of these variables are positive and run from zero to \(\mathcal{N}\). As noted above, we take the step size to be unity. Let L(i) be the series of \(\mathcal{N} + 1\) values that define the positive quantization steps, i.e., \(0,1,2,\ldots,(\mathcal{N}- 1),\infty \). Thus, for i = 1 to \(\mathcal{N}\), L(i) = i − 1, and \(L(\mathcal{N} + 1) = \infty \). For y, there is an identical series of levels represented as L( j). Then the component of \(\langle \hat{x}\hat{y}\rangle\) that results from the positive ranges of x and y is

$$\displaystyle{ \sum _{i=1}^{\mathcal{N}}(i - 1/2)\left [\sum _{ j=1}^{\mathcal{N}}(\,j - 1/2)\int _{ L(\,j)}^{L(\,j+1)}\int _{ L(i)}^{L(i+1)}p(x,y)\,dx\,dy\right ], }$$
(8.97)

where (i − 1∕2) and ( j − 1∕2) are the values of the digital data assigned to the corresponding quantization intervals, and p(x, y) is the Gaussian bivariate probability distribution, Eq. (8.1). The case in which both x and y are negative provides an equal component of \(\langle \hat{x}\hat{y}\rangle\). Thus, the component of \(\langle \hat{x}\hat{y}\rangle\) for cases in which x and y have the same sign is

$$\displaystyle\begin{array}{rcl} \langle \hat{x}\hat{y}\rangle & =& \frac{1} {\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\sum _{i=1}^{\mathcal{N}}(i - 1/2) \\ & & \left \{\sum _{j=1}^{\mathcal{N}}(\,j - 1/2)\int _{ L(\,j)}^{L(\,j+1)}\int _{ L(i)}^{L(i+1)}\left [\exp \left (\frac{-(x^{2} + y^{2} - 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right )\right ]dx\,dy\right \}\;.{}\end{array}$$
(8.98)

For the cases in which x and y have opposite signs, one of either (i − 1∕2) or ( j − 1∕2) is negative, and the sign of either x or y within the exponential function in Eq. (8.98) is negative. When the corresponding expression is included (with negative sign since the component of \(\langle \hat{x}\hat{y}\rangle\) is negative), we obtain

$$\displaystyle\begin{array}{rcl} \langle \hat{x}\hat{y}\rangle & =& \frac{1} {\pi \sigma ^{2}\sqrt{1 -\rho ^{2}}}\sum _{i=1}^{\mathcal{N}}(i - 1/2) \\ & & \left \{\sum _{j=1}^{\mathcal{N}}(\,j - 1/2)\int _{ L(\,j)}^{L(\,j+1)}\int _{ L(i)}^{L(i+1)}\left [\exp \left (\frac{-(x^{2} + y^{2} - 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right )\right.\right. \\ & & \left.\left.-\exp \left (\frac{-(x^{2} + y^{2} + 2\rho xy)} {2\sigma ^{2}(1 -\rho ^{2})} \right )\right ]dx\;dy\right \}\;. {}\end{array}$$
(8.99)

Equation (8.99) shows how \(\langle \hat{x}\hat{y}\rangle\) is derived using the usual form of the bivariate distribution in Eq. (8.1). An equivalent form of the probability distribution of x and y in Eq. (8.1) is given by Abramowitz and Stegun (1968, see Eqs. 26.2.1 and 26.3.2), which avoids the explicit use of the double integrals. Equation (8.99) can then be written as follows:

$$\displaystyle\begin{array}{rcl} \langle \hat{x}\hat{y}\rangle & =& \frac{1} {\sqrt{2\pi }\sigma } \\ & & \ \left \{\sum _{i=1}^{\mathcal{N}}\left [(i - 1/2)\sum _{ j=1}^{\mathcal{N}}\left [(\,j - 1/2)\int _{ L(i)}^{L(i+1)}\mathrm{erfc}\left ( \frac{L(\,j) -\rho x} {\sigma \sqrt{2(1 -\rho ^{2 } )}}\right )\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx\right ]\right ]\right. \\ & & -\sum _{i=1}^{\mathcal{N}}\left [(i - 1/2)\sum _{ j=1}^{\mathcal{N}}\left [(\,j - 1/2)\int _{ L(i)}^{L(i+1)}\mathrm{erfc}\left (\frac{L(\,j + 1) -\rho x} {\sigma \sqrt{2(1 -\rho ^{2 } )}} \right )\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx\right ]\right ] \\ & & -\sum _{i=1}^{\mathcal{N}}\left [(i - 1/2)\sum _{ j=1}^{\mathcal{N}}\left [(\,j - 1/2)\int _{ L(i)}^{L(i+1)}\mathrm{erfc}\left ( \frac{L(\,j) +\rho x} {\sigma \sqrt{2(1 -\rho ^{2 } )}}\right )\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx\right ]\right ] \\ & & \left.+\sum _{i=1}^{\mathcal{N}}\left [(i - 1/2)\sum _{ j=1}^{\mathcal{N}}\left [(\,j - 1/2)\int _{ L(i)}^{L(i+1)}\mathrm{erfc}\left (\frac{L(\,j + 1) +\rho x} {\sigma \sqrt{2(1 -\rho ^{2 } )}} \right )\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx\right ]\right ]\right \}\,, \\ & & {}\end{array}$$
(8.100)

where erfc is the complementary error function (1 −erf).

To calculate \(\langle \hat{x}^{2}\rangle \), the Gaussian probability function for a single variable is used, taking double the expression for the positive range of x:

$$\displaystyle\begin{array}{rcl} \langle \hat{x}^{2}\rangle & =& \frac{\sqrt{2}} {\sqrt{\pi }\sigma } \sum _{i=1}^{\mathcal{N}}(i - 1/2)^{2}\int _{ L(i)}^{L(i+1)}\exp \left (\frac{-x^{2}} {2\sigma ^{2}} \right )dx \\ & =& \sum _{i=1}^{\mathcal{N}}(i - 1/2)^{2}\left [\text{erfc}\left (\frac{L(i)} {\sqrt{2}\sigma } \right ) -\text{erfc}\left (\frac{L(i + 1)} {\sqrt{2}\sigma } \right )\right ]\;.{}\end{array}$$
(8.101)

Thus, for a given value of the time interval between samples, the correlation coefficient for the quantized data is as given in Eqs. (8.95), (8.99) or (8.100), and (8.101). Note that the ratio 〈x y〉∕〈x 2〉 is independent of the frequency response of the system considered and is based on a Gaussian distribution of the amplitude.
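
The relation R(ρ) is straightforward to evaluate numerically. The sketch below (ours, assuming NumPy and SciPy) implements Eqs. (8.95)–(8.101) for \(2\mathcal{N}\) equally spaced levels with unit step, writing the inner integral of Eq. (8.98) in terms of the conditional Gaussian distribution of y given x, which is equivalent to the erfc form of Eq. (8.100).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def quantized_corr(rho, N=2):
    """R(rho) = <xq yq>/<xq^2> for 2N equally spaced levels of unit step
    (sigma = 1), with assigned values +-(i - 1/2); cf. Eqs. (8.95)-(8.101)."""
    edges = np.concatenate(([-np.inf], np.arange(-(N - 1), N), [np.inf]))
    values = np.arange(-N, N) + 0.5          # -(N - 1/2) ... (N - 1/2)
    s = np.sqrt(1.0 - rho**2)
    xy = 0.0
    for qi, a_i, b_i in zip(values, edges[:-1], edges[1:]):
        for qj, a_j, b_j in zip(values, edges[:-1], edges[1:]):
            # P(x in level i, y in level j) via the conditional pdf of y given x
            f = lambda x: norm.pdf(x) * (norm.cdf((b_j - rho * x) / s)
                                         - norm.cdf((a_j - rho * x) / s))
            p, _ = quad(f, a_i, b_i)
            xy += qi * qj * p
    xq2 = sum(q**2 * (norm.cdf(b) - norm.cdf(a))
              for q, a, b in zip(values, edges[:-1], edges[1:]))
    return xy / xq2

print(quantized_corr(0.5, N=2))   # four-level case, slightly below rho = 0.5
```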

Figures 8.13 and 8.14 show examples of the relationship between the correlation of the quantized signals and the true signal correlation. Both of the figures result from the same analysis, but the presentation in Fig. 8.14, in which the correlation of the quantized data is shown as a fraction of the true correlation, helps to emphasize the nonlinearity in the response. A linear response would appear as a horizontal line in Fig. 8.14, and the curves approach this condition as the number of quantization levels increases. Except for observations of the strongest sources, the signal-to-noise ratio from an individual element of a synthesis array is small. Thus, the working point on the curves in Figs. 8.13 and 8.14 is generally near the left side, where the linearity for signals from cross-correlated pairs is best. As the number of quantization levels increases, the accuracy of the correlation increases. The curves provide an indication of the extent to which the quantization affects the measurement of cross-correlation of signals with Gaussian amplitude distribution. A detailed discussion of the effects of quantization of the signal amplitude is given by Benkevitch et al. (2016). This includes the case in which the cross-correlated signals have different amplitudes, and the effects of quantization as the cross-correlation of the analog waveforms approaches unity.

Fig. 8.13
figure 13

Curves of the correlation coefficient of quantized data as a function of the true correlation (i.e., the correlation of the unquantized data). The lowest (solid) curve is for 2-level quantization, and moving upward, the curves are for 3 levels (long dashes), 4 levels (long and short dashes), 8 levels (small dashes) and 16 levels (solid line). Similar curves for three and four quantization levels are given in Fig. 8.7.

Fig. 8.14
figure 14

Curves of the correlation coefficient of quantized data, expressed as a fraction of the true correlation. These are from the same data as used in Fig. 8.13, but here they are plotted as fractions of the true (unquantized) correlation, in which the nonlinearity appears as a deviation from a horizontal line. The lowest curve is for 2-level quantization, and moving upward, the curves are for 3, 4, 8, and 16 levels. The points at which the curves meet the left vertical axis indicate the reduction in correlation resulting from quantization when the signal-to-noise ratio is low, as given in Table 8.2. The signal-to-noise ratio increases as the curves move from left to right, and the correlation coefficients of both the quantized and unquantized data move toward 1.0 for the theoretical case of complete correlation between the two signals.

For ease of computation, the correlation can be expressed as a rational function, or similar approximation, of the correlator output; see Appendix 8.3 for four-level quantization. For three-level quantization, procedures for determination of the cross-correlation, ρ, from the correlator output are given by Kulkarni and Heiles (1980) and D’Addario et al. (1984). However, with the continuing increase in computer power, larger numbers of levels are generally used.

8.4.2 Oversampling

Sampling of signals at the Nyquist rate results in no loss of information, but quantization causes a reduction in sensitivity as represented by the quantization efficiency. Some of the loss due to quantization can be recovered by oversampling, that is, sampling faster than the Nyquist rate. For sampling of random noise with an ideal rectangular spectrum of width Δ ν, the time interval between adjacent Nyquist samples is 1∕(2Δ ν). With Nyquist sampling, the noise within each sample is uncorrelated with respect to the noise in any other sample, and when such data are combined, the noise combines additively in power. Consider the case of oversampling in which the number of samples per second is β times the Nyquist rate. When the sample rate exceeds the Nyquist rate, the samples are no longer independent, and for any particular sample, there are components of the noise within other samples that are correlated with the noise in the sample considered. [Note, however, that for any two samples spaced by β times the sample interval (i.e., spaced at the Nyquist interval), or by an integral multiple of the Nyquist interval, the noise is uncorrelated.] The correlated components of the noise in different samples combine additively in voltage, rather than additively in power, as is the case for uncorrelated noise.

To illustrate how the components of noise combine, consider one pair of antennas and, for example, just four consecutive samples at the correlator output. Let a 1, a 2, a 3, and a 4 be these voltages, which are proportional to the product of the voltages at the correlator inputs. Then we have for the squared sum of these correlated noise voltages, i.e., the total noise power,

$$\displaystyle\begin{array}{rcl} & & [a_{1} + a_{2} + a_{3} + a_{4}]^{2} = \\ & & a_{1}^{2} + a_{ 2}^{2} + a_{ 3}^{2} + a_{ 4}^{2} + 2(a_{ 1}a_{2} + a_{1}a_{3} + a_{1}a_{4} + a_{2}a_{3} + a_{2}a_{4} + a_{3}a_{4})\;.{}\end{array}$$
(8.102)

The autocorrelation coefficient of the quantized signals at the correlator input is R(n τ s ), where n is an integer and τ s is the spacing in time between adjacent samples. The output of the correlator consists of values that are the product of two input samples, so the autocorrelation coefficient of the samples at the correlator output is R 2(n τ s ). The mean noise power is given by the mean of the terms in the right side of Eq. (8.102), in which each of the a n 2 terms can be replaced by the mean squared noise amplitude 〈a 2〉, and each of the a m a n terms by 〈a 2〉R 2( | n − m | τ s ). Thus, the squared sum of the four noise voltages becomes

$$\displaystyle{ 4\langle a^{2}\rangle + 2\langle a^{2}\rangle [3R^{2}(\tau _{ s}) + 2R^{2}(2\tau _{ s}) + R^{2}(3\tau _{ s})]\;. }$$
(8.103)

If the four noise terms were uncorrelated, i.e., if the R 2 terms were zero, the noise power would be the sum of the individual noise powers, 4〈a 2〉. The effect of the correlation of the noise is to increase the averaged noise power by a factor equal to (8.103) divided by 4〈a 2〉:

$$\displaystyle{ 1 + 2[(3/4)R^{2}(\tau _{ s}) + (1/2)R^{2}(2\tau _{ s}) + (1/4)R^{2}(3\tau _{ s})]\;. }$$
(8.104)

In the general case, averaging a total of N samples at the correlator output, this factor becomes

$$\displaystyle\begin{array}{rcl} & & 1 + 2\left [\left (\frac{N - 1} {N} \right )R^{2}(\tau _{ s}) + \left (\frac{N - 2} {N} \right )R^{2}(2\tau _{ s}) + \left (\frac{N - 3} {N} \right )R^{2}(3\tau _{ s}) +\ldots \right. \\ & & \phantom{1 + 2\ \,} + \left.\left ( \frac{1} {N}\right )R^{2}[(N - 1)\tau _{ s}]\right ]\;. {}\end{array}$$
(8.105)

In practice, in radio astronomy, the rate at which the data are sampled is in the range of MHz to GHz. The averaging times are in the range of milliseconds to seconds, so N is likely to be within the range \(10^{3}\) to \(10^{9}\). The autocorrelation coefficient decreases as the time interval between samples increases, and in practice, R 2(n τ s ) becomes very small for n τ s greater than about 200 times the Nyquist sample interval. Thus, of the terms within the square brackets in Eq. (8.105), those after about the first ∼ 200β can be neglected. Since, in most cases, N ≫ 200β, the squared sum of the noise voltages simplifies to

$$\displaystyle{ 1 + 2[R^{2}(\tau _{ s}) + R^{2}(2\tau _{ s}) + R^{2}(3\tau _{ s})+\ldots ] = 1 + 2\sum _{n=1}^{\infty }R^{2}(n\tau _{ s})\;. }$$
(8.106)

Equation (8.106) gives the factor by which the squared noise voltage (i.e., the noise power) is increased because the noise in the samples is no longer independent when the data are oversampled. The quantization efficiency η Q is equal to the quantization efficiency for Nyquist sampling, η Q N , multiplied by \(\sqrt{\beta }\) to take account of the increase in the number of samples, but divided by the square root of Eq. (8.106) because the noise in different samples is no longer independent. Thus, noting that τ s  = 1∕(2β Δ ν), we obtain

$$\displaystyle{ \eta _{Q} = \frac{\eta _{QN}\sqrt{\beta }} {\sqrt{1 + 2\sum _{n=1 }^{\infty }R^{2 } \left (\frac{n} {2\beta \varDelta \nu } \right )}}\;. }$$
(8.107)

To illustrate the effect of oversampling, examples of the quantization efficiency η Q , derived using Eqs. (8.95), (8.100), (8.101), and (8.107), are shown in Table 8.3. These are for 2-, 3-, 4-, 8-, and 16-level sampling and values of β equal to 1, 2, 4, 8, 16, and 32. In each case, the value of ε used is the one that maximizes η Q for Nyquist sampling, as given in Thompson et al. (2007). Note that as β is increased, the improvement gained by each further increase declines, because the correlation between adjacent samples increases, and thus, the new information provided by finer sampling becomes progressively smaller.
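
For the two-level case, the quantized autocorrelation follows directly from the Van Vleck relation, R = (2∕π)sin−1 ρ, so Eq. (8.107) can be evaluated as in the sketch below (ours, assuming NumPy). For β = 1 it returns 2∕π, and the values for larger β can be compared with the corresponding entries in Table 8.3.

```python
import numpy as np

def eta_two_level(beta, nmax=20000):
    """Eq. (8.107) for two-level quantization and a rectangular passband.
    rho(tau) = sinc(2*dnu*tau), sampled at tau_s = 1/(2*beta*dnu);
    R follows from the Van Vleck relation, Eq. (8.94)."""
    n = np.arange(1, nmax + 1)
    rho = np.sinc(n / beta)                  # np.sinc(x) = sin(pi x)/(pi x)
    R = (2.0 / np.pi) * np.arcsin(rho)
    eta_nyquist = 2.0 / np.pi                # two-level efficiency for beta = 1
    return eta_nyquist * np.sqrt(beta) / np.sqrt(1.0 + 2.0 * np.sum(R**2))

for beta in (1, 2, 4, 8):
    print(beta, round(eta_two_level(beta), 4))   # 0.6366 for beta = 1, ~0.74 for beta = 2
```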

Table 8.3 Variation of quantization efficiency, η Q , with oversampling factor β

8.4.3 Quantization Levels and Data Processing

At this point, it is useful to put into perspective the characteristics of quantization schemes, which are summarized in Tables 8.2 and 8.3. It should be remembered that the assumption ρ ≪ 1 was used in determining these values. In considering the relative advantages of different quantization schemes, we note first that both the quantization efficiency η Q and the receiving bandwidth Δ ν may be limited by the size and speed of the correlator system. The overall sensitivity is proportional to \(\eta _{Q}\sqrt{\varDelta \nu }\). Consider two conditions. In the first, the observing bandwidth is limited by factors other than the capacity of the digital system. This can occur in spectral line observing or when the interference-free band is of limited width. The sensitivity limitation imposed by the correlator system then involves only the quantization efficiency η Q in Table 8.2, and the choice of quantization scheme is one between simplicity and sensitivity. In the second case, the observing bandwidth is set by the maximum bit rate that the digital system can handle, as may occur in continuum observation in the higher-frequency bands. For a fixed bit rate ν b , the sample rate is ν b ∕N b , where N b is the number of bits per sample, and the maximum signal bandwidth Δ ν is ν b ∕(2β N b ), where β is the oversampling factor. Thus, the sensitivity is proportional to \(\eta _{Q}/\sqrt{\beta N_{b}}\), and this factor is listed for various systems in Table 8.4, in which N b  = 1 for Q = 2 and N b  = 2 for Q = 3 or 4. Note that oversampling always reduces the performance under these conditions. For those situations in which the capacity of the correlator is limited by the maximum bit rate, the value of 0.64 for Nyquist sampling with two-level quantization results in the highest overall performance. Four-level sampling is almost as good, and four or more levels would be preferred if the bandwidth is limited, as in spectral line observations.

Table 8.4 Sensitivity factor \(\frac{\eta _{Q}} {\sqrt{\beta N_{b}}}\) for a correlator-limited system

A three-level × five-level correlator, for which the quantization efficiency η Q is 0.86, was constructed by Bowers et al. (1973) for spectral line imaging with a two-element interferometer.

A further point to be noted is that with an analog correlator, the sin × sin and cos × cos products for signals from two antennas provide, in principle, exactly the same information. However, with a digital correlator, the quantization noise is largely uncorrelated between the sine and cosine components of the signal, so the quantization loss can be reduced by generating both products and averaging them.

8.5 Accuracy in Digital Sampling

Deviations from ideal performance in practical samplers result in errors that, if not corrected for, can limit the accuracy of images synthesized from the data. Once the signal is in digital form, however, the rate at which errors are introduced is usually negligibly small.

Two-level samplers, which sense only the sign of the signal voltages, are the simplest to construct. The most serious error that is likely to occur is in the definition of the zero level, in which a small voltage offset may occur. The effect of offsets in the samplers is to produce small offsets of positive or negative polarity in the correlator outputs, which can be largely eliminated by phase switching, as described in Sect. 7.5. Alternately, the offsets in the samplers can be measured by incorporating counters to compare the numbers of positive and negative samples produced. Correction for the offsets can then be applied to the correlator output data [see, e.g., Davis (1974)].
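
As a sketch of the counting approach (ours, assuming NumPy and SciPy; the 2% offset is an arbitrary illustrative value), the threshold offset of a two-level sampler can be estimated from the fraction of positive samples by inverting the Gaussian distribution:

```python
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(0)
sigma, v0 = 1.0, 0.02                  # hypothetical offset of 2% of sigma
x = sigma * rng.standard_normal(10**7)
bits = np.sign(x - v0)                 # two-level sampler with threshold offset v0

p = np.mean(bits > 0)                            # fraction of positive samples
v0_est = sigma * np.sqrt(2) * erfinv(1 - 2 * p)  # inverts p = P(x > v0)
print(v0_est)                                    # ~0.02
```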

In samplers with three or more quantization levels, the performance depends on the specification of the levels with respect to the rms signal level, σ. An automatic level control (ALC) circuit is therefore sometimes used at the sampler input. Errors resulting from incorrect signal amplitude become less important as the number of quantization levels is increased; with many levels, the signal amplitude becomes simply a linear factor in the correlator output. In systems using complex correlators, two samplers are usually required for each signal, one at each output of a quadrature network. The accuracy of the quadrature network and the relative timing of the two sample pulses are also important considerations.

8.5.1 Tolerances in Digital Sampling Levels

This section provides an example of the accuracy required in sampling. It is based on a study of errors in three-level sampling thresholds by D’Addario et al. (1984). We start by considering the diagram in Fig. 8.15, which shows the sampling thresholds for a pair of signals to be correlated. Thresholds v 1 and − v 2 apply to the signal waveform x(t) and v 3 and − v 4 to y(t). The Gaussian probability distribution of x and y is given by Eq. (8.1), and the correlator output is proportional to this probability integrated over the (x, y) plane with the weighting factors ± 1 and zero indicated in the figure. This approach enables one to investigate the effect of deviations of the sampler thresholds from the optimum, v 0 = 0.612σ. For three-level sampling, the correlator output can be written

$$\displaystyle\begin{array}{rcl} \langle r_{3}(\boldsymbol{\alpha },\rho )\rangle = [L(\alpha _{1},\alpha _{3},\rho ) + L(\alpha _{2},\alpha _{4},\rho ) - L(\alpha _{1},\alpha _{4},-\rho ) - L(\alpha _{2},\alpha _{3},-\rho )]\;,& &{}\end{array}$$
(8.108)

where α i  = v i ∕σ, and

$$\displaystyle\begin{array}{rcl} L(\alpha _{i},\alpha _{k},\rho ) =\int _{ \alpha _{i}}^{\infty }\int _{ \alpha _{k}}^{\infty } \frac{1} {2\pi \sqrt{1 -\rho ^{2}}}\,\exp \left [\frac{-(X^{2} + Y ^{2} - 2\rho XY )} {2(1 -\rho ^{2})} \right ]\,dX\,dY \;.& &{}\end{array}$$
(8.109)

Here, X = x∕σ, Y = y∕σ, and the integrand in Eq. (8.109) is equivalent to the expression in Eq. (8.1) but with the variables measured in units of σ.

Fig. 8.15
figure 15

Threshold diagram for a correlator, the inputs of which are three-level quantized signals. x and y represent the unquantized signals, and the shaded areas show the combinations of input levels for which the output is nonzero.

D’Addario et al. (1984) point out that since less than 5% loss in signal-to-noise ratio occurs for threshold departures of ± 40% from optimum, the required accuracy of the threshold settings, in practice, depends mainly on the algorithm used to correct the result. Suppose that the thresholds are kept close to, but not exactly equal to, the optimum value. For the x sampler in Fig. 8.15, the deviations from the ideal threshold value α 0 can be expressed in terms of an even part

$$\displaystyle{ \varDelta _{\text{g}x} = \frac{1} {2}(\alpha _{1} +\alpha _{2}) -\alpha _{0}\;, }$$
(8.110)

and an odd part

$$\displaystyle{ \varDelta _{\text{o}x} = \frac{1} {2}(\alpha _{1} -\alpha _{2})\;. }$$
(8.111)

For the y sampler, Δ gy and Δ oy are similarly defined. The Δ g terms produce gain errors. They are equivalent to an error in the level of the signal at the sampler, and they have the effect of introducing a multiplicative error in the measured cross-correlation. The Δ o terms produce offset errors in the correlator output and are potentially more damaging since such errors can be large compared with the low levels of cross-correlation resulting from weak sources. The offset errors, however, can be removed with high precision by phase switching. The cancellation of the offset results from the sign reversal of the digital samples, or of the correlator output, as described in Sect. 7.5. The correlator output of a phase-switched system is of the form

$$\displaystyle{ r_{3s}(\boldsymbol{\alpha },\rho ) = \frac{1} {2}\left [r_{3}(\boldsymbol{\alpha },\rho ) - r_{3}(\boldsymbol{\alpha },-\rho )\right ]\;. }$$
(8.112)

If all α values are within ± 10% of α 0, the output is always within \(10^{-3}\) (relative error) of the output of a correlator with the same gain errors, but no offset errors, in the samplers. Thus, with phase switching, errors of up to ∼ 10% in the thresholds may be tolerable. Also, corrections can be made for gain errors if the actual threshold levels are known. Since the probability density distribution of the signal amplitudes can be assumed to be Gaussian, the threshold levels can be determined by counting the relative numbers of +1, 0, and −1 outputs from each sampler. When ρ is small (a few percent), a simple correction for the gain error can be obtained by dividing the correlator output by the arithmetic mean of the numbers of high-level ( ± 1) samples for the two signals. Then 10% errors in the threshold settings result in errors of less than 1% in ρ.
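
Equations (8.108) and (8.109) can be evaluated with standard bivariate Gaussian routines, which makes it easy to explore threshold perturbations numerically. A sketch (ours, assuming SciPy):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def L(a_i, a_k, rho):
    """L(alpha_i, alpha_k, rho) of Eq. (8.109): P(X > a_i, Y > a_k) for
    unit-variance Gaussian variables with correlation coefficient rho."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return 1.0 - norm.cdf(a_i) - norm.cdf(a_k) + mvn.cdf([a_i, a_k])

def r3(alphas, rho):
    """Three-level correlator output, Eq. (8.108); alphas = (a1, a2, a3, a4)."""
    a1, a2, a3, a4 = alphas
    return (L(a1, a3, rho) + L(a2, a4, rho)
            - L(a1, a4, -rho) - L(a2, a3, -rho))

a0 = 0.612
print(r3((a0, a0, a0, a0), 0.1))                # ideal thresholds
print(r3((1.1 * a0, 0.9 * a0, a0, a0), 0.1))    # x thresholds offset by +-10%
```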

Another nonideal aspect of the behavior of the sampler and quantizer is that the threshold level may not be precisely defined but may be influenced by effects such as the direction and rate of change of the signal voltage, the previous sample value (hysteresis), and noise in the sampling circuitry. The result can be modeled by including an indecision region in the sampler response extending from α k  − Δ to α k  + Δ. It is assumed that a signal that falls within this region results in an output that takes either of the two values associated with the threshold randomly and with equal probability. The three-level threshold diagram with indecision regions included is shown in Fig. 8.16.

Fig. 8.16
figure 16

Threshold diagram for a three-level correlator showing indecision regions and the shaded areas within them for which the response is nonzero. The figures ± 1, \(\pm \frac{1} {2}\), and \(\pm \frac{1} {4}\) indicate the correlator response. The diagram shows the (X, Y ) plane in which the signals are normalized to the rms value σ.

The weighting in the indecision regions depends on the probability of the random sample values and is 1/4 when both signals fall within indecision regions, and 1/2 when one signal is within an indecision region and the other produces a nonzero output. As before, the correlator output can be obtained by integrating the weighted probability of the signal values over the (X, Y ) plane. Figure 8.17 shows the decrease in the correlator output as a function of Δ for several values of ρ, computed by expressing the output decrease as a Maclaurin series in Δ (D’Addario et al. 1984). For all cases except those in which ρ approaches unity, the relatively small decrease in output results from the fact that when one input waveform falls within an indecision region, the other generally does not. For the particular case of ρ = 1, the input waveforms are identical and fall within these regions simultaneously. The output decrease is then proportional to Δ, as shown by the broken line in Fig. 8.17. However, this case is only of limited practical importance. For a 1% maximum error, Δ must not exceed 0.11σ, so the indecision region can be as large as ± 18% of the threshold value. For a maximum error of 0.1%, the above limits must be divided by \(\sqrt{10}\). Thus, the indecision regions have large enough tolerances that their effect may be negligible.

Fig. 8.17
figure 17

Effect of indecision regions on the output of a three-level correlator. The thresholds are assumed to be set to the optimum value 0.612σ, and the widths of the indecision regions are 2σ Δ. The output is given as a fraction of the output for Δ = 0.

8.6 Digital Delay Circuits

Time delays that are multiples of the sample interval can be applied to streams of digital bits by passing them through shift registers that are clocked at the sampling frequency. Shift registers with different numbers of stages thus provide different fixed delays. A method of using two shift registers to obtain a delay that is variable in increments of the clock pulse interval is described by Napier et al. (1983). However, integrated circuits for random access memory (RAM), developed for computer applications, provide an economical solution for large digital delays.

Another useful technique is serial-to-parallel conversion, that is, the division of a bit stream at frequency ν into n parallel streams at frequency ν∕n, where n is a power-of-two integer. This allows the use of slower and more economical types of digital circuits for delay, correlation, and other processes.

The precision required in setting a delay has been discussed in Sect. 7.3.5 and is usually some fraction of the reciprocal of the signal bandwidth. In any form of delay that operates at the frequency of the sampler clock, the basic delay increment is the reciprocal of the sampling frequency. A finer delay step can be obtained digitally by varying the timing of the sample pulse in a number of steps, for example, 16, between the basic timing pulses. Thus, if an extra delay of, say, 5/16 of a clock interval is required, the sampler is activated 11/16 of a clock interval after the previous clock pulse, and the data are held for 5/16 of an interval to bring them into phase with the clock-pulse timing. Correction for delay steps smaller than the sampling interval can also be made after the signals have been cross-correlated, by applying a phase correction to the cross power spectrum.

8.7 Quadrature Phase Shift of a Digital Signal

We have mentioned that complex correlators for digital signals can be implemented by introducing the quadrature phase shift in the analog signal, as in Fig. 6.3, and then using separate samplers for the signal and its phase-shifted version. The Hilbert transformation that the phase shift represents can also be performed on the digital signal, thus eliminating the quadrature network and saving samplers and delay lines, but the accuracy is limited. Hilbert transformation is mathematically equivalent to convolution with the function (−π τ)−1, which extends to infinity in both directions [see, e.g., Bracewell (2000), p. 364]. A truncated sequence of the same form, for example, \(\frac{1} {3}\), 0, 1, 0, −1, 0, \(-\frac{1} {3}\), provides a convolving function for the digital data that introduces the required phase shift. However, the truncation results in convolution of the resulting signal spectrum with the Fourier transform of the truncation function, that is, a sinc function. This introduces ripples and degrades the signal-to-noise ratio by a few percent. Also, the summation process in the digital convolution increases the number of bits in the data samples, but the low-order bits can be discarded to avoid a major increase in the complexity of the correlator. This results in a further quantization loss. The overall result is that the imaginary output of the correlator suffers spectral distortion and some loss in signal-to-noise ratio relative to the real output. These effects are most serious in broad-bandwidth systems, in which the high data rate permits only simple processing. Lo et al. (1984) have described a system in which the real part of the correlation is measured as a function of time offset, as described below for the spectral correlator, and the imaginary part is then computed by Hilbert transformation.
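
A sketch of the truncated-kernel approach (ours, assuming NumPy): the samples 2∕(πn) for odd n reproduce, for | n | ≤ 3, the sequence quoted above to within a scale factor, and the printed residual of roughly 2% for a kernel half-length of 15 illustrates the truncation loss just described.

```python
import numpy as np

M = 15                                     # half-length of the truncated kernel
n = np.arange(-M, M + 1)
# Ideal discrete Hilbert kernel: 2/(pi*n) for odd n, zero for even n (and n = 0).
h = np.where(n % 2 != 0, 2.0 / (np.pi * np.where(n == 0, 1, n)), 0.0)

t = np.arange(512)
x = np.cos(2 * np.pi * 0.05 * t)           # test tone within the band
y = np.convolve(x, h, mode="same")         # approximate 90-degree phase shift
ideal = np.sin(2 * np.pi * 0.05 * t)
print(np.max(np.abs(y[M:-M] - ideal[M:-M])))   # ~0.02 residual from truncation
```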

8.8 Digital Correlators

8.8.1 Correlators for Continuum Observations

In continuum observations, the average correlation over the signal bandwidth is measured, and data on a finer frequency scale may not be required. In such cases, the correlation of the signals is usually measured for zero time-delay offset. Digital correlators can be designed to run at the sampling frequency of the signals or at a submultiple resulting from dividing the bit stream from the sampler into a number of parallel streams. In the latter case, the number of correlator units must be proportionally increased, and their outputs can subsequently be additively combined. Two-level and three-level correlators, for which the products are represented by values of −1, 0, and +1, are the simplest. Correlators in which one of the inputs is a two-level or three-level signal and the other input is more highly quantized also have a degree of simplicity. In this case, the correlator is essentially an accumulating register into which the higher-quantization value is entered. The two-level or three-level value is used to specify whether the other number is to be added, subtracted, or ignored. In correlators in which both inputs have more than three levels of quantization, the multiplier output for any single product can be one of a range of numbers. One method of implementing such a multiplier is to use a read-only memory unit as a lookup table in which the possible product values are stored. The input bits to be multiplied are used to specify the address of the required product in the memory.

The output of a multiplier can take both positive and negative values, and, ideally, an up–down counter is required as an integrator. Since such counters are usually slower than simple adding counters, two of the latter are sometimes used to accumulate the positive and negative counts independently. Another technique is to count, for example, −1, 0, and +1 as 0, 1, and 2, and then subtract the excess values, in this case equal to the number of products, in the subsequent processing.
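
A minimal sketch of this offset-binary accumulation (ours, assuming NumPy; the random products merely stand in for a correlator's output stream):

```python
import numpy as np

# Count products {-1, 0, +1} as {0, 1, 2} in a simple adding counter,
# then subtract the excess (one count per product) in later processing.
products = np.random.default_rng(0).integers(-1, 2, size=10000)
counter = np.sum(products + 1)       # what the adding counter accumulates
true_sum = counter - len(products)   # remove the excess afterward
assert true_sum == products.sum()
```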

Spectral line (multichannel) correlators are used with most large general-purpose arrays. For continuum observations, they offer advantages such as the ability to reject narrowband interfering signals or to divide a band into narrower sub-bands to reduce the smearing of spectral details.

8.8.2 Digital Spectral Line Measurements

In spectral line observations, measurements at different frequencies across the signal band are required. These measurements can be obtained by digital techniques using a spectral correlator system, which is commonly implemented by measuring the correlation of the signals as a function of time offset. The Fourier transform of this quantity is the cross power spectrum, which can be regarded as the complex visibility as a function of frequency. (This Fourier transform relationship is a form of the Wiener–Khinchin relation discussed in Sect. 3.2.) In the case of an autocorrelator (for use with a single antenna), the two input signals are the same waveform with a time offset. Thus, the autocorrelation function is symmetric, and the power spectrum is entirely real and even. However, the cross power spectrum of the signals from two different antennas is complex, and the cross-correlation function has odd as well as even parts.

The output of a spectral correlator system provides values of the visibility at N frequency intervals across the signal band. These intervals are sometimes spoken of as frequency channels and their spacing as the channel bandwidth. To explain the action of a digital spectral correlator, we consider the cross power spectrum \(\mathcal{S}(\nu )\) of the signals from two antennas, as shown in idealized form in Fig. 8.18. Here it is assumed that the source under observation has a flat spectrum with no line features, and the final IF amplifier before the sampler has a rectangular baseband response. In Fig. 8.18, we have included the negative frequencies since they are necessary in the Fourier transform relationships. For −Δ ν ≤ ν ≤ Δ ν, the real and imaginary parts of \(\mathcal{S}(\nu )\) have magnitudes a and b, respectively, and the corresponding visibility phase is tan−1(ba). The cross-correlation function ρ(τ) is the Fourier transform of \(\mathcal{S}(\nu )\), where τ is the time offset:

$$\displaystyle\begin{array}{rcl} \rho (\tau )& =& (a - jb)\int _{-\varDelta \nu }^{0}e^{\,j2\pi \nu \tau }d\nu + (a + jb)\int _{ 0}^{\varDelta \nu }e^{\,j2\pi \nu \tau }d\nu \ \\ & =& 2\varDelta \nu \left [a\frac{\sin (2\pi \varDelta \nu \,\tau )} {2\pi \varDelta \nu \,\tau } - b\,\frac{1 -\cos (2\pi \varDelta \nu \,\tau )} {2\pi \varDelta \nu \,\tau } \right ]\;. {}\end{array}$$
(8.113)

Thus, ρ(τ) has an even component of the form (sinx)∕x, which is related to the real part of \(\mathcal{S}(\nu )\), and an odd component of the form (1 − cosx)∕x, which is related to the imaginary part. The spectral correlator measures ρ(τ) for integral values of the sampling interval τ s . We consider the case of Nyquist sampling, for which τ s  = 1∕(2Δ ν). The measured cross-correlation refers to the quantized waveforms, and the analysis in Sect. 8.4.1 shows how this is related to the cross-correlation of the unquantized waveforms. For correlation levels that are not too large, the two quantities are closely proportional, so for simplicity, we assume that Eq. (8.113) represents the behavior of the measured cross-correlation. The measurements are made with 2N time offsets from − N τ s to (N − 1)τ s between the signals, and Fourier transformation of these discrete values yields the cross power spectrum at frequency intervals of (2N τ s )−1 = Δ ν∕N for Nyquist sampling. The N complex values of the positive frequency spectrum are the data required. Of these, the imaginary part comes from the odd component of the correlator output r(τ). Thus, in the correlation measurement, it suffices to use single-multiplier correlators to measure 2N real values of r(τ) over both positive and negative values of τ for one antenna with respect to the other. As an alternative to measuring only the real part of the correlation, complex correlators could be used to measure both the real and imaginary parts for a range of time offsets from zero to (N − 1)τ s . However, complex correlators require broadband quadrature networks.
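
The XF sequence implied by Eq. (8.113) can be demonstrated numerically. In the sketch below (ours, assuming NumPy; a and b are the assumed flat real and imaginary spectral amplitudes), the correlation function is sampled at 2N Nyquist-spaced lags and Fourier transformed; mid-band channels recover a + jb, while channels near the band edges show the oscillations discussed later in this section.

```python
import numpy as np

N, dnu, a, b = 64, 1.0, 1.0, 0.5
tau_s = 1.0 / (2 * dnu)                    # Nyquist sampling interval
tau = np.arange(-N, N) * tau_s             # 2N lags, -N*tau_s ... (N-1)*tau_s
theta = 2 * np.pi * dnu * tau
with np.errstate(divide="ignore", invalid="ignore"):
    r = 2 * dnu * (a * np.sin(theta) / theta
                   - b * (1 - np.cos(theta)) / theta)   # Eq. (8.113)
r[tau == 0] = 2 * dnu * a                  # limiting value at tau = 0

# Discrete version of S(nu) = integral of r(tau) exp(-j 2 pi nu tau) dtau:
S = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(r))) * tau_s
print(S[N + N // 2])                       # mid-band channel: ~(1.0 + 0.5j)
```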

Fig. 8.18
figure 18

Cross power spectrum \(\mathcal{S}(\nu )\) of two signals for which the power spectra are rectangular bands extending in frequency from zero to Δ ν. Negative frequencies are included. The solid line represents the real part of \(\mathcal{S}(\nu )\) and the dashed line the imaginary part. The corresponding correlation function is derived in Eq. (8.113).

Measurement of the cross-correlation over the limited time offset range is equivalent to measuring r(τ) multiplied by a rectangular function of width 2N τ s . The cross power spectrum derived from the limited measurements is therefore equal to the true cross power spectrum convolved with the Fourier transform of the rectangular function, that is, with the sinc function

$$\displaystyle{ \frac{\sin (\pi \nu N/\varDelta \nu )} {\pi \nu } \;, }$$
(8.114)

which is normalized to unit area with respect to ν. Any line feature within the spectrum is broadened by the sinc function (8.114) and, depending on its frequency profile, may show the characteristic oscillating skirts. The width of the sinc function at the half-maximum level is 1.2Δ ν∕N, that is, 1.2 times the channel separation, and this width defines the effective frequency resolution.

The oscillations of the sinc function introduce structure in the frequency spectrum similar to the sidelobe responses of an antenna beam. They result from the sharp edges of the rectangular function that multiplies the correlation function. Such sidelobes are undesirable and can be reduced by choosing weighting functions, other than rectangular truncation, that are constrained to be zero outside the measurement range. It is desirable that a weighting function taper smoothly to zero at | τ |  = N τ s , thereby reducing unwanted ripples in the smoothing (convolving) function, but also that it be as wide as possible, to keep the width of the smoothing function small. These requirements are not generally compatible, so weighting functions that produce smoothing functions with very low sidelobes have poor frequency resolution. Some commonly used weighting functions are listed in Table 8.5.

Table 8.5 Commonly used weighting functions

Hann weighting, also known as raised cosine weighting, reduces the first sidelobe by a factor of 9 but degrades the resolution by a factor of 1.67, compared with uniform weighting. The Fourier transform of the Hann weighting function is the sum of three sinc functions of relative amplitudes 0.25, 0.5, and 0.25. This is the smoothing function in the spectral domain shown in Fig. 8.19b, which corresponds to Hann weighting. For the usual case in which the number of points in the discretely sampled spectrum equals the number of points in the correlation function (i.e., no zero padding, as in the FX correlator, Sect. 8.8.4), the smoothing or convolution can be implemented as a three-point running mean with relative weights of 0.25, 0.5, and 0.25. Thus, the smoothed value of the cross power spectrum at frequency channel n is given by

$$\displaystyle{ \mathcal{S}'\left ( \frac{n\varDelta \nu } {N}\right ) = \frac{1} {4}\,\mathcal{S}\left [\frac{(n - 1)\varDelta \nu } {N} \right ] + \frac{1} {2}\,\mathcal{S}\left ( \frac{n\varDelta \nu } {N}\right ) + \frac{1} {4}\,\mathcal{S}\left [\frac{(n + 1)\varDelta \nu } {N} \right ]\;. }$$
(8.115)
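
In this discrete form, the smoothing is a short array operation; a minimal sketch (ours, assuming NumPy):

```python
import numpy as np

def hann_smooth(S):
    """Three-point running mean of Eq. (8.115) with weights (1/4, 1/2, 1/4)."""
    Sp = S.copy()                  # end channels are left unsmoothed in this sketch
    Sp[1:-1] = 0.25 * S[:-2] + 0.5 * S[1:-1] + 0.25 * S[2:]
    return Sp

# A narrow feature confined to one channel spreads into three after smoothing:
print(hann_smooth(np.array([0.0, 0.0, 1.0, 0.0, 0.0])))
# -> [0.  0.25 0.5  0.25 0. ]
```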

The Hamming weighting function is very similar to the Hann function and would appear to be superior because it produces a better resolution and a lower peak sidelobe level. However, the sidelobes of the Hamming smoothing function do not decrease in amplitude as rapidly as those of the Hann smoothing function. Weighting functions are discussed in detail by Blackman and Tukey (1959) and Harris (1978).

Fig. 8.19
figure 19

(a ) The ordinate is the sinc function sin(π ν N∕Δ ν)∕(π ν N∕Δ ν), which represents the frequency response of a spectral correlator with channels of width Δ ν∕N to a narrow line at ν = 0. The abscissa is frequency ν measured with respect to the center of the received signal band. (b ) The same curve after the application of Hann smoothing, as in Eq. (8.115).

A further effect of the finite time-offset range complicates the calibration of the instrumental frequency response in the following way (Willis and Bregman 1981). The frequency responses of the amplifiers associated with the different antennas may not be exactly identical, as discussed in Sect. 7.3. To calibrate the response of each antenna pair over the spectral channels, it is usual to measure the cross power spectrum of an unresolved source for which the actual radiated spectrum is known to be flat across the receiving passband. We can consider the result in terms of the idealized power spectra in Fig. 8.18. If no special weighting function is used, the real and imaginary parts are both convolved with the sinc function (8.114). When a function with a sharp edge is convolved with a sinc function, the result is the appearance of oscillations (the Gibbs phenomenon) near the edge, as shown in Fig. 8.20. The point here is that the real component of \(\mathcal{S}(\nu )\) in Fig. 8.18 is continuous through zero frequency, but the imaginary part shows a sharp sign reversal. Thus, near zero frequency, the observed imaginary part of \(\mathcal{S}(\nu )\) will show oscillations that may be as high as 18% in peak amplitude, whereas the real component will show relatively small oscillations at that point (see also Fig. 10.14b and associated text). As a result, the magnitude and phase measured for \(\mathcal{S}(\nu )\) will show oscillations or ripples, the amplitude of which will depend on the relative amplitudes of the real and imaginary parts, that is, on the phase of the uncalibrated visibility. The uncalibrated phase measured for any source depends on instrumental factors such as the lengths of cables as well as the source position, which may not be known. In general, the phase will not be the same for the source under investigation and the calibrator. Hence, near zero frequency, some precautions must be taken in applying the calibration. Possible solutions to the problem include (1) calibrating the real and imaginary parts separately, (2) observing over a wide enough band that the end channels in which the ripples are strongest can be discarded, or (3) applying smoothing in frequency to reduce the ripples.
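The size of these edge oscillations is easy to verify numerically. The following sketch (illustrative only, not from the original text) convolves a unit sign-reversing step, like the idealized imaginary part of \(\mathcal{S}(\nu )\), with a sinc function of unit area and reports the resulting peak value:

```python
import numpy as np

# Frequency axis in units of the channel width (x = nu N / delta nu).
x = np.linspace(-40.0, 40.0, 8001)
dx = x[1] - x[0]

step = np.sign(x)              # sign-reversing step, from -1 to +1
kernel = np.sinc(x)            # numpy sinc(x) = sin(pi x)/(pi x)
kernel /= kernel.sum() * dx    # normalize the kernel to unit area

smoothed = np.convolve(step, kernel, mode="same") * dx
print(f"peak value {smoothed.max():.3f}")  # ~1.18: ripple ~18% above unity
```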

Fig. 8.20
figure 20

Convolution of a step function at the origin (broken line) with the sinc function (sin π x)∕(π x). Here, x = ν N∕Δ ν, and the half-cycle period of the ripple is approximately equal to the width of a spectral channel.

Another problem encountered when observing a spectral line in the presence of a continuum background is caused by reflections in the antenna structure. These reflections cause a sinusoidal gain variation across the passband, the period of which is equal to the reciprocal of the delay of the signal caused by the reflection. In a correlation interferometer, the magnitude of the ripple is a nearly constant fraction of the correlated continuum flux density, and the ripple is removed when the spectrum of the source under investigation is divided by the spectrum of the calibration source.

8.8.3 Lag (XF) Correlator

Correlators can be classified into two general types. In a lag (or XF) correlator, cross-correlation is followed by Fourier transformation, and in an FX correlator, Fourier transformation is followed by cross multiplication. A simplified schematic diagram of a lag correlator is shown in Fig. 8.21. Practical systems are often more complicated and are designed to take full advantage of the flexibility of digital processing techniques. The bandwidths of channels required for spectral line studies vary greatly, from a few tens of hertz to hundreds of megahertz. This versatility is necessary because the widths of spectral features are influenced by Doppler shifts, which are proportional to the rest frequencies of the lines and the velocities of the emitting atoms and molecules. The correlator of the upgraded VLA system (Perley et al. 2009) is fundamentally an XF design, as is the ALMA system, following its digital filter (Escoffier et al. 2000).
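The essential operations of the lag correlator can be sketched in a few lines (an unquantized, floating-point illustration of the XF principle; real correlators operate on coarsely quantized samples and accumulate in hardware):

```python
import numpy as np

def lag_cross_spectrum(x, y, N):
    """XF sketch: estimate r(k) = <x(n) y(n+k)> for k = -N..N,
    then Fourier transform the lags into a cross power spectrum."""
    def r(k):
        if k >= 0:
            return np.mean(x[:len(x) - k] * y[k:])
        return np.mean(x[-k:] * y[:len(y) + k])

    lags = np.array([r(k) for k in range(-N, N + 1)])
    # Rotate so that zero lag is first, as the DFT convention expects.
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(lags)))
```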

Fig. 8.21
figure 21

Simplified schematic diagram of a lag (XF) spectral correlator for two sampled signals. τ s indicates a time delay equal to the sampling interval and C indicates a correlator. The correlation is measured for zero delay, for the \(\hat{x}\) input delayed with respect to the \(\hat{y}\) input (left correlator bank), and for \(\hat{y}\) delayed with respect to \(\hat{x}\) (right correlator bank). The delays are integral multiples of τ s .

A recirculating correlator is one that can store blocks of data and process them multiple times through the correlator. This can be done only when the correlator is capable of running faster than the incoming data rate. These multiple passes allow the number of correlator channels to be increased. For example, if data samples are processed by the correlator twice, the range of delays can be doubled, so the spectral resolution is improved by a factor of two.

To implement the above scheme, recirculator units are required, which are basically memories that store blocks of input samples and allow them to be read out at the correlator input rate. These memory units are required in pairs, so that one is filled with data at the Nyquist rate appropriate to the chosen signal bandwidth, while the other is being read at the maximum data rate. One memory becomes filled in the time that the other is read for the required number of times, and the two are then interchanged. Examples of recirculating lag correlators are described by Ball (1973) and Okumura et al. (2000). The WIDAR correlator on the VLA uses recirculation (Perley et al. 2009).

8.8.4 FX Correlator

The designation FX indicates a correlator in which Fourier transformation to the frequency domain is performed before cross multiplication of data from different antennas. In such a correlator, the input bit stream from each antenna is converted to a frequency spectrum by a real-time FFT, and then, for each antenna pair, the complex amplitudes for each frequency are multiplied to produce the cross power spectrum. A major part of the computation occurs in the Fourier transformation, for which the total number of operations is proportional to the number of antennas. In comparison, in a lag correlator, the total computation is largely proportional to the number of antenna pairs. Thus, the FX scheme offers economy in hardware, especially if the number of antennas is large (see Sect. 8.8.5). The principle of the FX correlator, based on the use of the FFT algorithm, was discussed by Yen (1974) and first used in a large practical system by Chikada et al. (1984, 1987). Descriptions of systems designed for the VLBA are given by Benson (1995) and Romney (1999).

Two slightly different implementations of the FX correlator have been used. In one, both in-phase and quadrature components of the signal are sampled to provide a sequence of N complex samples, which is then Fourier-transformed to provide N values of complex amplitude, distributed in frequency over positive and negative frequencies. In the other, N real samples are transformed to provide N values of complex amplitude. However, the negative frequencies are redundant, and only N∕2 spectral points need be retained. We follow the second scheme in the discussion below.

Figure 8.22 is a schematic diagram of the basic operations of an FX correlator. The input sample stream from an antenna is Fourier transformed in contiguous sequences of N samples, where N is usually a power of two for efficiency in the FFT algorithm. The output of each transformation is a series of N complex signal amplitudes as a function of frequency. The frequency spacing of the data after transformation is 1∕(N t s ), where t s is the time interval between samples of the signals. In the cross-multiplication process that follows the FFT stage, the complex amplitude from one antenna of each pair is multiplied by the complex conjugate of the amplitude of the other. These multiplications occur in the correlator elements in Fig. 8.22. Note that the data in any one input sequence are combined only with data from other antennas for the same time sequence. This leads to some differences in the effective weighting of the data in the FX and XF designs.
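In code, the basic FX operations reduce to segmenting, transforming, and cross-multiplying (a schematic sketch only; quantization, fringe rotation, and long-term accumulation are omitted):

```python
import numpy as np

def fx_cross_spectrum(x, y, N):
    """FX sketch: FFT contiguous N-sample blocks of two real sampled
    streams, cross-multiply per block, and average over blocks."""
    nseg = min(len(x), len(y)) // N
    acc = np.zeros(N // 2, dtype=complex)
    for m in range(nseg):
        X = np.fft.rfft(x[m * N:(m + 1) * N])[:N // 2]  # positive frequencies
        Y = np.fft.rfft(y[m * N:(m + 1) * N])[:N // 2]
        acc += X * np.conj(Y)     # cross power spectrum for this segment
    return acc / nseg
```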

Fig. 8.22
figure 22

Simplified schematic diagram of an FX correlator for two antennas. The digitized signals are read into the shift registers and an FFT performed at intervals of N sample periods. The correlator elements, indicated by C, form products of one signal with the complex conjugate of the other. In an array with n a antennas, the outputs of each FFT are split (n a − 1) ways for combination with the complex amplitudes from all other antennas.

8.8.5 Comparison of XF and FX Correlators

Spectral Response. In the FX configuration, the F engine (DFT processor) operates on short segmented blocks of data in order to control the spectral resolution. In the equivalent correlation function constructed from a block of data, the zero-lag component can be formed in N ways, that is, from N possible multiplications. There are progressively fewer multiplications available for increasing lags because of the data block boundaries. The correlation function at the maximum lags of ± (N − 1)t s can be obtained in only one way. Hence, the density of lag multiplications has a triangular shape as a function of lag over the range ± N t s , as shown in Fig. 8.23 (see also Moran 1976). The spectral response, the Fourier transform of this triangular function, is therefore sinc2(N t s ν) = sinc2(n), where ν = n∕(N t s ) and n is the spectral channel number. An alternate derivation of this result is given in Sect. A8.4.1, where it is shown that the spectral response to a sine wave is a sinc2 function.

Fig. 8.23
figure 23

(left) The density of lag calculations, or intrinsic weighting function, for an FX correlator (solid line) and an XF correlator (dotted line). N is the segment size for the FX correlator. For comparison purposes, the width of the function for the XF correlator is chosen to make the number of spectral channels the same. (right) The spectral response for the FX correlator (solid line), a sinc2 function (see Appendix 8.4 for additional explanation), and the XF correlator (dotted line), a sinc function given by Eq. (8.114). Adapted from Romney (1999) and Deller et al. (2016).

For the XF configuration, the spectral resolution depends on the length of the correlation function, calculated as described in Sect. 8.8.2. Since the correlation function is calculated on a segment of data that is much longer than the block length, the density of lag multiplications is essentially uniform, except for a very small end effect. Hence, the spectral response is sinc(N t s ν), or sinc(n). The spectral responses for the FX and XF correlators are shown in Fig. 8.23.

Note that the integral over frequency of both of these spectral responses is unity. Therefore, the flux, or area under a spectral line profile (the convolution of the source spectrum with the spectral response function), is conserved. The peak amplitude of a spectral feature narrower than the resolution will depend on where it falls with respect to the spectral channels. A line that falls midway between two channels will have its peak amplitude reduced by sinc2(1∕2) = 0.41 for the FX processor, compared with sinc(1∕2) = 0.64 for the XF processor. This is the well-known effect called scalloping. It can be mitigated by the technique of padding with zeros to obtain an interpolated spectrum (see Sect. A8.4.2).

In some cases, it may be desirable to actually calculate the correlation functions from the output of the F engine, such as for application of a full nonlinear quantization correction. It is well known (Press et al. 1992) that it is necessary to pad the spectrum with N zeros in order to obtain the correct result [see discussion after Eq. (A8.40)]. The implementation of this calculation is discussed by O’Sullivan (1982) and Granlund (1986).

Signal-to-Noise Ratio. The fundamental difference between the FX and XF processors is the density weighting in the lag domain. Both systems have the same number of equivalent multiplications, as can be seen in Fig. 8.23. The FX processor covers twice the range of lags covered by the XF processor, but with a density that decreases with increasing lag number; in particular, at lag k = N∕2, the FX provides half the lag density. For a continuum source, the signal-to-noise ratio of the FX and XF systems is the same. This can be appreciated from the fact that only the zero-lag multiplications are important, and they are equal in both systems. Similarly, the signal-to-noise ratio for a very-narrow-bandwidth source, narrower than the resolution, is also the same because the total number of equivalent multiplications is the same.

There is a small difference in response for signals that have line widths about equal to the resolution. In particular, for this case, the amplitude of a spectral line measured with the FX processor is reduced by a factor of about 0.82 (Okumura et al. 2001; Bunton 2005). This is a problem only for slightly resolved spectral features, and in any event, most spectrometers are designed to produce several channels per resolution element in order to properly analyze the lines. This deficiency of the FX correlator is due to its distribution of lags: a larger range, but fewer multiplications at lags near ± N∕2 (see Fig. 8.23). There are several approaches to recovering this loss of information. The classic method (Welch 1967; Percival and Walden 1993) is to overlap the segments in the block processing in the F engine, as sketched below. A 50% overlap recovers most of the lost signal-to-noise ratio but at the cost of doubling the processing time in the F engine. This overlap feature was available in the original FX VLBA processors but rarely, if ever, used (Romney 1995). Another approach is to simply channel-average the spectrum, but this wastes the resolution capability of the F engine. Note that with polyphase filter banks, scalloping for narrow spectral lines and the signal-to-noise ratio loss are both very small.
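The 50% overlap is simple to express; in the sketch below (illustrative only, with the same simplified floating-point processing as the FX sketch above), each N-sample block starts N∕2 samples after the previous one, doubling the number of FFTs:

```python
import numpy as np

def fx_overlap50(x, y, N):
    """Welch-style processing: blocks advance by N/2 samples, so each
    sample (away from the ends) contributes to two FFT segments."""
    starts = range(0, min(len(x), len(y)) - N + 1, N // 2)
    acc = np.zeros(N // 2, dtype=complex)
    for s in starts:
        X = np.fft.rfft(x[s:s + N])[:N // 2]
        Y = np.fft.rfft(y[s:s + N])[:N // 2]
        acc += X * np.conj(Y)
    return acc / len(starts)
```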

Number of Operations. We can make an approximate comparison of the workload requirements of XF and FX signal processors by comparing the number of multiplications needed in each system. For this rather simplistic analysis, we assume that the data are streams of real numbers at the Nyquist interval appropriate for bandwidth Δ ν, i.e., t s  = 1∕(2Δ ν). To make this comparison, we further assume that the number of lags computed in the X engine (lag correlator), N, is equal to the data segment length into the F engine. This makes the spectral resolution of both systems approximately equal (see Fig. 8.23 for exact responses).

Consider the analysis of one second of data, i.e., 2Δ ν samples. For the XF system, a lag correlator is required for each baseline, so 2N Δ ν multiplications are required per baseline. Since N t s is much shorter than the 1-s integration period, the edge effects in calculating the correlation function are negligible (i.e., all lags accumulate almost the same number of products), and the workload of the single Fourier transform at the end of the integration period is negligible compared with the workload of calculating the correlation function. Thus, the rate of multiplications (multiplies per second), r XF, is

$$\displaystyle{ r_{\mathrm{XF}} = 2\varDelta \nu Nn_{b}\;, }$$
(8.116)

where n b is the number of baselines. For the FX processor, one DFT engine is required for each antenna. We assume that the number of multiplications for the FFT implementation of the N-point DFT is Nlog2 N. (Some variation exists, depending on the FFT implementation; e.g., an FFT with N a power of four would run somewhat faster.) The cross power spectrum calculation requires the pairwise cross multiplication of the outputs of DFT engines for all baselines. These multiplications are complex, requiring four real multiplications each. In addition, only the N∕2 spectral points at positive frequencies need be calculated and retained. The number of multiplications per second is therefore [n a Nlog2 N + 4N n b ∕2]M, where M is the number of segments processed per second, M = 2Δ ν∕N. Since MN = 2Δ ν, the aggregate multiplication rate is

$$\displaystyle{ r_{\mathrm{FX}} = 2\varDelta \nu [n_{a}\log _{2}N + 2n_{b}]\;. }$$
(8.117)

The workload ratio, \(\mathcal{R} = r_{\mathrm{XF}}/r_{\mathrm{FX}}\), is therefore

$$\displaystyle{ \mathcal{R} = \frac{n_{b}N} {n_{a}\log _{2}N + 2n_{b}}\;. }$$
(8.118)

The n a log2 N factor reflects the log2 N advantage and the antenna-based processing of the DFT engine. The n b N factor reflects the baseline processing of the X engine. Since n b  = n a (n a − 1)∕2, we can rewrite Eq. (8.118) as

$$\displaystyle{ \mathcal{R} = \frac{N} { \frac{2\log _{2}N} {n_{a}-1} + 2}\;. }$$
(8.119)

Note that this relation holds for n a  ≥ 2 because no single-antenna spectra are calculated. [Analysis for a spectrometer on a single antenna would yield \(\mathcal{R} = N/(\log _{2}N + 1)\).] The limiting forms of Eq. (8.119) are

$$\displaystyle{ \begin{array}{ll} \mathcal{R} = N/2\:, &\ \ \ \ \ \ n_{a} \gg 1 +\log _{2}N\;, \\ \mathcal{R} = \frac{N(n_{a}-1)} {2\log _{2}N} &\qquad n_{a} \ll 1 +\log _{2}N\;. \end{array} }$$
(8.120)

In general, the larger the values of N or n a , the more the FX design is favored. For example, with n a  = 10 and N = 1024, \(\mathcal{R}\sim 240\). Perhaps the most important limitation of Eq. (8.119) is that the X engine operates on a one- or few-bit representation of the signal, so that multiplication can be achieved by simple table lookup, whereas the F engine needs more bits per sample, and there is additional bit growth in its internal operations. Furthermore, the detailed architecture of chips has a major influence on calculation speed. Hence, Eq. (8.119) is a useful guide to the general dependence of \(\mathcal{R}\) on N and n a but does not accurately specify a crossover point favoring one design over the other. The advantage clearly shifts to the FX design for very large n a or N.
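Equation (8.119) is easily evaluated; the helper below (our own, for illustration) reproduces the example quoted above:

```python
import math

def workload_ratio(n_a, N):
    """XF-to-FX multiplication-rate ratio, Eq. (8.119)."""
    return N / (2 * math.log2(N) / (n_a - 1) + 2)

print(workload_ratio(10, 1024))   # ~242.5, i.e., R ~ 240
```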

Digital Fringe Rotation. In early systems, fringe rotation was often applied to the signal as an analog process, but generally it is advantageous to implement it after digitization. For example, in VLBI observations in which the data are recorded as digital samples, it is useful to be able to repeat the analysis with different fringe rates if the position of the source on the sky is not known with sufficient accuracy before the observation. Digital fringe rotation is usually applied to the digitized IF waveform before it goes to the correlator and involves multiplication with a digitized fringe rotation waveform. It is desirable to use a multibit representation for the rotated data to maintain the required accuracy, and thus, the number of bits in the input data to the correlator may be increased. Increasing the number of bits per sample in a lag correlator results in a proportional increase in complexity. Thus, it may be necessary to truncate the data before input to the correlator, which effectively introduces the quantization loss a second time. In contrast, in the FX design, multibit data representation is required in the FFT processing, so the bit increase that fringe rotation presents is more easily accommodated. See Sect. 9.7.1 for more details.

Fractional Sample Delay Correction. In digital implementation of the compensating delays, one way of adjusting the delay in steps smaller than the sampling interval is to adjust the timing of the sampler pulses, as described in Sect. 8.6. Another way is to introduce the fractional sample delay after transformation to the frequency domain, by incrementing the phase values by an amount that varies in proportion to the frequency across the IF band. In the FX correlator, this is easily done because the signals appear as an amplitude spectrum every FFT cycle, and the correction can be applied as required for each antenna before the data are combined in antenna pairs. With a lag correlator, there are two problems in this process. First, the transformation to a spectrum occurs after the data are combined for antenna pairs, so many more values require correction. Second, for long baselines, the corrections required may occur more rapidly than the rate at which cross-correlation values are transformed to cross power spectra. Thus, it may be possible to apply only a statistical correction rather than an exact one. See Sect. 9.7.3 for a description of the statistical corrections.
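Applied after the FFT stage, a fractional sample delay is just a linear phase slope across the band. A minimal sketch (the names and conventions are ours) for one antenna's block of complex amplitudes:

```python
import numpy as np

def apply_fractional_delay(X, frac, N, t_s):
    """Delay one antenna's signal by frac*t_s (0 <= frac < 1).
    X holds the N/2 positive-frequency amplitudes of an N-point FFT;
    channel k sits at frequency k/(N*t_s)."""
    nu = np.arange(len(X)) / (N * t_s)          # channel frequencies
    return X * np.exp(-2j * np.pi * nu * frac * t_s)
```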

Quantization Correction. The nonlinearity of the amplitude of the cross-correlation measured using coarsely quantized samples is seen in the Van Vleck relationship [Eq. (8.25)]. Application of a correction for the nonlinearity in quantization in the lag (XF) correlator is a relatively straightforward process because the cross-correlation values are directly calculated. To obtain the cross-correlation values in the FX correlator, the cross power spectrum at the correlator output must be Fourier transformed from the frequency domain to the lag domain. After applying the correction, the data must then be transformed back to a frequency spectrum. The correction is necessary only if the correlation of the total waveform (signal plus noise) is large for any pair of antennas. This condition implies observation of a source that is largely unresolved and sufficiently strong that the signal power in the receiver is comparable to the noise or greater. In the case of a spectral line observation, it is the power averaged over the receiver bandwidth that is important.
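The round trip just described can be sketched as follows, where `correct` stands in for the appropriate inversion of the Van Vleck relationship [Eq. (8.25)] for the quantization scheme in use (a hypothetical helper; the exact form depends on the number of quantization levels):

```python
import numpy as np

def fx_quantization_correction(cross_spec, correct):
    """FX path for quantization correction: transform the measured
    cross power spectrum to the lag domain, apply the nonlinear
    correction to the correlation values, and transform back.
    (As noted earlier, the spectrum should be zero padded before the
    inverse transform for an exact result; omitted here for brevity.)"""
    r = np.fft.ifft(cross_spec)     # frequency -> lag domain
    return np.fft.fft(correct(r))   # corrected lags -> frequency domain
```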

Adaptability. The FX design is somewhat more easily expanded or adapted to special requirements because more of the system is modularized per antenna rather than per baseline, as in the lag correlator. Addition of an extra antenna to an FX correlator requires less modification of the reduction procedure than is necessary for a lag correlator. Thus, the FX design is convenient for projects in which the number of antennas is planned to increase over time and is more efficient for larger arrays (Parsons et al. 2008).

Pulsar Observations. For pulsar observations, a gating system at the correlator output is required to separate data received during the pulsar-on period, so that the sensitivity is not degraded by noise received when the pulsar is off. For many pulsars, which have periods ≥ 0.1 s, time resolution of order 1 ms is adequate in the gating. With an FX correlator, it is necessary to collect data in complete sequences of N samples, so the gating process has to accommodate data that arrive at time intervals of ∼ N times the sample interval t s . For example, with N = 1024 and a total bandwidth of 1 MHz, N t s  ≃ 500 μs. Again, this might restrict flexibility for the fastest pulsars. However, a nice feature of the FX correlator is that complete spectra are obtained during each N t s interval in time. In the subsequent time averaging, it is possible to process the frequency channels individually and to vary the time of the gating pulse for each one so as to match the variation in pulse timing that results from dispersion in the interstellar medium.

Choice of Correlator Design. Because the relative advantages of the lag and FX schemes discussed above involve a number of different features, the best choice of architecture for any particular application may not be immediately obvious. Detailed design studies for different approaches, taking account of the precise requirements and the implementation of the very-large-scale integration (VLSI) circuits, are required. For discussions of lag and FX correlators, see D'Addario (1989), Romney (1995, 1999), and Bunton (2003). The widespread use of polyphase filter banks for precise channel definition and radio frequency interference (RFI) excision favors the FX approach (see Sect. 8.8.9).

8.8.6 Hybrid Correlator

In designing a broadband correlator, it may be advantageous or necessary to divide the analog signal of total bandwidth Δ ν from each antenna into n f contiguous narrow sub-bands. A separate digital sampler is used for each such sub-band, and the correlator is designed as n f sections operating in parallel to cover the full signal band. A system of this type that incorporates both analog filtering and digital frequency analysis is referred to as a hybrid correlator. If the digital part uses a lag design, then the rate of digital operations is reduced by a factor n f relative to the rate for a lag correlator that processes the whole bandwidth without subdivision. This can be seen from Eq. (8.116): for one sub-band, the bandwidth is Δ ν s  = Δ ν∕n f and the number of channels required is N∕n f , but n f such sections of digital processing are required. We can write a cost equation for a hybrid correlator (Weinreb 1984) as

$$\displaystyle{ C = A_{1}\frac{\varDelta \nu n_{a}(n_{a} - 1)N} {n_{f}} + A_{2}n_{f}n_{a} + A_{3}\;, }$$
(8.121)

where A 1 and A 2 are coefficients for the digital and analog hardware, respectively, and A 3 is another constant.

In this equation, the cost can be minimized with respect to n f , with the result that

$$\displaystyle{ n_{f} = \left [\frac{A_{1}} {A_{2}}\varDelta \nu (n_{a} - 1)N\right ]^{1/2}\;. }$$
(8.122)
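This result follows from setting the derivative of Eq. (8.121) with respect to n f equal to zero,

$$\displaystyle{ \frac{\partial C} {\partial n_{f}} = -A_{1}\frac{\varDelta \nu \,n_{a}(n_{a} - 1)N} {n_{f}^{2}} + A_{2}n_{a} = 0\;, }$$

from which a factor n a cancels and Eq. (8.122) follows.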

Equation (8.122) is useful only if the digital electronics are fast enough to handle a bandwidth of Δ ν∕n f . Over recent decades, sampling rates have steadily risen and the cost of digital hardware has dropped, while the cost of analog electronics has remained relatively flat. The evolution of design in hybrid correlators can be seen in Table 8.6. A general disadvantage of the hybrid correlator is that very careful calibration of the frequency responses of the sub-bands is required to avoid discontinuities in gain at the sub-band edges. In general, it is advantageous to use the fastest samplers available to minimize the analog filtering required. However, at millimeter wavelengths, where very wide bandwidths are needed and can be accommodated by receivers, the restriction on digital sampling speed requires some channelization. If an FX implementation is used for the digital section, a similar cost equation can be written, but there is less reduction in the number of operations, since N enters Eq. (8.117) only logarithmically.

Table 8.6 Hybrid channelization

8.8.7 Demultiplexing in Broadband Correlators

The bit rate for the VLSI circuits used in large correlator systems is generally slower than that of the digital samplers that are used with broadband correlators. Serial-to-parallel conversion at the sampler output, that is, demultiplexing in the time domain, allows use of optimum bit rates for the correlator. Consider a system in which each sampler output is demultiplexed into n streams, and assume for simplicity that there is one bit per sample; parallel architecture accommodates multiple bits. Any n contiguous samples all go to different streams. To obtain all the products required in a lag correlator for a pair of IF signals with this configuration of the data, it would be necessary to include cross-correlations between each stream of one signal and every stream of the other signal. To simplify the system, Escoffier (1997) developed a scheme in which the n demultiplexed bit streams from each signal are fed into a large random access memory (RAM) and read out in reordered form. Each demultiplexed stream then contains a series of discontinuous blocks of ∼ 10^5 samples. Each block contains data contiguous in time, as sampled. Cross-correlations are performed between data in corresponding blocks only. Thus, for any pair of input signals, n cross-correlators running at the demultiplexed rate are required for each value of lag. Also, each signal requires two RAM units so that one is filled as the other is read out. In Escoffier's system, the sample rate is 4 Gbit s−1, n = 32, and the length of a block of the demultiplexed data is approximately 1 ms. Since cross-correlations do not extend across the boundaries of any given block, there is a very small loss of efficiency, in this case about 0.2%.

Another possible approach is based on demultiplexing in the frequency domain, as in the case of the hybrid correlator. It is then necessary only to cross-correlate corresponding frequency channels between each antenna, so the number of cross-correlators per signal pair is again equal to n for each lag. Carlson and Dewdney (2000) have described an all-digital development of the frequency demultiplexing principle used in the hybrid correlator. This is used with the expanded VLA (Perley et al. 2009), and the system is described as a WIDAR correlator. Broadband signals are digitized at full bandwidth, divided into frequency channels using digital filters, and resampled at the appropriate lower rate before cross-correlation between all antenna pairs. (The use of digital filters avoids the small differences in the responses of analog filters, which in some systems provide the initial channelization.) As a final step, the cross-correlated data are Fourier transformed to the frequency domain. This scheme is sometimes referred to as an FXF system.

Both Escoffier's reordering scheme and the WIDAR system of demultiplexing provide approaches to the design of large broadband correlators. The latter requires fewer lags because the digital filters provide part of the spectral resolution.

For filtering sampled signals, digital filters of the FIR (finite impulse response) type can be used, in which the incoming sample stream is convolved with a series of numbers, referred to as tap weights, the Fourier transform of which represents the filter response (Escoffier et al. 2000). The tap weights can be stored in a RAM and readily changed as required. An important advantage of digital filters is the freedom from individual variations of the characteristics. However, it may be necessary to truncate the output data samples to match the number of bits per sample that can be handled by the correlator, and thus a further quantization loss may be incurred.
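An FIR filter is a direct convolution with the tap weights. The sketch below (an illustrative design, not any specific correlator's filter) builds a crude lowpass from a windowed sinc and shows how the realized frequency response follows from the taps:

```python
import numpy as np

def fir_filter(x, taps):
    """Convolve the incoming sample stream with the tap weights; the
    filter's frequency response is the Fourier transform of the taps."""
    return np.convolve(x, taps, mode="valid")

# Example tap weights: windowed sinc, cutting off at ~1/8 of the band.
n = np.arange(-32, 33)
taps = np.sinc(n / 8) * np.hanning(len(n))
taps /= taps.sum()                    # unity gain at zero frequency
response = np.fft.rfft(taps, 1024)    # realized frequency response
```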

8.8.8 Examples of Bandwidths and Bit Data Quantization

The initial observing bandwidth of the 27-antenna VLA, when it came into operation in the early 1980s, was 100 MHz per polarization with three-level (2-bit) sampling. The expanded system that came into operation around 2010, covering a frequency range of 1–50 GHz, has a maximum observing bandwidth of 8 GHz per polarization with 3-bit sampling, or 8-bit sampling with a reduced bandwidth (Perley et al. 2009). This large increase in data capacity is possible as a result of the increase in computing speed and in signal transmission capacity using optical fiber. The Atacama Large Millimeter/submillimeter Array, which came into operation in 2012, covers bandwidths of 8 GHz per polarization with 3-bit (8-level) quantization (Wootten and Thompson 2009). The number of antennas is 64, and the correlator is XF with initial digital filtering, sometimes referred to as an FXF system.

In the meter-wavelength range, observing bandwidths are generally narrower than at shorter wavelengths, but the spectrum is often more heavily used by transmitting services, so the requirement for avoiding or removing interfering signals is important. Larger numbers of bits allow for greater dynamic range in the system response, which helps to reduce the probability that interfering signals will cause overloading. The LWA (Long Wavelength Array) covers 20–80 MHz using 8-bit (256-level) sampling, with the option of 12-bit (4096-level) sampling, at a sample frequency of 196 Msample s−1 (Ellingson et al. 2009). The LOFAR system covers 15–80 MHz and 110–240 MHz using 12-bit digitization (de Vos et al. 2009).

8.8.9 Polyphase Filter Banks

Polyphase filtering is a digital signal-processing technique that was developed for applications such as the separation of signals in multichannel communication systems with high interchannel rejection (Bellanger et al. 1976). The disadvantages of the nonoverlapping-segment discrete Fourier transform (DFT) processing, which we will call the single-block Fourier transform (SBFT) method, have been noted in earlier sections of this chapter and are also described in Appendix 8.4. Namely, this approach has high spectral leakage, since the spectral response is a sinc-squared function with sidelobe levels as high as −13.5 dB. In addition, the amplitude of a monochromatic signal, or unresolved cosmic line, depends on its location with respect to the channel boundaries, going from 1 at the channel center to (2∕π)2 = 0.41 at the edge. This effect is called scalloping. There is also a slight loss in sensitivity for signals whose line widths are close to the spectral resolution, which is related to the effective lag distribution of the DFT (see Sect. 8.8.5).

Polyphase filtering and polyphase filter banks (PFBs) correct these deficiencies at a modest computational overhead. PFBs have become an important tool in radio astronomy as a way of excising radio frequency interference, since they make it possible to eliminate only the specific channels in which the interference occurs. They are also helpful in spectroscopic observations of some cosmic sources such as masers, where a very strong and narrow line in the passband makes it difficult to study other nearby lines because of the effect of spectral leakage. For detailed treatments of PFBs, see Crochiere and Rabiner (1981), Vaidyanathan (1990), and Harris et al. (2003). Useful tutorials are given by Harris (1999) and Chennamangalam (2014). For applications to radio interferometry, see Bunton (2000, 2003).

Before describing the PFB, consider an elementary design of a digital filter bank based on a conventional analog filter bank with M equally spaced filters spanning the frequency range 0 to Δ ν. Suppose the input voltage is x(t), which is a bandlimited Gaussian process in the frequency range 0 to Δ ν. x(t) can be represented by a digital sequence x(n), sampled at the Nyquist interval 1∕(2Δ ν). A crude lowpass filter can be constructed by taking a running mean of M samples in the time domain. The spectral response to this “boxcar” averaging is a sinc function with its first null at 2Δ ν∕M. Obtaining a perfect lowpass filter response in the frequency domain with a cutoff at ν c  = Δ ν∕M would require the convolution in the time domain with a sinc function [sinc(x) = sin(π x)∕(π x)],

$$\displaystyle{ h(t) = \text{sinc}(2\nu _{c}t) = \text{sinc}(n/M)\;. }$$
(8.123)

Note that for M = 1, h(t) = 1 for n = 0 and otherwise is zero, so x(n) remains unchanged. However, for M > 1, perfect lowpass filtering action requires a convolution over infinite time. As an approximation, we can use N-point smoothing. The filter shape will be

$$\displaystyle{ H(\nu ) =\sum _{ n=0}^{N-1}\text{sinc}(n/M)\,e^{\,j\left (\pi \nu n/\varDelta \nu \right )}\;, }$$
(8.124)

which has a fairly sharp cutoff at ν = Δ ν∕M. y(n), the smoothed version of x(n), will be oversampled by a factor of about M. The normal process at this point is to resample y(n), taking every Mth sample. This process is called decimation, or downsampling. To make the rest of the filter bank, multiply x(n) by e j2π ν t, where ν = mΔ ν∕M and m = 1 to M − 1, filter each stream by h(n), and downsample. This process is inefficient, since the downsampling discards most of the arithmetic computations. The PFB provides a more efficient processing structure to obtain a filter shape with a sharp cutoff.
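The inefficient construction just described can be written out directly (a sketch for a real input sequence; note how nearly all of the filtering arithmetic is discarded in the downsampling step):

```python
import numpy as np

def direct_filter_bank(x, M, N):
    """Elementary M-channel filter bank: mix each channel center
    (m * dv/M) down to zero, lowpass with an N-point truncated sinc
    (cutoff ~ dv/M), and downsample by M."""
    n_filt = np.arange(N)
    h = np.sinc((n_filt - N / 2) / M)    # truncated ideal lowpass
    h /= h.sum()
    t = np.arange(len(x))                # time in sample units, 1/(2 dv)
    out = []
    for m in range(M):
        mixed = x * np.exp(-1j * np.pi * m * t / M)   # shift m*dv/M to zero
        out.append(np.convolve(mixed, h, mode="same")[::M])  # decimate by M
    return np.array(out)
```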

We now describe the PFB, following the analysis of Bunton (2000, 2003). Consider a sample sequence of data, x(n), of length N, which is multiplied by a window function h(n). Its DFT is

$$\displaystyle{ X(k) =\sum _{ n=0}^{N-1}h(n)x(n)\,e^{-j\,\left (2\pi /N\right )nk}\;, }$$
(8.125)

where k ranges from 0 to N − 1. The frequency steps are 2Δ ν∕N, i.e., covering both positive and negative frequencies. If H(k), the DFT of h(n), has a width of approximately 2Δ ν∕M, then X(k) will be oversampled, and only every rth sample, where r = N∕M, need be retained. N and M are chosen so that r is an integer. If H(k) is desired to be the discrete idealization of a perfect lowpass filter, i.e.,

$$\displaystyle{ \begin{array}{llll} H(k)& = 1\;,\qquad \qquad &&k \leq r/2\ \ \mbox{ or }\ \ k > N - r/2\;, \\ & = 0\;, &&\mbox{ otherwise}\;,\end{array} }$$
(8.126)

then

$$\displaystyle{ h(n) \simeq \text{sinc}\left [\left (\frac{n -\frac{N} {2} } {N} \right )r\right ]\;. }$$
(8.127)

The decimated spectrum, i.e., taking every rth point of X(k) in Eq. (8.125), is

$$\displaystyle{ X(k') =\sum _{ n=0}^{N-1}h(n)x(n)\,e^{-j\,\left (2\pi /N\right )nrk'}\;, }$$
(8.128)

where k′ goes from 0 to M − 1. We can rewrite Eq. (8.128) as a double summation over r subsegments, each of length M, as

$$\displaystyle{ X(k') =\sum _{ m=0}^{r-1}\sum _{ n=0}^{M-1}h(n + mM)\,x(n + mM)\,e^{-j\,\left (2\pi /N\right )(n+mM)rk'}\;. }$$
(8.129)

Notice that

$$\displaystyle{ e^{-j\,\left (2\pi /N\right )(n+mM)rk'} = e^{-j\,\left (2\pi nk'/M\right )}e^{-j2\pi mk'}\;. }$$
(8.130)

The rightmost exponential factor is unity. Hence,

$$\displaystyle{ X(k') =\sum _{ m=0}^{r-1}\sum _{ n=0}^{M-1}h(n + mM)\,x(n + mM)\,e^{-j\,\left (2\pi /M\right )nk'}\;. }$$
(8.131)

In Eq. (8.131), there are r DFTs of length M, and in Eq. (8.128), there is one DFT of length N = r M, so there is only a slight reduction in the workload, approximated by the number of multiplications required. Note that the FFT algorithm, which has a workload proportional to Mlog2 M, is used for the DFT calculation.

The kernel of the exponential in Eq. (8.131) does not contain r, so we can interchange the order of summation and rewrite it as

$$\displaystyle{ X(k') =\sum _{ n=0}^{M-1}\left [\sum _{ m=0}^{r-1}h(n + mM)\,x(n + mM)\right ]e^{-j\,\left (2\pi /M\right )nk'}\;. }$$
(8.132)

This step reduces the calculation from r DFTs of length M to one DFT of length M. The workload for applying the window function h(n) remains proportional to N. Hence, the workload for Eq. (8.128) is N + Nlog2 N, while the workload for Eq. (8.132) is N + Mlog2 M. The workload is thus reduced by a factor of \(\mathcal{R}\), given by

$$\displaystyle{ \mathcal{R} = \frac{N + N\log _{2}N} {N + M\log _{2}M} = \frac{1 +\log _{2}N} {1 + \frac{1} {r}\left (\log _{2}N -\log _{2}r\right )} \simeq r\;, }$$
(8.133)

where the approximation holds for N ≫ 1.

After the calculation in Eq. (8.132) is performed, the N-point window is moved by M steps, and the process is repeated. Each segment of M points is thus processed r times. Therefore, the input and output data rates are the same, except when spectral values at negative frequencies are discarded.

The calculation in Eq. (8.132) is expressed diagrammatically in Fig. 8.24. This process may seem counterintuitive in the following sense. The data stream is severely decimated by the action of the commutator, which distributes the time samples among the branches, or “partitions,” with a cycling period M. That is, the data samples going into each of the M partitions are

Fig. 8.24
figure 24

A diagram of a polyphase filter bank, which converts a set of N data samples into an M-point spectrum. The input data stream is distributed among the M filter partitions by a commutator. Each partition receives a data stream that has been downsampled by a factor M. In each partition, P m represents the action of the decimated version of h(n), as described by the term in brackets in Eq. (8.132). The nonaliased M-point spectrum is assembled by the action of the FFT. Note that if the data samples are real numbers, then only M∕2 values of the spectrum, corresponding to the positive frequencies, need be retained.

$$\displaystyle{ \begin{array}{*{10}c} x(0), & x(M), & \ x(2M), &\ \cdots \,,& \ x(rM - M) \\ x(1), &x(M + 1),&\ x(2M + 1),&\ \cdots \,,&\ x(rM - M + 1) \\ x(2), &x(M + 2),&\ x(2M + 2),&\ \cdots \,,&\ x(rM - M + 2)\\ \ \vdots \\ x(M - 1),& x(2M - 1), &\ x(3M - 1),&\ \cdots \,,& \ x(rM - 1)\;.\\ \end{array} }$$
(8.134)

Each of these decimated data streams is undersampled by a factor of M, and its corresponding spectrum is heavily aliased. The action of the PFB undoes this aliasing.

Consider an example where N = 1024, M = 256, and r = 4 (a four-tap polyphase filter), as shown in Fig. 8.25. The first polyphase partition, P 0, calculates only the four-term sum x(0)h(0) + x(256)h(256) + x(512)h(512) + x(768)h(768), and P 1 calculates x(1)h(1) + x(257)h(257) + x(513)h(513) + x(769)h(769).
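This example translates almost line for line into code (a sketch of Eq. (8.132), with the window h(n) following Eq. (8.127); weighting refinements such as those discussed below are omitted):

```python
import numpy as np

def pfb_spectrum(x, M, r):
    """One PFB output spectrum, Eq. (8.132): window N = r*M samples,
    fold into r segments of length M, coadd, and take an M-point FFT."""
    N = r * M
    n = np.arange(N)
    h = np.sinc((n - N / 2) * r / N)            # window of Eq. (8.127)
    y = (h * x[:N]).reshape(r, M).sum(axis=0)   # bracketed sum in Eq. (8.132)
    return np.fft.fft(y)                        # M-point spectrum

# Parameters of the example in the text: N = 1024, M = 256, r = 4.
x = np.random.default_rng(0).standard_normal(1024)
X = pfb_spectrum(x, M=256, r=4)   # for real input, keep X[:128]
```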

Fig. 8.25
figure 25

A graphical representation of the action of a polyphase filter bank with r = 4, or four taps. A random noise data stream, represented by a set of N independent Gaussian (i.e., white noise) samples, is shown in the top panel. It is multiplied by a window function h(n), the envelope of which is shown in the next panel. Here, h(n) is chosen to be a sinc function with exactly four zero crossings, equal to the number of taps. The result is separated into four segments, which are coadded to form the M-term time series shown in the lowest panel, which is then Fourier transformed into an M-point spectrum, −Δ ν to Δ ν, as formulated by Eq. (8.132). After this calculation, the window is moved by M samples, and the process is repeated. Adapted from Gary (2014).

We can now compare the performance and requirements of the SBFT and the PFB. The SBFT produces an M-point spectrum for each M data samples. It moves successively from block to block, so the data rate remains the same. The PFB takes in N data samples and produces an M-point spectrum and then steps by M samples for the next spectral calculation. Hence, its data rate also remains the same. The overhead in the PFB is due to the windowing. Hence, the workload ratio \(\mathcal{R}\) needed for the PFB with respect to the SBFT is

$$\displaystyle{ \mathcal{R} = \frac{N + M\log _{2}M} {M\log _{2}M} = 1 + \frac{r} {\log _{2}M}\;. }$$
(8.135)

For M = 1024 and r = 4, there is a 40% overhead incurred with the PFB structure. The flat response and low leakage of the PFB are made possible because there are N samples available to provide the filter action rather than M. Note that in a hardware implementation, the buffering requirement increases with r.

It is advantageous to apply additional weighting to h(n) such as Hann, Hamming, or Blackman weighting to further reduce spectral leakage. This does not reduce the resolution significantly as long as the weighting function remains at a level of ∼ 1 over M samples. Examples of PFB and SBFT filter shapes are shown in Fig. 8.26. If the weighting is applied in the SBFT mode over M samples, the leakage is reduced, but the resolution is also reduced.

Fig. 8.26
figure 26

The thick line shows the response of a filter element in a PFB having r = 8 and Hann weighting applied. The thin line shows the response for an SBFT, a sinc2 function. Both filters have a response of about (2∕π)2 at the filter edge of ±0.5 in normalized frequency units of 2Δ ν∕M.

Note that PFBs can be concatenated: the output of any subset or all of the channels of a PFB can be fed into an additional PFB to obtain finer resolution. The Murchison Widefield Array uses such a scheme. Another application is to use a PFB only for coarse channelization; its output can then be fed to an XF or FX correlator.

8.8.10 Software Correlators

Since, in practice, the signals for which cross-correlations are formed are in digital form, having also been subjected to a digital delay system, the cross multiplication and averaging processes can be carried out in a computer system. This is useful in small systems for which the development of special correlator hardware is avoided. Also, in the case of large systems in which antennas are brought into operation over a period of years, changes in the correlation requirements are more easily accommodated. An example of a software correlator and the advantages of the design are described by Deller et al. (2007). Most VLBI processing is done in software correlators.