
This chapter is concerned with the calibration and Fourier transformation of visibility data, mainly as applied to Earth-rotation synthesis. Methods for the evaluation of the visibility measurements on a rectangular grid of points, necessary for the use of the discrete Fourier transform as implemented with the fast Fourier transform (FFT) algorithm, are discussed. Phase and amplitude closure conditions, which are valuable calibration tools, are also described. Analysis of the causes of certain types of image defects is given. Special consideration is given to certain observing modes, such as spectral line observations, and conversion of frequency to velocity is described. In addition, methods of extracting astronomical information directly from visibility data by model fitting are described. These techniques are important even with arrays having excellent (u, v) coverage. Some methods of calculating Fourier transforms before the advent of the FFT are discussed in Appendix 10.3.

10.1 Calibration of the Visibility

The purpose of calibration is to remove, insofar as possible, the effects of instrumental and atmospheric factors in the measurements. Such factors depend largely on the individual antennas or antenna pairs and their associated electronics, so correction must be applied to the visibility data before they are combined into an image. Editing the visibility data to delete any that show evidence of radio interference or equipment malfunction is usually performed before the full calibration process. This largely entails examining samples of data for unexpected amplitude or phase variations. Data taken on unresolved calibration sources are particularly useful here, since the response to such a source is predictable and should vary only slowly and smoothly with time.

In the calibration procedure, we first consider instrumental factors that are stable with time over periods of weeks or more. These include:

  1. antenna position coordinates that specify the baselines,

  2. antenna pointing corrections resulting from axis misalignments or other mechanical tolerances,

  3. zero-point settings of the instrumental delays, that is, the settings for which the delays from the antennas to the correlator inputs are equal.

These parameters vary only as a result of major changes such as the relocation of an antenna. They can be calibrated by observing unresolved sources with known positions (see Sect. 12.2). We assume here that they have been determined in advance of the imaging observations. We also assume that correction for the nonlinearity of signal quantization, which is discussed in Sect. 8.4, has been applied if required.

10.1.1 Corrections for Calculable or Directly Monitored Effects

Calibration of the visibility measurements for effects that vary during an observation principally involves correction of the complex gains of the antenna pairs. Such factors can be divided into those for which the behavior can be predicted or directly measured and those for which it must be determined by observing a calibration source during the observation period. Examples of effects that can be corrected for by calculation include:

  1. the constant component of atmospheric attenuation as a function of zenith angle (see Sect. 13.1.3),

  2. variation of antenna gain as a function of elevation caused by elastic deformation of the structure under gravity. This may be based on pointing observations as well as structural calculations.

Shadowing, in which one antenna partially blocks the aperture of another, can occur at close spacings and low elevation angles. In principle, it should be possible to calibrate for shadowing, since the positions and structures of the antennas are known. However, the effect of the geometrical blockage is complicated by diffraction: the shape of the primary beam is modified, and the position of the phase center of the aperture is shifted, which affects the effective baseline. Overall, these effects are often too complicated to be analyzed, and data from shadowed antennas are often discarded.
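Although a full treatment of shadowing is impractical for the reasons just given, the purely geometric condition is simple. The sketch below is a minimal Python check of our own (identical dishes assumed, diffraction ignored): a pair is flagged as shadowed when the component of the baseline perpendicular to the source direction is shorter than a dish diameter.

```python
import numpy as np

def is_shadowed(baseline_xyz, source_unit_vec, dish_diameter):
    """Rough geometric shadowing test for one antenna pair.

    baseline_xyz    : 3-vector between the two antennas (same length units as dish_diameter)
    source_unit_vec : unit vector toward the source in the same coordinate frame
    dish_diameter   : antenna diameter (identical dishes assumed; diffraction ignored)
    """
    b = np.asarray(baseline_xyz, dtype=float)
    s = np.asarray(source_unit_vec, dtype=float)
    s = s / np.linalg.norm(s)
    b_perp = b - np.dot(b, s) * s          # baseline component transverse to the line of sight
    return np.linalg.norm(b_perp) < dish_diameter

# Example: a 30-m baseline observed at 10 degrees elevation with 25-m dishes
elev = np.radians(10.0)
source = np.array([np.cos(elev), 0.0, np.sin(elev)])
print(is_shadowed([30.0, 0.0, 0.0], source, 25.0))   # True: projected separation is only ~5.2 m
```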

Effects within the receiving system, or external to it, that can be continuously monitored during an observation include:

  1. variation of system noise temperature, which can result from changes in the ground radiation picked up in the sidelobes as the antenna tracks or from changes in atmospheric opacity. This effect may also cause variation in the gain as a result of automatic level control (ALC) action that is used in some instruments to adjust the signal levels at the sampler or correlator (see Sect. 7.6). Monitoring can be performed by injection of a low-level, switched noise signal at the receiver input and detection of it later in the system.

  2. phase variations in the local oscillator system monitored by round-trip phase measurement (see Sect. 7.2),

  3. the variable component of atmospheric delay monitored by using water vapor radiometers mounted at the antennas (see Sect. 13.3).

Corrections for these effects are usually performed at an early stage of the calibration procedure.

10.1.2 Use of Calibration Sources

Further steps in the calibration involve parameters that may vary on timescales of minutes or hours and require the observation of one or more calibration sources. Note that the source that is the subject of the astronomical investigation will be referred to as the target source to distinguish it from the calibration source, or calibrator. From Eq. (3.9), we can write the small-field expression for the interferometer response as follows:

$$\displaystyle{ [\mathcal{V}(u,v)]_{\mathrm{uncal}} = G_{mn}(t)\int _{-\infty }^{\infty }\int _{ -\infty }^{\infty }\frac{A_{N}(l,m)I(l,m)} {\sqrt{1 - l^{2 } - m^{2}}} e^{-j2\pi (ul+vm)}dl\,dm\;, }$$
(10.1)

where \([\mathcal{V}(u,v)]_{\mathrm{uncal}}\) is the uncalibrated visibility, and I(l, m) is the source intensity. The complex gain factor \(G_{mn}(t)\) is a function of the antenna pair (m, n) and, as a result of unwanted effects, may vary with time. \(A_{N}(l,m)\) is the antenna reception pattern normalized to unity in the direction of the main beam. It can be removed from the source image as a final step in the image processing. The factor \(A_{N}(l,m)/\sqrt{1 - l^{2} - m^{2}}\) is close to unity, and from here on, we generally omit it, except in the case of wide-field imaging. To calibrate \(G_{mn}(t)\), an unresolved calibrator can be observed, for which the measured response is

$$\displaystyle{ \mathcal{V}_{c}(u,v) = G_{mn}(t)S_{c}\;, }$$
(10.2)

where the subscript c indicates a calibrator, and S c is the flux density of the calibrator. In calibrating the gain, it is best to consider the amplitude and phase separately, since the errors in these two quantities generally arise through different mechanisms. For example, atmospheric fluctuations due to tropospheric inhomogeneity cause phase fluctuations but have little effect on the amplitudes. To calibrate the visibility of the target source, we can write

$$\displaystyle{ \mathcal{V}(u,v) = \frac{[\mathcal{V}(u,v)]_{\mathrm{uncal}}} {G_{mn}(t)} = [\mathcal{V}(u,v)]_{\mathrm{uncal}}\left [\frac{S_{c}} {\mathcal{V}_{c}}\right ]\;. }$$
(10.3)

The calibration source is usually observed at the phase center of its field. Then, assuming that the calibrator is unresolved, its measured phase is a direct measure of the instrumental phase. Thus, phase calibration for the target source requires subtracting the calibrator phase from the observed phase. The visibility amplitude can be calibrated by using the moduli of the visibility terms in Eq. (10.3). The response to the calibrator should be corrected for the calculable and/or directly monitored effects before the gain calibration is performed. Where there are separate receiving channels for two opposite polarizations at each antenna, the calibration should be performed separately for each one. For measurements of source polarization, further calibration procedures are necessary, as described in Sect. 4.7.5.
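As a concrete sketch of this procedure, the Python fragment below derives \(G_{mn}(t)\) from calibrator scans via Eq. (10.2), interpolates its amplitude and phase separately to the target-source sample times, and then applies Eq. (10.3). The function and variable names are ours, and the treatment is deliberately simplified (one baseline, one polarization).

```python
import numpy as np

def calibrate_baseline(t_target, vis_uncal, t_cal, vis_cal, flux_cal):
    """Apply Eq. (10.3) to one baseline: V = V_uncal / G_mn, with G_mn = V_c / S_c.

    t_target, vis_uncal : times and uncalibrated visibilities of the target source
    t_cal, vis_cal      : times and responses measured on the (unresolved) calibrator
    flux_cal            : flux density S_c of the calibrator, in Jy
    """
    gain = vis_cal / flux_cal                           # Eq. (10.2): G_mn at the calibrator scans
    # Interpolate amplitude and phase separately, not the real and imaginary parts
    amp = np.interp(t_target, t_cal, np.abs(gain))
    phase = np.interp(t_target, t_cal, np.unwrap(np.angle(gain)))
    return vis_uncal / (amp * np.exp(1j * phase))

# Calibrator scans every 20 minutes bracketing two target-source samples
t_cal = np.array([0.0, 20.0, 40.0])
vis_cal = np.array([5.2 + 0.4j, 5.0 + 1.1j, 4.8 + 1.9j])     # response to a 5-Jy calibrator
vis_corrected = calibrate_baseline(np.array([10.0, 30.0]),
                                   np.array([0.82 + 0.11j, 0.74 - 0.05j]),
                                   t_cal, vis_cal, 5.0)
```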

Calibration observations require periodic interruption of observations of the target source. At centimeter wavelengths, the interval between calibration observations depends on the stability of the instrument and typically falls within the range of 15 min to 1 h. At meter and centimeter wavelengths, the ionosphere and the neutral atmosphere introduce gain and phase changes, and elimination of these may require observation of a calibrator at time intervals as short as a few minutes. At millimeter and submillimeter wavelengths, calibration at time intervals less than a minute is usually required.

As indicated by Eq. (7.38), \(G_{mn} = g_{m}g_{n}^{\,*}\), so the measured gains for antenna pairs can be used to determine gain factors for the individual antennas. Using the individual antenna gain factors rather than the baseline gain factors reduces the amount of calibration data to be stored and helps in monitoring the performance of individual antennas. Also, with this technique, some of the spacings can be omitted from the calibration observation so long as each of the antennas is included. In practice, gain tables including both amplitude and phase are generated for the antennas as a function of time, and the values are interpolated to the times at which data from the target source were taken. The interpolation should be done separately for the amplitude and phase, not for the real and imaginary parts of the gain; otherwise, the phase errors can degrade the amplitude, and vice versa. The desirable characteristics of a calibration source are the following.

Flux density:

The calibrator should be strong, so that a good signal-to-noise ratio is obtained in a short time, to reduce the (u, v) coverage lost from the target source. The gaps in the (u, v) coverage are more serious for a linear array, in which complete sectors are lost, than for a two-dimensional array, in which the instantaneous coverage is more widely distributed in u and v.

Angular width:

The calibrator should, if possible, be unresolved so that precise details of its visibility are not required.

Position:

The position of the calibrator should be close to that of the target source. Effects in the atmosphere or antennas that cause the gain to vary with pointing angle are then more effectively removed, and time lost in driving the antennas between the target source and calibrator positions is kept small. At millimeter wavelengths, where the atmospheric phase path is the main factor being calibrated, the calibrator distance must be within the angular scale of the irregularities. This usually means a distance of no more than a few degrees on the sky (see Sect. 13.4).

It is not always possible to find a calibrator that satisfies all of the above requirements. In such cases, it may be necessary to find a source that is largely unresolved and close to the target source and then calibrate it against one of the more commonly used flux density references such as 3C48, 3C147, 3C286, and 3C295. The last of these is the most reliable with regard to nonvariability. Thermal sources such as the compact planetary nebula NGC 7027 may be useful as amplitude calibrators for short baselines. At millimeter wavelengths, it may be more difficult to find a source that provides a strong signal for test purposes or calibration. Disks of planets become resolved at rather short baselines, but the limb of the Moon or a planet can be useful: see Appendix 10.1.

The use of clusters of small sources as calibrators has been investigated by Kazemi et al. (2013). Such clusters might typically consist of two to ten sources of small angular diameter, and flux densities are correspondingly lower than required for single calibration sources. This approach allows calibrators to be found closer to the object under investigation and thus potentially increases the number available as well as reducing errors related to angular distance.

For VLBI observations with milliarcsecond resolution, there are fewer suitable calibrators. Angular structure on this scale is sometimes variable over periods of months, and caution is necessary if a previously measured and partially resolved source is to be used as a calibrator. An alternative approach to amplitude calibration of VLBI data involves use of the system temperatures and collecting areas of the individual antennas, as follows. The cross-correlation data should first be normalized to unity for the case in which the two input data streams are fully correlated. To obtain this normalization, the data are divided by the product of the rms values of the data streams at the two correlator inputs. (For two-level sampling, this rms value is unity, and for other types of sampling, the rms depends on the setting of the sampler thresholds with respect to the level of the analog signal.) Then, to convert the normalized correlation to visibility \(\mathcal{V}\) with units of flux density (janskys), the amplitude is multiplied by the geometric mean of the system equivalent flux density (SEFD) values for the two antennas involved. The system equivalent flux density, \(\mathrm{SEFD} = 2kT_{S}/A\), is defined in Eq. (1.7). If the value of \(T_{S}\) corresponds to a plane above the atmosphere, then the resulting visibility values will be corrected for atmospheric losses. For VLBI data in which the phase may sometimes not be calibrated, the closure relationships in Sect. 10.3 allow images to be formed if absolute position is not required.
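As a brief illustration of this scaling (a sketch with invented numbers, not a complete VLBI amplitude-calibration chain), the normalized correlation coefficient is simply multiplied by the geometric mean of the two SEFDs:

```python
import numpy as np

def correlation_to_jansky(rho_mn, sefd_m, sefd_n):
    """Scale a normalized correlation coefficient to visibility amplitude in janskys.

    rho_mn          : cross-correlation normalized to unity for fully correlated inputs
    sefd_m, sefd_n  : system equivalent flux densities (SEFD = 2kT_S/A) of the two antennas, in Jy
    """
    return rho_mn * np.sqrt(sefd_m * sefd_n)

# Example: rho = 0.01 between stations with SEFDs of 500 Jy and 2000 Jy gives 10 Jy
print(correlation_to_jansky(0.01, 500.0, 2000.0))
```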

10.2 Derivation of Intensity from Visibility

10.2.1 Imaging by Direct Fourier Transformation

A straightforward method of obtaining an estimate of the intensity distribution from measured visibility data is by direct Fourier transformation, that is, by performing the transformation without putting the visibility into any special form such as interpolating it onto a uniform grid. The measured visibility \(\mathcal{V}_{\mathrm{meas}}(u,v)\) can be written

$$\displaystyle{ \mathcal{V}_{\mathrm{meas}}(u,v) = W(u,v)w(u,v)\mathcal{V}(u,v)\;, }$$
(10.4)

where W(u, v) is the transfer function or spatial sensitivity function introduced in Sect. 5.3, and w(u, v) represents any applied weighting. The Fourier transform of Eq. (10.4) is the measured intensity distribution (i.e., the image), which is

$$\displaystyle{ I_{\mathrm{meas}}(l,m) = I(l,m) {\ast}{\ast}\,b_{0}(l,m)\;, }$$
(10.5)

where the double asterisk indicates two-dimensional convolution, and b 0 is the synthesized beam, which is the Fourier transform of the weighted transfer function:

$$\displaystyle{ b_{0}(l,m)\longleftrightarrow W(u,v)w(u,v)\;, }$$
(10.6)

where ↔ indicates the Fourier transform relationship. Effects such as those of noncoplanar baselines, signal bandwidth, and visibility averaging are not included here. \(b_{0}(l,m)\) is also known as the point-source response function or the dirty beam, in the context of the CLEAN deconvolution algorithm, which is discussed in Sect. 11.1.

The visibility is measured at an ensemble of n d points in the (u, v) plane. If the antennas are identically polarized and the source is unpolarized, the direct Fourier transform of these data is represented by

$$\displaystyle{ I_{\mathrm{meas}}(l,m) =\sum _{ i=1}^{n_{d} }w_{i}\left [\mathcal{V}_{\mathrm{meas}}(u_{i},v_{i})e^{\,j2\pi (u_{i}l+v_{i}m)} + \mathcal{V}_{\mathrm{ meas}}(-u_{i},-v_{i})e^{-j2\pi (u_{i}l+v_{i}m)}\right ]\;. }$$
(10.7)

The fundamental issue in image synthesis is whether we can recover I(l, m) from I meas(l, m). In principle, Eq. (10.4) can be used to determine \(\mathcal{V}(u,v)\) as \(\mathcal{V}_{\mathrm{meas}}(u,v)/W(u,v)w(u,v)\). The image can be calculated exactly if W(u, v)w(u, v) is everywhere nonzero.
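For modest numbers of visibility points, Eq. (10.7) can be evaluated directly, as in this illustrative Python sketch (the grid sizes and the point-source test values are ours):

```python
import numpy as np

def dirty_image_direct(u, v, vis, weights, l, m):
    """Direct Fourier transform of visibility samples, following Eq. (10.7).

    u, v, vis, weights : 1-D arrays of (u, v) samples, visibilities, and weights w_i
    l, m               : 2-D arrays of direction cosines at which the image is evaluated
    The term for (-u_i, -v_i) is included via 2 Re{...}, using V(-u, -v) = V*(u, v).
    """
    image = np.zeros(l.shape)
    for ui, vi, Vi, wi in zip(u, v, vis, weights):
        image += wi * 2.0 * np.real(Vi * np.exp(2j * np.pi * (ui * l + vi * m)))
    return image

# Test: a 1-Jy point source at (l, m) = (0.01, -0.02) observed at three (u, v) points
u = np.array([50.0, 120.0, 300.0])
v = np.array([80.0, -40.0, 150.0])
vis = np.exp(-2j * np.pi * (u * 0.01 + v * (-0.02)))        # Eq. (10.1) for a point source
l, m = np.meshgrid(np.linspace(-0.05, 0.05, 101), np.linspace(-0.05, 0.05, 101))
dirty = dirty_image_direct(u, v, vis, np.ones(u.size) / u.size, l, m)
```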

Bracewell and Roberts (1954) pointed out that, in principle, there are an infinite number of solutions to the convolution in Eq. (10.5), since one can add any arbitrary visibility values in the unsampled areas of the (u, v) plane. The Fourier transform of these added values constitutes an invisible distribution that cannot be detected by any instrument with corresponding zero areas in the transfer function. It may be argued that in interpreting observations from any radio telescope, one should maintain only zeros in the unmeasured regions of spectral sensitivity, to avoid arbitrarily generating information. On the other hand, the zeros are themselves arbitrary values, some of which are certainly wrong. What is wanted is a procedure that allows the visibility at the unmeasured points to take values consistent with the most reasonable or likely intensity distribution, while minimizing the addition of arbitrary detail. Positivity of intensity and limitation of size of the angular structure of a source are expected characteristics that can be introduced into the imaging process. Image restoration techniques that implicitly generate nonzero visibility values at unmeasured (u, v) points include CLEAN, maximum entropy, and compressed sensing, which are discussed in Chap. 11.

10.2.2 Weighting of the Visibility Data

To obtain the best signal-to-noise ratio in the summation of measurements that contain Gaussian noise, the individual data values should be weighted inversely as their variances. The same is true for the combination of sinusoidal components in an image of a source, the amplitudes of which are proportional to the corresponding visibility points. Thus, for the best signal-to-noise ratio, the weights w i in Eq (10.7) should be inversely proportional to the variances. If the data are obtained with a uniform array of antennas and receivers, and the averaging time is the same for all data points, then the variances should all be the same, and maximum signal-to-noise ratio is obtained by including all measurements with the same weight. This is known as natural weighting. For many arrays, natural weighting results in a poor beam shape with wide skirts because the shorter spacings are overemphasized. Thus, the usual approach is to include in the weighting a factor that is inversely related to the area density of the data in the (u, v) plane. The area density ρ σ (u, v) can be defined such that the number of points in the range \(u \pm \frac{1} {2}du,\,v \pm \frac{1} {2}dv\) is ρ σ (u, v)dudv (Thompson and Bracewell 1974). Although ρ σ at any given point depends on the size of the increments du and dv, it is usually possible to specify the variation of relative density and correct for it satisfactorily. As a simple example, in the observation of a high-declination source with an east–west array in which the antenna spacings are nonredundant integral multiples of a unit value, the visibility points lie on concentric circles, as in Fig. 10.1. Then, if the visibility is measured at uniform increments in hour angle, the area density at any ring is inversely proportional to the radius of the ring. With w(u, v) proportional to 1∕ρ σ (u, v), the effective density of the data is uniform within a circle of radius u max determined by the maximum spacing. The beam then closely approximates the Fourier transform of a circular disk function, which, normalized to unity at the maximum, is given by

$$\displaystyle{ \frac{J_{1}(2\pi lu_{\mathrm{max}})} {\pi lu_{\mathrm{max}}} \;, }$$
(10.8)

where \(J_{1}\) is the Bessel function of the first kind and first order. \(2J_{1}(x)/x\) is called a jinc function, by analogy to a sinc function. The full width of the beam at half-maximum (FWHM) is \(0.705\,u_{\mathrm{max}}^{-1}\), and the first sidelobe response is 13.2% of the main beam. Similarly, if the effective density of measurements is uniform within a rectangular area of dimensions \(2u_{\mathrm{max}} \times 2v_{\mathrm{max}}\), the synthesized beam is closely approximated by

$$\displaystyle{ \frac{\sin (2\pi u_{\mathrm{max}}l)} {2\pi u_{\mathrm{max}}l} \times \frac{\sin (2\pi v_{\mathrm{max}}m)} {2\pi v_{\mathrm{max}}m} \;. }$$
(10.9)

This beam is not circularly symmetrical, and the first sidelobe has a maximum value of 22% in the east–west and north–south directions through the beam center.
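The beamwidth and sidelobe figures quoted above are easy to reproduce numerically; the following short check (our own, using SciPy's Bessel function) evaluates expression (10.8):

```python
import numpy as np
from scipy.special import j1
from scipy.optimize import brentq

u_max = 1.0                                    # arbitrary units; beam widths scale as 1/u_max

def jinc_beam(l):
    """Expression (10.8), normalized to unity at the beam center."""
    x = 2.0 * np.pi * l * u_max
    return 2.0 * j1(x) / x

# Full width at half-maximum: ~0.705 / u_max
l_half = brentq(lambda l: jinc_beam(l) - 0.5, 1e-6, 0.5 / u_max)
print("FWHM =", 2.0 * l_half)

# First sidelobe: the most negative value beyond the first null of J1 (x ~ 3.83): ~13.2%
l_grid = np.linspace(3.9, 8.0, 20000) / (2.0 * np.pi * u_max)
print("first sidelobe =", abs(jinc_beam(l_grid).min()))
```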

Fig. 10.1

Transfer function (spacing loci) in the (u, v) plane for observations of a high-declination source using an east–west array with uniform increments in antenna spacing. The points indicate visibility measurements, and their (u, v) positions reflected through the origin, for uniform intervals of time. The angle ϕ indicates data for a specific hour angle. If the visibility values are weighted in proportion to the radii of the loci, the density of the visibility data is effectively uniform out to a radius u max.

With uniform weighting, the strong, near-in sidelobes (close to the main beam) in Fig. 10.2 obscure low-level detail and thereby reduce the range of intensity levels that can be reliably measured. The near-in sidelobes of the functions in expressions (10.8) and (10.9) can be reduced at the expense of some increase in the width of the synthesized beam by introducing a Gaussian or similar taper into the weighting function. The effect of such tapering of the visibility is shown in Fig. 10.2. The taper can be specified in terms of the amplitude of the tapering function at a distance u max from the (u, v) origin; a taper to ∼ −13 dB of the central value is commonly used. With such a taper, the weighting w(u, v) is the product of two functions: w u (u, v), the weighting required to obtain uniform effective density, and w t (u, v), the tapering function. Thus, the synthesized beam is the Fourier transform of W(u, v)w u (u, v)w t (u, v):

Fig. 10.2

Examples of synthesized beam profiles. Curves for no taper correspond to a visibility distribution that is uniform within (a) a rectangular area of width 2u max, and (b) a circular area of diameter 2u max. For no taper, the responses correspond to expression (10.9) for (a) and (10.8) for (b). The effects of Gaussian tapers that reduce the visibility at the edge of the distribution to 30% and to 10% are also shown. Note the difference in the ordinate scales.

$$\displaystyle{ b_{0}(l,m) = \overline{W}(l,m) {\ast}{\ast}\,\overline{w}_{u}(l,m) {\ast}{\ast}\,\overline{w}_{t}(l,m)\;, }$$
(10.10)

where the bar denotes a Fourier transform. The Fourier transform of W(u, v)w u (u, v) is simply the beam obtained with uniform effective density, for example, as in expressions (10.8) or (10.9). If w t (u, v) is a two-dimensional Gaussian function, its Fourier transform is also a Gaussian. Thus, the sidelobe reduction results from convolution with a Gaussian in the (l, m) domain. The variances of functions are additive under convolution [see, e.g., Bracewell (2000)], so the beam obtained by convolution with \(\overline{w}_{t}\) is broader than that with no tapering, as is evident in Fig. 10.2.

An interesting property of the uniform weighting is that it minimizes the mean-squared deviation of the resulting intensity from the true intensity, within the constraint that unmeasured visibility values remain zero. This can be understood as follows. Since the true intensity distribution I(l, m) and the true visibility function \(\mathcal{V}(u,v)\) are a Fourier pair, and the weighted measured visibility and the derived intensity I 0(l, m) are a Fourier pair, it follows that the differences between these quantities in the two domains are also a Fourier pair, to which we can apply Parseval’s theorem. Recall that W(u, v) is the transfer function, w u (u, v) is the weighting required to obtain effective uniform density of data in the (u, v) plane, and w t (u, v) is an applied taper. Thus, we can write

$$\displaystyle\begin{array}{rcl} & & \int \int _{\mathrm{meas}}\left \vert \mathcal{V}(u,v) -\mathcal{V}(u,v)W(u,v)w_{u}(u,v)w_{t}(u,v)\right \vert ^{2}du\,dv \\ & & \quad +\int \int _{\mathrm{unmeas}}\left \vert \mathcal{V}(u,v)\right \vert ^{2}du\,dv \\ & & \quad =\int _{ -\infty }^{\infty }\int _{ -\infty }^{\infty }\left \vert I(l,m) - I_{ 0}(l,m)\right \vert ^{2}dl\,dm\;. {}\end{array}$$
(10.11)

The first and second lines of Eq. (10.11) represent the measured and unmeasured areas of the (u, v) plane, respectively. In the measured area, W(u, v)w u (u, v) = 1. For the case of uniform weighting, w t  = 1, so the integral on the first line is zero. This condition minimizes the squared difference between the true and observed intensity distributions on the third line. If I(l, m) is an unresolved point source, then I 0(l, m) is equal to the synthesized beam. The uniform weighting minimizes the squared difference, over 4π steradians, between the synthesized beam and the response to a point source as it would be observed with unlimited (u, v) coverage. In this sense, it is sometimes said that uniform weighting minimizes the sidelobes of the synthesized beam. However, as shown in Fig. 10.2, a Gaussian taper reduces the sidelobes outside of the main beam at the expense of widening the beam. Images derived from visibility data that are uniformly weighted within the measured area of the (u, v) plane have been referred to as the principal solution or principal response (Bracewell and Roberts 1954). The related process of reducing the sidelobe response in optical imaging is called apodization, for which there is an extensive literature; see, for example, Jacquinot and Roizen-Dossier (1964) and Slepian (1965).

10.2.2.1 Robust Weighting

With large arrays, the visibility data must be interpolated onto a uniform grid as described in Sect. 5.2 in order to make computations tractable. The simplest approach is called cell averaging, where each data point is associated with the (u, v) grid point nearest to it. The number of points averaged in a cell will decrease with increasing (u, v) distance, and many cells will have zero entries. Thus, the variance of the visibility estimates will vary considerably over the (u, v) plane. A conflict arises between the goal of forming a synthesized beam that is narrow and has low sidelobes and achieving the optimum sensitivity for the detection of weak sources. The best strategy for detecting a weak point source in the field is to use natural weighting, i.e., performing the image transform with variance weighting. On the other hand, if the signal-to-noise ratio is high, an image with better resolution and lower sidelobes can be obtained with uniform weighting.

Briggs (1995) introduced a logarithmic parametrized scheme that allows a continuous variation in weighting between uniform and variance weighting. The process is called robust weighting. The weighting of cell (i, k) in the (u, v) plane whose visibility has an rms error of σ ik is specified as

$$\displaystyle{ w_{ik} = \frac{1} {S^{2} +\sigma _{ \!ik}^{2}}\:, }$$
(10.12)

where S is a parameter defined by

$$\displaystyle{ S^{2} = \frac{(5 \times 10^{-R})^{2}} {\overline{w}} \;. }$$
(10.13)

R is the robustness factor, and \(\overline{w}\) is the average variance weighting factor over the n c cells in the image,

$$\displaystyle{ \overline{w} = \frac{1} {n_{c}}\sum \frac{1} {\sigma _{\!ik}^{2}}\;. }$$
(10.14)

The nominal range of R is −2 to 2. R = 2 makes \(S^{2}\) small with respect to \(\sigma_{ik}^{2}\) so that the weighting approaches natural weighting, whereas R = −2 makes \(S^{2}\) large with respect to \(\sigma_{ik}^{2}\) so that the weighting approaches uniform weighting. R = 0 produces an rms that is midway between the values for R = −2 and 2. R is called the robustness factor because, as it increases, the image becomes more immune to calibration errors or errors due to radio frequency interference, since the effect of a bad point in a cell with few data points is deemphasized. An example of how the synthesized beamwidth and rms noise vary with R is shown in Fig. 10.3. In the vicinity of R = 0, which is the normal default value, the beamwidth and rms noise are most sensitive to changes in R. For the example shown in Fig. 10.3, the beamwidth increases by 5%, and the rms noise decreases by 45% as R increases from −0.5 to 0.5. For inhomogeneous arrays such as those used in VLBI, the gain in sensitivity can increase markedly for little increase in beamwidth.
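Equations (10.12)-(10.14) translate directly into code. The sketch below is a simplified illustration of our own (not the weighting machinery of any particular package); sigma holds the rms errors of the occupied cells.

```python
import numpy as np

def robust_weights(sigma, R):
    """Briggs robust weighting, Eqs. (10.12)-(10.14).

    sigma : rms visibility errors sigma_ik of the occupied (u, v) cells
    R     : robustness factor, nominally -2 (near uniform) to +2 (near natural)
    """
    w_var = 1.0 / sigma**2                       # variance (natural) weights
    w_mean = w_var.mean()                        # Eq. (10.14)
    S2 = (5.0 * 10.0**(-R))**2 / w_mean          # Eq. (10.13)
    return 1.0 / (S2 + sigma**2)                 # Eq. (10.12)

sigma = np.array([0.5, 0.5, 2.0, 5.0])           # illustrative per-cell noise values
for R in (-2.0, 0.0, 2.0):
    w = robust_weights(sigma, R)
    print(R, w / w.max())                        # R = -2: nearly equal; R = +2: close to 1/sigma^2
```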

Fig. 10.3

Synthesized beamwidth vs. normalized rms noise level in an image for robustness factor R ranging from −2 to 2. The calculations are for the source SN 1987A (declination −69°) observed with two tracks of the Australia Telescope (configurations 6A and 6C) of about 7-h duration each. Adapted from Briggs (1995).

10.2.3 Imaging by Discrete Fourier Transformation

The speed of the fast algorithm for the discrete Fourier transform (FFT), briefly discussed in Sect. 5.2, is a major advantage in computing large images. However, the use of the FFT introduces two complications in addition to those discussed for the direct transform: (1) the necessity to evaluate the visibility at points on a rectangular grid and (2) the resulting possibility of aliasing of parts of the image from outside the synthesized field. The evaluation at the grid points is often referred to as gridding. The output of such a process can be represented by the following expression:

$$\displaystyle{ \frac{w(u,v)} {\varDelta u\varDelta v} \,^{2}\mathrm{III}\left (\frac{u} {\varDelta u}, \frac{v} {\varDelta v}\right )\left \{C(u,v) {\ast}{\ast}\left [W(u,v)\mathcal{V}(u,v)\right ]\right \}\;. }$$
(10.15)

Here the visibility \(\mathcal{V}(u,v)\), measured at the points denoted by the transfer function W(u, v), is convolved with a function C(u, v) to produce a continuous visibility distribution. This is then resampled at points in a rectangular grid with incremental spacings Δ u and Δ v. This process is sometimes referred to as convolutional gridding. The resampling is here represented by the two-dimensional shah function 2III (Bracewell 1956b), defined by

$$\displaystyle{ ^{2}\mathrm{III}\left (\frac{u} {\varDelta u}, \frac{v} {\varDelta v}\right ) =\varDelta u\varDelta v\sum \limits _{i=-\infty }^{\infty }\sum _{ k=-\infty }^{\infty }{}^{2}\delta (u - i\varDelta u,\,v - k\varDelta v)\;, }$$
(10.16)

where \(^{2}\delta\) is the two-dimensional delta function. The weighting to optimize the beam is applied to the resampled data. Although this process is described mathematically in terms of convolution and resampling, in practice the convolution is evaluated only at the grid points. The Fourier transform of (10.15) represents the measured intensity:

$$\displaystyle\begin{array}{rcl} I_{\mathrm{meas}}(l,m) = ^{2}\mathrm{III}(l\varDelta u,m\varDelta v) {\ast}{\ast}\,\overline{w}(l,m) {\ast}{\ast}\left \{\overline{C}(l,m)\left [\,\overline{W}(l,m) {\ast}{\ast}\,I(l,m)\right ]\right \}\;.& &{}\end{array}$$
(10.17)

As a result of the Fourier transformation, the intensity function I(l, m) is convolved with the Fourier transform of the transfer function; multiplied by \(\overline{C}(l,m)\), which is the Fourier transform of the convolving function; and then convolved with the Fourier transforms of the weighting and resampling functions. This last convolution causes the whole image to be replicated at intervals Δ u −1 in l and Δ v −1 in m. These intervals are equal to the dimensions of the image in the (l, m) plane; that is, Δ u −1 = M Δ l and Δ v −1 = N Δ m, for an M × N point array. The function \(\overline{C}(l,m)\) takes the form of a taper applied to the image, and if this function does not vary greatly on the scale of the width of \(\overline{w}(l,m)\), which is usually the case for large images, then \(\overline{w}(l,m)\) in Eq. (10.17) can be convolved directly with \(\overline{W}(l,m) {\ast}{\ast}I(l,m)\), and Eq. (10.17) becomes

$$\displaystyle{ I_{\mathrm{meas}}(l,m) \simeq ^{2}\mathrm{III}(l\,\varDelta u,m\,\varDelta v) {\ast}{\ast}\left \{\overline{C}(l,m)\left [I(l,m) {\ast}{\ast}\,b_{ 0}(l,m)\right ]\right \}\;, }$$
(10.18)

where the synthesized beam b 0(l, m) enters through the relationship in Eq. (10.6). Comparison with Eq. (10.5) shows that the effect of the gridding and resampling is to multiply the image by \(\overline{C}(l,m)\) and replicate it. This replication introduces the aliasing.

Returning to the estimation of the visibility at the grid points, we might perhaps expect the best technique to be some form of exact interpolation so that the resulting values are equal to those that would be obtained by measurement at the grid points. A method of this type has been described by Thompson and Bracewell (1974). However, the problem of aliasing remains, and the most effective way to deal with this is to convolve the data in the (u, v) plane with the Fourier transform of a function that, in the (l, m) plane, varies very little over the image and then falls off rapidly at the image edges. We therefore look for a convolving function C(u, v) for which the Fourier transform \(\overline{C}(l,m)\) has these properties. An ideal function with infinitely sharp cutoff at the field edges would completely eliminate the aliasing since there would be no overlap of the replicated images. Unfortunately, this ideal is not practical because the required convolving function is not bounded in the (u, v) plane. Nevertheless, a very worthwhile degree of suppression of the aliasing is possible with a careful choice of functions. A common and convenient practice is to combine both the gridding, and the convolution to minimize aliasing, into a single operation. Note, however, that at the (u, v) points at which the measurements are made, the function \(C(u,v) {\ast}{\ast}\left [W(u,v)\mathcal{V}(u,v)\right ]\), in general, is not equal to the measured visibility \(\mathcal{V}(u,v)\). Thus, the gridding process cannot precisely be described as interpolation. Also, because of the convolution, the sampled points represent averages of the visibility local to the grid points, rather than samples of the visibility function. Finally, note also that although convolution is effective in suppressing artifacts that result from gridding of the data, it does not reduce sidelobe or ringlobe responses to sources located outside the area of the image.
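The following sketch illustrates the gridding step of expression (10.15) in Python. It is a deliberately simple implementation of our own: the kernel shown is a placeholder truncated Gaussian, the conjugate (−u, −v) points are omitted, and the support, cell size, and normalization choices are illustrative only; the convolving functions discussed in the next section would replace the placeholder in practice.

```python
import numpy as np

def grid_visibilities(u, v, vis, wts, n, du, kernel, support):
    """Convolutional gridding of visibility samples onto an n x n grid [expression (10.15)].

    kernel(x) : separable 1-D convolving function C1, with x in units of the cell size du
    support   : half-width of the kernel in cells; the convolution is evaluated only at grid points
    """
    grid = np.zeros((n, n), dtype=complex)
    for uk, vk, Vk, wk in zip(u, v, vis, wts):
        pu, pv = uk / du + n / 2.0, vk / du + n / 2.0      # fractional grid coordinates
        for j in range(int(pv) - support, int(pv) + support + 1):
            for i in range(int(pu) - support, int(pu) + support + 1):
                if 0 <= i < n and 0 <= j < n:
                    grid[j, i] += wk * kernel(i - pu) * kernel(j - pv) * Vk
    return grid

def gaussian_kernel(x):
    """Placeholder kernel: a Gaussian truncated at +/-3 cells (see Sect. 10.2.4 for better choices)."""
    return np.exp(-(x / 0.75) ** 2) * (abs(x) <= 3.0)

# After the conjugate (-u, -v) points are also gridded, the dirty image follows from an FFT,
# and the image-plane taper (the Fourier transform of the kernel) is divided out:
#   dirty = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real
```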

10.2.4 Convolving Functions and Aliasing

From the foregoing discussion, we can conclude that the point of principal concern in the use of the FFT is the choice of convolving function. A detailed discussion of convolving functions is given by Schwab (1984). It is convenient to consider those that are separable into one-dimensional functions of the same form for u and v, that is,

$$\displaystyle{ C(u,v) = C_{1}(u)C_{1}(v)\;. }$$
(10.19)

We therefore discuss some examples of the function C 1.

Rectangular Function. This function is the one used in cell averaging discussed in Sect. 5.2.2. It can be written

$$\displaystyle{ C_{1}(u) = (\varDelta u)^{-1}\Pi \left (\frac{u} {\varDelta u}\right )\;, }$$
(10.20)

where Π is the unit rectangle function defined by

$$\displaystyle{ \Pi (x) = \left \{\begin{array}{ll} 1,&\qquad \vert x\vert \leq \frac{1} {2} \\ 0,&\qquad \vert x\vert > \frac{1} {2}\;. \end{array} \right. }$$
(10.21)

The Fourier transform of C 1(u) is

$$\displaystyle{ \overline{C}_{1}(l) = \frac{\sin (\pi \varDelta ul)} {\pi \varDelta ul} \;. }$$
(10.22)

At the edge of the synthesized field, \(l = (2\varDelta u)^{-1}\) and \(\overline{C}_{1}(1/2\varDelta u) = 2/\pi\). The image is tapered by a sinc-function profile in the l and m directions and a sinc-squared profile along the diagonals. Equation (10.22) is plotted in Fig. 10.4, and the value at the first maximum outside the edge of the image is 0.22 of the value at the image center. The effect of aliasing is shown more directly in Fig. 10.5a, which is a plot of \(\overline{C}_{1}(l)/\overline{C}_{1}[\,f(l)]\), where f(l) is the value of l within the image [i.e., \(\vert f(l)\vert < (2\varDelta u)^{-1}\)] at which the alias of a feature at l would appear. This quantity gives the relative response to an aliased feature in an image that has been corrected for the taper imposed by \(\overline{C}_{1}(l)\). It is clear that simple averaging of points within a rectangular cell performs poorly in suppressing aliasing.

Fig. 10.4

Three examples of the tapering function \(\overline{C}_{1}(l)\), which is the Fourier transform of the convolving function \(C_{1}(u)\). For the Gaussian convolving function, α = 0.75. For the Gaussian-sinc convolving function, \(\alpha_{1} = 1.55\) and \(\alpha_{2} = 2.52\), and beyond the fourth subsidiary maximum, only the envelope of the maxima is shown. On the abscissa scale, the center of the image is at zero and the edge at 1.0. The data for the Gaussian-sinc function were computed by F. R. Schwab.

Fig. 10.5

Logarithmic plot of the factor by which the amplitudes of structures outside the image are multiplied when aliased into the image. On the abscissa scale, 1.0 is the edge of the image and 2, 4, 6, …, are the centers of the adjacent replications. (a) Aliasing factor for a rectangular convolving function of width equal to Δu (cell averaging). (b) Aliasing factor for a Gaussian-sinc convolving function with the optimized parameters given in the text. The broken line indicates the envelope of the maxima. Data computed by F. R. Schwab.

Gaussian Function. Here we have

$$\displaystyle{ C_{1}(u) = \frac{1} {\alpha \varDelta u\sqrt{\pi }}e^{-(u/\alpha \varDelta u)^{2} } }$$
(10.23)

and

$$\displaystyle{ \overline{C}_{1}(l) = e^{-(\pi \alpha \varDelta ul)^{2} }\;. }$$
(10.24)

The value of the constant α can be chosen to vary the widths of the functions as desired. If α is too small, \(C_{1}(u)\) will be too narrow, and only visibility measurements that are close to grid points will be used effectively in the imaging. If α is too large, the function \(\overline{C}_{1}(l)\) will taper the resulting image too severely. The Gaussian convolving function was used in the early years of the Westerbork array with \(\alpha = 2\sqrt{\ln 4}/\pi = 0.750\) (Brouw 1971). The value of the factor \(e^{-(u/\alpha \varDelta u)^{2} }\) in \(C_{1}(u)\) is then equal to 0.41 for a point on a diagonal in the (u, v) plane midway between two grid points. Thus, all measured points enter into the image with significant weights, and at the edge of the image, the tapering factor \(\overline{C}_{1} = \frac{1} {4}\). A curve for the Gaussian function is shown in Fig. 10.4.
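The two numbers quoted above follow directly from Eqs. (10.23) and (10.24), as this short check of ours confirms:

```python
import numpy as np

alpha, du = 0.75, 1.0     # Brouw (1971) value of alpha; cell size in arbitrary units

# Kernel factor at a point on a diagonal midway between grid points, i.e., offset (du/2, du/2),
# a distance du/sqrt(2) from each: exp(-(r / (alpha du))^2) ~ 0.41
print(np.exp(-((du / np.sqrt(2.0)) / (alpha * du)) ** 2))

# Image taper at the field edge, l = 1/(2 du), from Eq. (10.24): ~0.25
print(np.exp(-(np.pi * alpha * du * 0.5 / du) ** 2))
```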

Gaussian-Sinc Function. The ideal form for the image tapering function \(\overline{C}_{1}(l)\) would be a rectangle, which corresponds to convolution with a sinc function, as in Eq. (10.22). However, the envelope of a sinc function falls to zero slowly as its argument increases, and the computation required for the convolution becomes large. Truncation of the sinc function is undesirable because in the l domain, the desired rectangular function is convolved with the Fourier transform of the truncation function, and this destroys the sharp cutoff at the edges of the image. A better procedure is to multiply the sinc function with a Gaussian, which gives

$$\displaystyle{ C_{1}(u) = \frac{\sin (\pi u/\alpha _{1}\varDelta u)} {\pi u} e^{-(u/\alpha _{2}\varDelta u)^{2} } }$$
(10.25)

and

$$\displaystyle{ \overline{C}_{1}(l) = \Pi (\alpha _{1}\varDelta ul) {\ast}\left [\sqrt{\pi }\alpha _{2}\varDelta ue^{-(\pi \alpha _{2}\varDelta ul)^{2} }\right ]\;. }$$
(10.26)

Good performance is obtained with \(\alpha_{1} = 1.55\) and \(\alpha_{2} = 2.52\), with the convolution extending over an area about 6Δu in width. Corresponding curves for \(\overline{C}_{1}(l)\) and the resulting aliasing are given in Figs. 10.4 and 10.5b. This convolving function is much better than either of the two previous examples.
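With these parameter values, Eq. (10.25) can be coded in a few lines and substituted for the placeholder kernel in the gridding sketch given at the end of Sect. 10.2.3; the truncation at ±3 cells (a support about 6Δu wide) and the variable names are our choices.

```python
import numpy as np

alpha1, alpha2, du = 1.55, 2.52, 1.0      # recommended parameters; du is the grid cell size

def gaussian_sinc(x):
    """Gaussian-sinc convolving function C1(u) of Eq. (10.25), with x = u/du in cell units."""
    u = np.asarray(x, dtype=float) * du
    # np.sinc(z) = sin(pi z)/(pi z), so the first factor equals sin(pi u/(alpha1 du))/(pi u)
    sinc_part = np.sinc(u / (alpha1 * du)) / (alpha1 * du)
    return sinc_part * np.exp(-(u / (alpha2 * du)) ** 2) * (np.abs(u) <= 3.0 * du)
```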

Spheroidal Functions. Various other functions can be found that have the features desirable for convolution. As a measure of the effectiveness of the suppression of aliasing, Brouw (1975) has suggested the following quantity:

$$\displaystyle{ \frac{\int \int _{\mathrm{image}}\left [\,\overline{C}(l,m)\right ]^{2}dl\,dm} {\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }\left [\,\overline{C}(l,m)\right ]^{2}dl\,dm}\;, }$$
(10.27)

which shows the fraction of the integrated squared amplitude of the tapering function that falls within the image. Maximization of (10.27) provides a criterion for choosing a convolving function. This approach led to consideration of the prolate spheroidal wave functions [see, e.g., Slepian and Pollak (1961)] and the spheroidal functions (Rhodes 1970). Schwab (1984) found that among the functions investigated, the latter provide the best approach to an optimum convolving function. The spheroidal functions are solutions to certain differential equations and are not expressible in simple analytic form. In applying such functions for convolution of visibility data, they are computed in advance to provide a look-up table. Comparison of some functions of this type with the Gaussian-sinc function shows that the aliasing factor \(\overline{C}_{1}(l)/\overline{C}_{1}\left [\,f(l)\right ]\) falls off about as rapidly from the center to the edge of the image, but as l increases beyond the edge of the image, it reaches values an order of magnitude or more lower than those for the Gaussian-sinc function (Briggs et al. 1999). Computational capacity complicates the choice of the optimal function, since it limits the area of the (u, v) plane over which the convolution can be performed. Commonly, this area is six to eight grid cells wide and centered on the point to be interpolated. Roundoff errors in the Fourier transform are amplified in the removal of the tapering function and may limit the allowable taper at the edges of the image.

10.2.5 Aliasing and the Signal-to-Noise Ratio

Features aliased into an image from outside the boundary include not only the images of features on the sky but also the random variations resulting from the system noise. If we consider a direct Fourier transform of the noise component of the measured visibility, it is clear from Eq. (10.7) that for any point (l, m), the visibility data are weighted by complex exponential factors, all of which have the same modulus. Since the noise is independent at each data point in the (u, v) plane, the variance of the noise in the (l, m) plane is statistically constant in all parts of the image. If the FFT is used, however, the rms noise level across the image is multiplied by the function \(\overline{C}(l,m)\), and details beyond the image edge are aliased into the image. Note that the noise contributions combine additively in the variance. Thus, in one dimension, the noise variance as a function of l is proportional to

$$\displaystyle{ \mathrm{III}(l\varDelta u) {\ast}\vert \,\overline{C}_{1}(l)\vert ^{2}\;. }$$
(10.28)

The replication resulting from the FFT can also be written in terms of a summation, and the variance of the noise at a point l within the image is then proportional to

$$\displaystyle{ \sum \limits _{i=-\infty }^{\infty }\vert \,\overline{C}_{ 1}(l + i\varDelta u^{-1})\vert ^{2}\;. }$$
(10.29)

Usually \(\overline{C}_{1}(l)\) decreases sufficiently with l that only the noise from the adjacent replication of the image makes a serious contribution through aliasing. This contribution is greatest near the edge of the image, as shown in Fig. 10.6.

Fig. 10.6

Effect of aliasing on the variance of the noise across an image. The abscissa in each case is l in units of half the image width; the image center is at 0, the edge at 1.0, and the center of the adjacent replication at 2.0. (a) Solid curve shows the taper for a Gaussian convolving function \(C_{1}\), and dashed curves show the effect of aliasing. (b) Variance of the noise including aliased component after correction for taper \(C_{1}\). Adapted from Napier and Crane (1982) [see also Crane and Napier (1989)].

If the convolving function is of the Gaussian-sinc type, we see from Fig. 10.5b that, except for values of 2Δul between 1.0 and 1.1, aliased features are reduced in amplitude by a factor \(< 10^{-1}\), and in the square of the amplitude by \(< 10^{-2}\). Thus, there is no significant increase in the noise level as a result of aliasing, except in a narrow zone at the edge of the image.

At the other extreme, the aliasing is most serious in the case of cell averaging, for which \(\overline{C}_{1}(l)\) is the sinc function given by Eq. (10.22). Expression (10.29) then becomes

$$\displaystyle{ \sum \limits _{i=-\infty }^{\infty }\frac{\sin ^{2}\left [\pi (\varDelta ul + i)\right ]} {\left [\pi (\varDelta ul + i)\right ]^{2}} = 1\;, }$$
(10.30)

which indicates that the aliasing exactly cancels the taper, and the variance of the noise is constant with l, that is, before any correction for tapering of the astronomical features in the image is applied. (This result could also be deduced from the fact that in cell averaging, each visibility measurement contributes to one grid point only, and the noise components of the visibility at the grid points are therefore independent.) However, the intensity distribution of the sky within the field being imaged is tapered by the function \(\overline{C}_{1}(l)\), and correction for this taper then causes the noise to increase toward the edges of the image. For the sinc-function taper, the noise is increased by a factor of π∕2 at the edge of the image on the l and m axes and by \((\pi/2)^{2}\) at the corners. At the center of the image, the aliased contribution originates at points for which 2Δul is an even integer in the plots in Fig. 10.5, and in both cases shown, the aliasing factor \(\overline{C}_{1}(l)/\overline{C}_{1}\left [\,f(l)\right ]\) drops to a very low value. With any of the convolving functions that we have considered, there is no significant increase in the noise at the center of the image, and the signal-to-noise ratio for a source at that point is determined by the factors discussed in Sect. 6.2.
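The cancellation expressed by Eq. (10.30) is easily verified numerically (a check of ours, truncating the infinite sum):

```python
import numpy as np

def sinc_squared_sum(x, nterms=2000):
    """Partial sum of Eq. (10.30) at x = (delta u) * l; np.sinc(z) = sin(pi z)/(pi z)."""
    i = np.arange(-nterms, nterms + 1)
    return np.sum(np.sinc(x + i) ** 2)

for x in (0.0, 0.25, 0.37, 0.5):
    print(x, sinc_squared_sum(x))      # ~1.0 in every case, independent of x
```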

10.2.6 Wide-Field Imaging

To take full advantage of large new instruments with wide bandwidths, high sensitivity, and full polarization responses, it is necessary to measure the radio sky down to the level of the background radiation from the Epoch of Reionization (EoR) and to be able to separate out components from individual radio sources that overlie the background. The width of the synthesized field may be much greater than a few degrees, so the image is no longer the Fourier transform of the visibility function. The basic requirement for such an analysis is an equation for the visibility values that would be measured for a given brightness distribution, taking account of all details of the locations and characteristics of the individual antennas, the path of the incoming radiation through the Earth’s atmosphere including the ionosphere, the atmospheric transmission, etc. This is the interferometer measurement equation introduced in Sect. 4.8. In its basic form, it describes the response of a single pair of antennas and is thus applicable to any specified system of antennas and any brightness distribution, to provide values of the visibility for each antenna pair. It includes direction-dependent effects such as the primary beam patterns of the antennas, polarization effects that vary with the alignment of the polarization of the source relative to that of the antennas, and the baselines of the antenna pairs. These must be accounted for without small-field or other approximations. Direction-independent effects such as large-scale propagation in the atmosphere and the ionosphere, and the response of the receiving system, can also be included.

The reverse operation, i.e., the calculation of the optimum estimate of the image from the measured visibility values, is less simple. Taking the Fourier transform of the observed visibility function usually produces a brightness function with physically distorted features such as negative brightness values in some places. However, starting with a simple but physically realistic model for the brightness, the measurement equation can accurately provide the corresponding visibility values that would be observed. By comparing these with the observed values, it is possible to adjust the brightness model toward the observed distribution and, by iterative repetition of this process, to arrive at an image that agrees with the visibility measurements to within the uncertainties resulting from the noise. An example of this process of making an image of a radio source is described by Rau et al. (2009), who use an iterative Newton–Raphson approach, as follows.

  1. Calibrate the interferometer responses by making observations of sources with known position and structure. This includes measurement of both parallel and cross polarizations (for circular or linear polarization, whichever is used).

  2. Make observations of the area of sky under investigation and, using the calibration data from (1), determine the (complex) visibility function for points in a rectangular grid in the (u, v) plane.

  3. Using the measurement equation, calculate visibility values for a model source centered in the area in (2), for the (u, v) values of the gridded visibility measurements in (2). The model can make use of any prior information on the source under observation, but otherwise a point source model will generally suffice.

  4. Subtract the calculated visibilities for the model source from the corresponding observed values in (2), and take the Fourier transform of the difference to provide a brightness function that represents the difference between the sky and the model.

  5. Use the brightness function from (4) to improve the model brightness function, i.e., to make it closer to the visibilities measured in (2). To do this, add a fraction γ of the brightness function from (4) to the model, to provide a new model source. γ is the loop gain in the process.

  6. Calculate the visibility values (\(\mathrm{Vm}_{j}\)) for the improved source model from (5), and if they are sufficiently close to the observed visibilities (\(\mathrm{Vo}_{j}\)), go to (7). Otherwise, return to (4) with the improved model from (5). Comparison of the observed and model visibilities involves computation of \(\chi^{2} = \sum_{j}(\mathrm{Vo}_{j} - \mathrm{Vm}_{j})(\mathrm{Vo}_{j} - \mathrm{Vm}_{j})^{*}\), which is minimized by the iterative process.

  7. Take the residual differences between the observed and model visibility values in (6), Fourier transform them to brightness, and add them to the model values from (6). This step ensures that the Fourier transform of the final model is equal to the observed visibilities.

The number of iterations (from step 6 back to step 4) required varies inversely with the value of γ in step 5. A value of γ = 0.5 or less allows the optimum solution to be approached more accurately by using smaller steps. The choice of the model source in step 3 is not critical. For example, if the source is actually a wide one and a point source is used as the model, then in step 3, the model visibility values will have significant values over a much wider range of (u, v) spacings than that of the measured visibilities. However, in steps 4 and 5, a fraction γ of the excess visibilities is, in effect, subtracted, and the model sequentially moves toward the measured visibility, within the limits of the noise. Obtaining an image that is a realistic model of the sky, and is in agreement with the measured visibility, is the essential goal in synthesis imaging. This iterative procedure with \(\chi^{2}\) minimization illustrates the basic approach to a number of the processes used in imaging.
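The structure of steps (3)-(7) is illustrated by the one-dimensional toy example below. Here the measurement equation is reduced to an FFT sampled at a random set of points, so direction-dependent effects are ignored entirely; the sky model, loop gain, and convergence threshold are arbitrary choices made for the illustration.

```python
import numpy as np

n = 256
rng = np.random.default_rng(1)

sky = np.zeros(n)                              # the "true" one-dimensional sky
sky[[60, 100, 101, 102, 180]] = [1.0, 0.5, 0.8, 0.5, 0.3]
mask = rng.random(n) < 0.3                     # which spatial frequencies are sampled

def measure(model):
    """Toy measurement equation: Fourier transform of the model, kept only where sampled."""
    return np.fft.fft(model) * mask

vis_obs = measure(sky)                         # step (2): the "observed" visibilities
model = np.zeros(n)                            # step (3): initial model (empty here)
gamma = 0.5                                    # loop gain, step (5)

for _ in range(100):
    resid_vis = vis_obs - measure(model)       # step (4): residual visibilities ...
    resid_img = np.fft.ifft(resid_vis).real    # ... transformed to a residual brightness
    model += gamma * resid_img                 # step (5): improve the model
    chi2 = np.sum(np.abs(vis_obs - measure(model)) ** 2)   # step (6)
    if chi2 < 1e-12:
        break

model += np.fft.ifft(vis_obs - measure(model)).real        # step (7): add the final residuals
```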

10.3 Closure Relationships

Closure effects are relationships between visibility values for baselines that form a closed figure, for example, a triangle or quadrilateral with the antennas at the vertices. As shown by Eqs. (7.37) and (7.38), the correlator output for antenna pair (m, n) can be written as

$$\displaystyle{ r_{mn} = G_{mn}\mathcal{V}_{mn} = g_{m}g_{n}^{{\ast}}\mathcal{V}_{ mn}\;, }$$
(10.31)

where G mn is the complex gain for the antenna pair, and g m and g n are gain factors for the individual antennas. We ignore any gain terms that do not factor into the terms for individual antennas (see Sect. 7.3.3), i.e., those that are baseline dependent.

Considering first the phase relationships, we represent the arguments of the exponential terms of r mn , g m , g n , and \(\mathcal{V}_{mn}\) by ϕ mn , ϕ m , ϕ n , and ϕ vmn , respectively. Thus, we can write

$$\displaystyle{ \phi _{mn} =\phi _{m} -\phi _{n} +\phi _{vmn}\;. }$$
(10.32)

For three antennas m, n, and p, the phase closure relationship is

$$\displaystyle{ \begin{array}{rl} \phi _{c_{mnp}} & =\phi _{mn} +\phi _{np} +\phi _{pm} \\ & =\phi _{m} -\phi _{n} +\phi _{vmn} \\ &\quad +\phi _{n} -\phi _{p} +\phi _{vnp} \\ &\quad +\phi _{p} -\phi _{m} +\phi _{vpm}\\ \end{array} }$$
(10.33)

or

$$\displaystyle{ \phi _{c_{mnp}} =\phi _{vmn} +\phi _{vnp} +\phi _{vpm}\;. }$$
(10.34)

The antenna gain terms, g m and so on, contain the effects of the atmospheric paths to the antennas as well as instrumental effects, and since these terms do not appear in Eq. (10.34), it is evident that the combination of the three correlator output phases constitutes an observable quantity that depends only on the phase of the visibility. This property of the phase closure relationships was first recognized and used by Jennison (1958).

If a point source is observed, then the visibility phases are all zero, and, in the absence of receiver noise, the closure phase is also zero. Note that if the rms phase noise on each baseline is σ, the rms noise in the closure phase is \(\sqrt{3}\sigma\).

To help visualize the phase closure concept, consider three stations of an array observing a point source, as shown in Fig. 10.7. We depict the origin of the instrumental phase terms associated with each station as being caused by atmospheric delay along each line of sight. The total visibility phase on each baseline is \(\phi _{v} = \frac{2\pi } {\lambda } \mathbf{D}\boldsymbol{\, \cdot \,}\mathbf{s}\); hence, the closure phase is

$$\displaystyle{ \phi _{c_{mnp}} = \frac{2\pi } {\lambda } \left (\mathbf{D}_{mn} + \mathbf{D}_{np} + \mathbf{D}_{pm}\right )\boldsymbol{\, \cdot \,}\mathbf{s} = 0 }$$
(10.35)

because the sum of the baselines around a triangle is identically zero. This shows that the closure phase for a point source is zero, even if it is not at the phase-tracking center or if the station coordinates have errors. A corollary of this result is that the position of a source cannot be deduced from closure phase measurements alone.
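The cancellation of the antenna terms in Eq. (10.34) is easily demonstrated with synthetic numbers (the phase values below are arbitrary test inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
phi_ant = rng.uniform(-np.pi, np.pi, 3)            # antenna-based phase errors phi_m, phi_n, phi_p
phi_vis = {"mn": 0.7, "np": -1.1, "pm": 0.4}       # "true" visibility phases (arbitrary values)

def observed_phase(pair, i, j):
    """Eq. (10.32): measured phase = phi_i - phi_j + visibility phase."""
    return phi_ant[i] - phi_ant[j] + phi_vis[pair]

closure = observed_phase("mn", 0, 1) + observed_phase("np", 1, 2) + observed_phase("pm", 2, 0)
print(np.isclose(closure, sum(phi_vis.values())))  # True: antenna terms cancel, Eq. (10.34)
```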

Fig. 10.7

A three-baseline triangle for antennas m, n, and p. s is the unit vector in the direction of the source. The phases of the antenna-based gain factors are represented by atmospheric cloudlets that cause excess phase shifts of ϕ m , ϕ n , and ϕ p , respectively.

If we have n a antennas and we measure the correlation of all pairs, the number of independent phase closure relationships is equal to the number of correlator output phases less the number of unknown instrumental phases, one of which can be arbitrarily chosen. If there are no redundant spacings, then each closure relationship provides different information on the source structure. The number of phase closure relationships is

$$\displaystyle{ \frac{1} {2}n_{a}(n_{a} - 1) - (n_{a} - 1) = \frac{1} {2}(n_{a} - 1)(n_{a} - 2)\;. }$$
(10.36)

It is often important to be able to identify which set of closure triangles can be considered to be independent. This is necessary if closure phases are to be used directly in model fits. Combinatorial mathematics is useful in this regard. The question of how many triangles can be formed among n a antennas can be rephrased as: Among n a objects, how many unique ways can three of them be chosen without replacement or regard to order? The answer is the binomial coefficient

$$\displaystyle{ n_{PT} ={ n_{a}\choose 3} = \frac{n_{a}!} {(n_{a} - 3)!3!} = \frac{n_{a}(n_{a} - 1)(n_{a} - 2)} {6} \;. }$$
(10.37)

Similarly, the number of baselines, n b , is

$$\displaystyle{{ n_{a}\choose 2} = \frac{n_{a}(n_{a} - 1)} {2} \;. }$$
(10.38)

A set of independent triangles can be found by the following process. Select one antenna as a reference, as shown in Fig. 10.8. The set of independent triangles is all of those that include the reference antenna. The nonindependent triangles are the ones that do not involve the reference antenna, taken to be antenna 1, i.e.,

$$\displaystyle{ \phi _{c_{mnp}} =\phi _{mn} +\phi _{np} +\phi _{pm}\;, }$$
(10.39)
Fig. 10.8

The four closure triangles among four antennas. The three triangles involving the reference antenna, denoted by 1, are independent. The phase closure on the fourth triangle linking antennas m, n, and p can be derived from the three independent phase closures.

where none of m, n, and p is equal to one. The sum of the closure phases

$$\displaystyle{ \begin{array}{rl} \phi _{c_{1nm}} & =\phi _{1n} +\phi _{nm} +\phi _{m1} \\ \phi _{c_{1mp}} & =\phi _{1m} +\phi _{mp} +\phi _{p1} \\ \phi _{c_{1pn}} & =\phi _{1p} +\phi _{pn} +\phi _{n1}\\ \end{array} }$$
(10.40)

is

$$\displaystyle{ \phi _{nm} +\phi _{mp} +\phi _{pn}\;, }$$
(10.41)

since ϕ 1n  = −ϕ n1, ϕ 1m  = −ϕ m1, and ϕ 1p  = −ϕ p1. The number of independent closure triangles is thus given by

$$\displaystyle{ n_{P\,\mathrm{indep}} = \binom{n_{a} - 1}{2} = \frac{(n_{a} - 1)(n_{a} - 2)} {2} \;, }$$
(10.42)

in agreement with Eq. (10.36). The fraction of the phase information that can be recovered from phase closures in an array is

$$\displaystyle{ f_{P} = n_{P\,\mathrm{indep}}/n_{b} = \frac{(n_{a} - 1)(n_{a} - 2)} {2} \Bigg/\,\frac{n_{a}(n_{a} - 1)} {2} = 1 - \frac{2} {n_{a}}\;. }$$
(10.43)

Representative numbers are given in Table 10.1.

Table 10.1 Baselines and phase closures for an array of n a elementsa
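
The counting arguments above are easily checked numerically. The following minimal sketch (our own illustration in Python; the function names are not from any standard package) enumerates the independent triangles containing a chosen reference antenna and evaluates Eqs. (10.37), (10.38), (10.42), and (10.43) for a given number of antennas.

```python
from itertools import combinations
from math import comb

def closure_counts(n_a):
    """Baseline, triangle, and independent-closure counts for n_a antennas."""
    n_b = comb(n_a, 2)            # number of baselines, Eq. (10.38)
    n_pt = comb(n_a, 3)           # total closure triangles, Eq. (10.37)
    n_indep = comb(n_a - 1, 2)    # independent closure phases, Eq. (10.42)
    f_p = 1.0 - 2.0 / n_a         # recoverable fraction of phase information, Eq. (10.43)
    return n_b, n_pt, n_indep, f_p

def independent_triangles(n_a, ref=1):
    """Triangles containing the reference antenna (antennas numbered 1..n_a)."""
    others = [k for k in range(1, n_a + 1) if k != ref]
    return [(ref, m, n) for m, n in combinations(others, 2)]

if __name__ == "__main__":
    for n_a in (3, 5, 10, 27):
        print(n_a, closure_counts(n_a))
    print(independent_triangles(5))   # six triangles, in agreement with Eq. (10.42)
```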

We now discuss the amplitude closure relations. An amplitude closure relationship involves four antenna pairs, for which four antennas m, n, p, and q are required:

$$\displaystyle{ \frac{\vert r_{mn}\vert \vert r_{pq}\vert } {\vert r_{mp}\vert \vert r_{nq}\vert } = \frac{\vert \ \mathcal{V}_{mn}\vert \vert \mathcal{V}_{pq}\vert } {\vert \mathcal{V}_{mp}\vert \vert \mathcal{V}_{nq}\vert }\;. }$$
(10.44)

The proof of Eq. (10.44) is obtained by substituting terms of the form \(g_{m}g_{n}^{{\ast}}\mathcal{V}_{mn}\) into the left side of Eq. (10.44), using Eq. (10.31). The moduli of the g terms then cancel out because the numerator and denominator both contain the product of the moduli of all four g terms. A total of six closure amplitudes can be formed. Three will be reciprocals of the other three and can be ignored. The basic three configurations are shown in Fig. 10.9. The product of these three closure amplitudes—\(\vert r_{mn}\vert \vert r_{pq}\vert \big/\vert r_{mp}\vert \vert r_{nq}\vert \), \(\vert r_{mp}\vert \vert r_{nq}\vert \big/\vert r_{mq}\vert \vert r_{np}\vert \), and \(\vert r_{mq}\vert \vert r_{np}\vert \big/\vert r_{mn}\vert \vert r_{pq}\vert \)—is unity, so only two of them are independent. The number of independent amplitude closure relationships for n a antennas with no redundant baselines is equal to the number of measured amplitudes, \(\frac{1} {2}n_{a}(n_{a} - 1)\), less the number of unknown antenna gain factors n a , that is,

$$\displaystyle{ n_{A\,\mathrm{indep}} = \frac{1} {2}n_{a}(n_{a} - 1) - n_{a} = \frac{1} {2}n_{a}(n_{a} - 3)\;. }$$
(10.45)

The fraction of amplitude information that can be recovered from amplitude closures is

$$\displaystyle{ f_{A} = \frac{n_{a} - 3} {n_{a} - 1}\;. }$$
(10.46)

For early usage of the principle of taking ratios of observed visibility amplitudes to eliminate instrumental gains, see Smith (1952) and Twiss et al. (1960). The total number of closure quadrangles is

$$\displaystyle{ n_{AT} = 6\,\binom{n_{a}}{4}\;, }$$
(10.47)

which is on the order of n a 4. Systematic procedures can be devised to select an independent set. For a detailed analysis of amplitude closure structures, see Lannes (1991).

Fig. 10.9

The three closure amplitudes that can be formed among four antennas [see Eq. (10.44)]. (We have not included the trivially redundant reciprocal cases, i.e., solid and dashed lines interchanged.) In each case, the two visibility moduli that go in the numerator of the closure amplitude are shown by the solid lines, and the two that go in the denominator are shown by the dashed lines. The product of the three closure amplitudes is unity, so only two of the closure amplitudes are independent.
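
The gain-independence of the closure quantities is also easy to verify numerically: corrupting a set of true visibilities with arbitrary antenna-based complex gains, as in Eq. (10.31), leaves the closure phase and closure amplitude unchanged. The sketch below is our own illustration with invented test values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_a = 4

# Arbitrary antenna-based complex gains g_m and true visibilities V_mn (m < n).
g = rng.normal(1.0, 0.2, n_a) * np.exp(1j * rng.uniform(-np.pi, np.pi, n_a))
V_true = {(m, n): rng.normal(1.0, 0.3) * np.exp(1j * rng.uniform(-np.pi, np.pi))
          for m in range(n_a) for n in range(m + 1, n_a)}

# Measured correlator outputs r_mn = g_m g_n* V_mn, as in Eq. (10.31).
r = {(m, n): g[m] * np.conj(g[n]) * V for (m, n), V in V_true.items()}

def closure_phase(x, m, n, p):
    """phi_mn + phi_np + phi_pm, using x_pm = conj(x_mp)."""
    return np.angle(x[(m, n)] * x[(n, p)] * np.conj(x[(m, p)]))

def closure_amplitude(x, m, n, p, q):
    """|x_mn||x_pq| / (|x_mp||x_nq|), as in Eq. (10.44)."""
    return abs(x[(m, n)]) * abs(x[(p, q)]) / (abs(x[(m, p)]) * abs(x[(n, q)]))

# The gain factors cancel: closure quantities of the corrupted data equal
# those of the true visibilities.
print(np.isclose(closure_phase(r, 0, 1, 2), closure_phase(V_true, 0, 1, 2)))
print(np.isclose(closure_amplitude(r, 0, 1, 2, 3), closure_amplitude(V_true, 0, 1, 2, 3)))
```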

Note that a fundamental requirement for the validity of the closure relationships is that at any instant, it must be possible to represent the effect of any signal path from the source to the correlator by a single complex gain factor. Thus, the effects of the atmosphere must be constant over the source under observation, that is, the angular width of the source should be no greater than the isoplanatic patch size for the atmosphere. The isoplanatic patch is the area of sky within which the path length for an incident wave remains constant to within a small fraction of a wavelength; see also Sect. 11.8.4. The size of the isoplanatic patch varies with frequency. At a few hundred megahertz or less, it is common to have more than one source within an antenna beam, and these may be separated sufficiently in angle that ionospheric conditions may be different for each one. The closure conditions will then be different for each source, and use of the closure principle then becomes more complicated than in the single-source case discussed above.

The closure relationships have proved to be very important in synthesis imaging. When applied to unresolved point sources, the phase closure should be zero and the amplitude closure unity. Thus, they are useful in checking the accuracy of calibration and examining instrumental effects. For resolved sources, they can be used as observables in situations in which direct calibration by observation of a calibration source is not practicable, as is sometimes the case in VLBI. Most importantly, they can be used to improve calibration accuracy for observations where high dynamic range is required. The amplitude closure relationships are less frequently used because it is generally easier to calibrate the visibility amplitudes than the phases. However, they provide a useful check in cases in which the amplitude is required with especially high accuracy [for examples, see Trotter et al. (1998); Bower et al. (2014), and Ortiz-León et al. (2016)].

10.4 Visibility Model Fitting

The fitting of simple intensity models to visibility data was practiced extensively in early radio interferometry, especially when the visibility phase was poorly calibrated or the data were not sufficiently complete to allow Fourier transformation. Examples of simple models are shown in Figs. 1.5 and 1.10, and the Gaussian components in Fig. 1.14.

Model fitting continues to be the only recourse for data interpretation in sparse VLBI arrays such as those used at short millimeter wavelengths [see, e.g., Doeleman et al. (2008)]. However, model fitting is very important even in large, well-sampled arrays that can generate high-quality images. These images are produced by a complex process that includes Fourier transformation of visibility data that have been interpolated onto a grid, followed by self-calibration and application of nonlinear deconvolution algorithms such as CLEAN, as described in Chap. 11. The noise in these images is correlated among pixels and can have poorly understood characteristics. Such images are not unique and can be considered to be models of the true brightness distributions. Extraction of source parameters in the image plane can therefore be characterized as “modeling the model.”

In contrast, the fundamental data product of an array, the visibilities, has well-characterized noise properties, i.e., it is uncorrelated Gaussian noise with known variance. If the characteristics of the source emission structure are to be interpreted with a specific model in mind, the parameters of such a model can often best be obtained from direct analysis of the visibility data. Important examples of the application of model fitting include the cases of sources whose intensity decreases as a power law, as described in Sect. 10.4.4. In these cases, the proper estimate of the total flux density and other parameters from image plane analysis is difficult. Another application of visibility model fitting is in the determination of the changes in parameters of a source in which time-separated observations may not have identical (u, v) coverage. Fitting the same model to both data sets, but allowing the parameters of interest to vary, is likely to give the best evidence of change. An interesting example is provided by Masson (1986) in a measurement of angular expansion of a compact planetary nebula. From several data sets obtained at different epochs, the image from the one with the best (u, v) coverage was used as a model to fit to the others, thereby avoiding direct comparison of images made with different synthesized beams.

A useful discussion of the general principles of model fitting can be found in Pearson (1999). For the estimation of large numbers of parameters in a Bayesian framework, see Lochner et al. (2015). There are advantages to searching for transient sources in the (u, v) data (Trott et al. 2011).

10.4.1 Basic Considerations for Simple Models

We consider the case of the small field of view (l, m ≪ 1, A(l, m) ≃ 1), where the transform between image intensity and visibility given in Eqs. (3.7) and (3.10) can be written as

$$\displaystyle\begin{array}{rcl} \mathcal{V}(u,v)& =& \int _{-\infty }^{\infty }I(l,m)\,e^{-j2\pi (ul+vm)}dl\,dm{}\end{array}$$
(10.48)
$$\displaystyle\begin{array}{rcl} I(l,m) =\int _{ -\infty }^{\infty }\mathcal{V}(u,v)\,e^{\,j2\pi (ul+vm)}du\,dv\;.& &{}\end{array}$$
(10.49)

A simple common source model is a Gaussian intensity distribution centered at position (l 1, m 1) with peak intensity I 0 and width parameter a:

$$\displaystyle{ I(l,m) = I_{0}\exp \left [\frac{-(l - l_{1})^{2} - (m - m_{1})^{2}} {2a^{2}} \right ]\;, }$$
(10.50)

which has FWHM, θ G , of \(\sqrt{8\ln 2}a\). The corresponding model visibility distribution is

$$\displaystyle{ \mathcal{V}_{m}(u,v) = S_{0}e^{-2\pi ^{2}a^{2}(u^{2}+v^{2})-j2\pi (ul_{ 1}+vm_{1})}\;, }$$
(10.51)

where S 0 = 2π I 0 a 2, the total flux density. The visibility has real and imaginary components that are sinusoidal corrugations, the ridges of which are normal to the radius vector to the point (l 1, m 1) in the image domain. These visibility components are modulated in amplitude by a Gaussian function centered on the (u, v) origin and of width inversely proportional to a. Examination of the visibility distribution can thus indicate the form and position of the main intensity components. For early discussions and examples of this type of model fitting, see, for example, Maltby and Moffet (1962); Fomalont (1968), and Fomalont and Wright (1974). Fitting the four parameters (I 0, a, l 1, m 1) in the image plane or (S 0, a, l 1, m 1) in the visibility plane is a nonlinear process. It requires an initial guess for the parameters. The choice of these initial parameters is more obvious in the image plane than in the visibility plane, but final analysis is best done in the visibility plane.
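
A direct implementation of the model in Eq. (10.51) is straightforward. The following minimal sketch (our own code, with hypothetical function and variable names) evaluates the model visibility for given parameters (S 0, a, l 1, m 1) and illustrates the Gaussian-tapered, corrugated form described above.

```python
import numpy as np

def gaussian_vis(u, v, S0, a, l1, m1):
    """Model visibility of a circular Gaussian source, Eq. (10.51).
    u, v are in wavelengths; a, l1, m1 are in radians; S0 is the total flux density."""
    taper = S0 * np.exp(-2.0 * np.pi**2 * a**2 * (u**2 + v**2))
    return taper * np.exp(-2j * np.pi * (u * l1 + v * m1))

# Example: 1 Jy source of 1 arcsec FWHM, offset 0.5 arcsec in l, on east-west baselines.
arcsec = np.pi / (180.0 * 3600.0)
a = 1.0 * arcsec / np.sqrt(8.0 * np.log(2.0))     # width parameter from the FWHM
u = np.linspace(0.0, 2.0e5, 200)                  # baselines up to 200 kilolambda
V = gaussian_vis(u, 0.0, 1.0, a, 0.5 * arcsec, 0.0)
```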

To fit model parameters, it is necessary to choose a criterion for the goodness of fit. Since the real and imaginary components of the visibility usually have Gaussian noise, the optimum criterion from a maximum likelihood point of view (see Appendix 12.1) is the χ 2 criterion, which minimizes the weighted mean-squared difference between the model and the data set of n d points, i.e.,

$$\displaystyle{ \chi ^{2} =\sum _{ i=1}^{n_{d} }\frac{[\mathcal{V}(u_{i},v_{i}) -\mathcal{V}_{m}(u_{i},v_{i},\mathbf{p})][\mathcal{V}(u_{i},v_{i}) -\mathcal{V}_{m}(u_{i},v_{i},\mathbf{p})]^{{\ast}}} {\sigma _{i}^{2}} \;, }$$
(10.52)

where \(\mathcal{V}(u_{i},v_{i})\) are the measured visibilities, \(\mathcal{V}_{m}(u_{i},v_{i},\mathbf{p})\) are the model visibilities with n p parameters p, and the σ i ’s are the measurement errors. For a perfect fit, the expected minimum value of χ 2 is n d − n p , and the standard deviation of χ 2 is \(\sqrt{ 2(n_{d} - n_{p})}\). The reduced chi square, χ r 2, which is χ 2∕(n d − n p ), should be close to unity for a good fit. χ r 2 > 1 indicates that the model is not correctly parametrized or that the estimates of errors are not correct. In any fitting procedure, the residuals, i.e., \([\mathcal{V}(u_{i},v_{i}) -\mathcal{V}_{m}(u_{i},v_{i},\mathbf{p})]/\sigma _{i}\), should be examined for any systematic deviations from a Gaussian probability distribution. Such deviations suggest that more or different parameters are required. If the deviations follow a Gaussian probability distribution, then the problem may be that values of σ i are misestimated by a constant factor that can be chosen to make χ r 2 = 1. Another common defect is that the data have a noise floor. In this case, the σ i 2 terms can be replaced by σ i 2 +σ f 2, where σ f represents a noise floor and σ f 2 is chosen so that χ r 2 = 1. A σ f 2 > 0 has the effect of reducing the importance of measurements with low σ i , and σ f  ≫ σ i tends toward a solution that gives equal weight to all data regardless of σ i .
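
The statistic of Eq. (10.52), including the optional noise floor discussed above, can be computed as in this short sketch (our own code; V_obs and V_model are arrays of measured and model visibilities).

```python
import numpy as np

def chi_squared(V_obs, V_model, sigma, sigma_floor=0.0):
    """Eq. (10.52), with sigma_i^2 optionally replaced by sigma_i^2 + sigma_f^2."""
    var = np.asarray(sigma)**2 + sigma_floor**2
    resid = np.asarray(V_obs) - np.asarray(V_model)
    return np.sum((resid * np.conj(resid)).real / var)

def reduced_chi_squared(V_obs, V_model, sigma, n_par, sigma_floor=0.0):
    """chi^2 / (n_d - n_p); close to unity for a correctly parametrized model."""
    return chi_squared(V_obs, V_model, sigma, sigma_floor) / (len(V_obs) - n_par)
```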

Note that Eq. (10.52) can be written as

$$\displaystyle{ \chi ^{2} =\sum _{ i=1}^{n_{d} }\frac{(\mathcal{V}_{R_{i}} -\mathcal{V}_{mR_{i}})^{2} + (\mathcal{V}_{I_{i}} -\mathcal{V}_{mI_{i}})^{2}} {\sigma _{i}^{2}} \;, }$$
(10.53)

where \(\mathcal{V}_{R}\) and \(\mathcal{V}_{I}\) are the real and imaginary parts of \(\mathcal{V}\), and \(\mathcal{V}_{mR}\) and \(\mathcal{V}_{mI}\) are the real and imaginary parts of \(\mathcal{V}_{m}\). The data to be fitted may consist of visibility amplitudes and closure phases. In this case, the χ 2 can be written as

$$\displaystyle{ \chi ^{2} =\sum _{ i=1}^{n_{d} }\frac{[\vert \mathcal{V}\vert -\vert \mathcal{V}_{m}\vert ]^{2}} {\sigma _{A_{i}}^{2}} +\sum _{ i=1}^{n_{c} }\frac{(\phi _{c_{i}} -\phi _{mc_{i}})^{2}} {\sigma _{c_{i}}^{2}} \;, }$$
(10.54)

where \(\sigma _{A_{i}}^{2}\) and \(\sigma _{c_{i}}^{2}\) are the measurement variances of the visibility amplitudes and closure phases, respectively. In the strong signal case (see Sect. 9.3.3),

$$\displaystyle{ \sigma _{A_{i}}^{2} =\sigma _{ i}^{2}\qquad \mathrm{and}\qquad \sigma _{ c_{i}}^{2} = \left ( \frac{\sigma _{1}} {\mathcal{V}_{1}}\right )^{2} + \left ( \frac{\sigma _{2}} {\mathcal{V}_{2}}\right )^{2} + \left ( \frac{\sigma _{3}} {\mathcal{V}_{3}}\right )^{2}\;. }$$
(10.55)

In cases of weaker signal, the application of Eq. (10.54) may not yield an optimum solution because the probability distributions of the amplitudes and closure phases become non-Gaussian. In particular, the probability distribution of the closure amplitude becomes progressively more skewed as the signal-to-noise ratio (SNR) decreases.

Examples of models fitted to visibility data sets with limited amounts of closure data can be found in Akiyama et al. (2015); Fish et al. (2016), and Lu et al. (2013).

The computational challenge of finding the minimum value of χ 2 can be daunting. A popular method that is straightforward to implement but that can require large computational resources is the Markov chain Monte Carlo (MCMC) algorithm based on Bayesian theory. It provides a way to systematically vary the parameters in search of a χ 2 minimum. It also produces posterior probability functions for the parameters [see, e.g., Sivia (2006)].
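
As a schematic of the MCMC approach, the sketch below implements a random-walk Metropolis sampler (our own simplified code; dedicated packages provide far more capable samplers). The commented line shows how a log posterior could be built from the hypothetical chi_squared and gaussian_vis helpers sketched earlier, assuming Gaussian visibility noise and flat priors.

```python
import numpy as np

def metropolis(log_post, p0, step, n_steps, seed=0):
    """Random-walk Metropolis sampler. log_post(p) returns the log posterior
    for a parameter vector p; step sets the proposal standard deviations."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    lp = log_post(p)
    chain = []
    for _ in range(n_steps):
        trial = p + np.asarray(step) * rng.normal(size=p.size)
        lt = log_post(trial)
        if np.log(rng.uniform()) < lt - lp:   # accept with probability min(1, ratio)
            p, lp = trial, lt
        chain.append(p.copy())
    return np.array(chain)                    # posterior samples of the parameters

# Example log posterior for the Gaussian model of Eq. (10.51):
# log_post = lambda p: -0.5 * chi_squared(V_obs, gaussian_vis(u, v, *p), sigma)
```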

There is an important relationship between the moments of the intensity distribution and the visibility. The zero-order moment is equal to the flux density S, the odd-order moments contribute to the imaginary components of the visibility, and the even-order moments contribute to the real part. If the source is symmetrical in l, the odd-order terms are zero. If, in addition, the source is only slightly resolved, the decrease in \(\mathcal{V}\) results mainly from the second-moment term. Then the source can be represented by a symmetrical model with an appropriate second moment.

For simplicity, consider the one-dimensional problem

$$\displaystyle{ \mathcal{V}_{1}(u) =\int _{ -\infty }^{\infty }I_{ 1}(l)\,e^{-j2\pi ul}dl\;, }$$
(10.56)

where \(\mathcal{V}_{1}(u) = \mathcal{V}(u,0)\) and

$$\displaystyle{ I_{1}(l) =\int _{ -\infty }^{\infty }I(l,m)\,dm\;. }$$
(10.57)

Each derivative of \(\mathcal{V}_{1}\) with respect to u introduces a factor of − j2π l, so that the nth derivative can be written as

$$\displaystyle{ \mathcal{V}_{1}^{(n)}(u) =\int _{ -\infty }^{\infty }(-j2\pi l)^{n}I_{ 1}\,e^{-j2\pi ul}dl }$$
(10.58)

or

$$\displaystyle{ \mathcal{V}_{1}^{(n)}(0) = (-j2\pi )^{n}\int _{ -\infty }^{\infty }l^{n}I_{ 1}(l)\,dl\;. }$$
(10.59)

The Taylor expansion of \(\mathcal{V}_{1}(u)\) is

$$\displaystyle{ \mathcal{V}_{1}(u) = \mathcal{V}_{1}(0) + \mathcal{V}_{1}^{{\prime}}(0)u + \mathcal{V}_{ 1}^{{\prime\prime}}(0)\frac{u^{2}} {2} +\ldots +\mathcal{V}_{1}^{(n)}(0)\frac{u^{n}} {n!} +\ldots }$$
(10.60)

or

$$\displaystyle{ \mathcal{V}_{1}(u) = M_{0} +\sum \limits _{ n=1}^{\infty }\frac{(-j2\pi )^{n}} {n!} M_{n}u^{n}\;, }$$
(10.61)

where

$$\displaystyle{ M_{n} =\int _{ -\infty }^{\infty }l^{n}I_{ 1}(l)\,dl\;. }$$
(10.62)

The Taylor expansion requires that the moments be finite.

10.4.2 Examples of Parameter Fitting to Models

The model most commonly encountered in interferometry is a simple Gaussian distribution with unknown flux density, size, and position, as described by Eq. (10.51). The four model parameters, S 0, a, l 1, and m 1, can be estimated by standard procedures for nonlinear least-mean-squares analysis (Appendix 12.1). This analysis requires initial guesses for the parameters. The model can be generalized to an elliptical Gaussian source described by major and minor axis diameters and a position angle (six-parameter fit).
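
A concrete sketch of such a fit is given below (our own illustration; the model function repeats Eq. (10.51), the data are simulated, and scipy.optimize.least_squares is used with the real and imaginary residuals stacked as in Eq. (10.53)).

```python
import numpy as np
from scipy.optimize import least_squares

def gaussian_vis(u, v, S0, a, l1, m1):
    """Circular Gaussian model visibility, Eq. (10.51)."""
    return S0 * np.exp(-2.0 * np.pi**2 * a**2 * (u**2 + v**2)
                       - 2j * np.pi * (u * l1 + v * m1))

def fit_gaussian(u, v, V_obs, sigma, p0):
    """Weighted least-squares fit of (S0, a, l1, m1); p0 is the initial guess."""
    def residuals(p):
        r = (V_obs - gaussian_vis(u, v, *p)) / sigma
        return np.concatenate([r.real, r.imag])   # chi^2 is the sum of squares
    return least_squares(residuals, p0)

# Simulated data set and fit.
arcsec = np.pi / (180.0 * 3600.0)
rng = np.random.default_rng(2)
u = rng.uniform(-2e5, 2e5, 200)
v = rng.uniform(-2e5, 2e5, 200)
p_true = (1.0, 0.4 * arcsec, 0.1 * arcsec, -0.05 * arcsec)
sigma = 0.02
V_obs = gaussian_vis(u, v, *p_true) + sigma * (rng.normal(size=200)
                                               + 1j * rng.normal(size=200))
result = fit_gaussian(u, v, V_obs, sigma, p0=(0.8, 0.2 * arcsec, 0.0, 0.0))
print(result.x)   # recovered (S0, a, l1, m1)
```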

To gain an understanding of the accuracy to which parameters of a simple model can be deduced, consider a slightly resolved source having an azimuthally symmetric distribution of unknown position observed at a set of n d points with noise σ. In the case of high SNR, we can analyze the visibility amplitude and phase separately. The model for the visibility phase and amplitude can be written

$$\displaystyle\begin{array}{rcl} \phi = 2\pi (ul_{1} + vm_{1})& &{}\end{array}$$
(10.63)
$$\displaystyle\begin{array}{rcl} \vert \mathcal{V}\vert = S_{0} - bq^{2}\;,& &{}\end{array}$$
(10.64)

where q 2 = u 2 + v 2, and l 1, m 1, S 0, and b are the parameters to be determined. We further assume that m 1 is zero.

A simulated data set is shown in Fig. 10.10. The models are linear in the parameters l 1, S 0, and b. These parameters can be estimated via the usual linear solutions to the χ 2 minimization equations for phase and amplitude [see Appendix 12.1 or Bevington and Robinson (1992)]. The estimate of l 1 is

$$\displaystyle{ l_{1} = \frac{\frac{1} {2\pi }\sum _{i=1}^{n_{d} }\phi _{i}u_{i}/\sigma _{\!\!\phi _{i}}^{2}} {\sum _{i=1}^{n_{d} }u_{i}^{2}/\sigma _{\!\!\phi _{ i}}^{2}} \;, }$$
(10.65)

where \(\sigma _{\!\!\phi _{i}} \simeq \sigma _{i}/\vert \mathcal{V}\vert _{i}\) and σ i is defined in Eq. (6.50). We assume that all antennas have the same sensitivity, so σ i  = σ, and that \(\vert \mathcal{V}\vert \sim S_{0}\), so that \(\sigma _{\phi _{ i}}\) is approximately constant. In this case,

$$\displaystyle{ \sigma _{l_{1}} = \frac{\sigma /S_{0}} {2\pi \left [\sum _{i=1}^{n_{d} }u_{i}^{2}\right ]^{1/2}}\;. }$$
(10.66)

If the data are uniformly spaced at intervals Δ u, i.e., u i  = i Δ u, then ∑ u i 2 = (Δ u)2 ∑ i 2 = (Δ u)2 n d (n d + 1)(2n d + 1)∕6 ≃ (Δ u)2 n d 3∕3 for n d  ≫ 1. Hence,

$$\displaystyle{ \sigma _{l_{1}} \simeq \frac{1} {2\pi }\sqrt{ \frac{3} {n_{d}}} \frac{\sigma } {S_{0}} \frac{1} {u_{\mathrm{max}}}\;, }$$
(10.67)

where u max = n d Δ u, or

$$\displaystyle{ \sigma _{l_{1}} \simeq \frac{0.3} {\sqrt{n_{d}}} \frac{\sigma } {S_{0}} \frac{\lambda } {D_{\mathrm{max}}}\;, }$$
(10.68)

where D max = λ u max. This formula is close to the one used in astrometry for direct image fitting [see Eq. (12.16)].

Fig. 10.10

Fringe visibility model and data for a slightly resolved, azimuthally symmetric source. (left) Visibility amplitude quadratically declining, indicative of the source being resolved; (right) visibility phase, indicative of a position offset.

The estimates of S 0 and b, along with their errors, \(\sigma _{S_{0}}\) and σ b , are

$$\displaystyle\begin{array}{rcl} S_{0} = \frac{1} {\varDelta } {\Biggl [\sum _{i=1}^{n_{d} }q_{i}^{4}\,\sum _{ i=1}^{n_{d} }\vert \mathcal{V}\vert _{i} -\sum _{i=1}^{n_{d} }q_{i}^{2}\,\sum _{ i=1}^{n_{d} }\vert \mathcal{V}\vert _{i}q_{i}^{2}\Biggr ]}& &{}\end{array}$$
(10.69)
$$\displaystyle\begin{array}{rcl} \sigma _{S_{0}}^{2} = \frac{\sigma ^{2}} {\varDelta } \,\sum _{i=1}^{n_{d} }q_{i}^{4}& &{}\end{array}$$
(10.70)
$$\displaystyle\begin{array}{rcl} b = \frac{1} {\varDelta } {\Biggl [n_{d}\,\sum _{i=1}^{n_{d} }\vert \mathcal{V}\vert _{i}q_{i}^{2} -\sum _{ i=1}^{n_{d} }q_{i}^{2}\,\sum _{ i=1}^{n_{d} }\vert \mathcal{V}\vert _{i}\Biggr ]}& &{}\end{array}$$
(10.71)
$$\displaystyle\begin{array}{rcl} \sigma _{b}^{2} = \frac{\sigma ^{2}n_{d}} {\varDelta } \;,& &{}\end{array}$$
(10.72)

where \(\varDelta = n_{d}\sum q_{i}^{4} -\left (\sum q_{i}^{2}\right )^{2}\). If the data are uniformly spaced at intervals of Δ q from 0 to q max = n d Δ q, then, if we use the approximations ∑ q i 4 ≃ (Δ q)4 n d 5∕5 and ∑ q i 2 ≃ (Δ q)2 n d 3∕3, for n d  ≫ 1,

$$\displaystyle{ \sigma _{S_{0}} \simeq \frac{\sigma } {\sqrt{n_{d}}}\;, }$$
(10.73)

and

$$\displaystyle{ \sigma _{b} \simeq \sqrt{ \frac{5} {n_{d}}}\, \frac{\sigma } {q_{\mathrm{max}}^{2}}\;. }$$
(10.74)

For a Gaussian source distribution, the Taylor expansion of the visibility function in Eq. (10.51) (see Table 10.2) gives b = 2π 2 a 2 S 0. Since θ G , the FWHM angular diameter, is \(\sqrt{8\ln 2}a\), we obtain

$$\displaystyle{ \theta _{G} = \left [\frac{4\ln 2} {\pi ^{2}} \, \frac{b} {S_{0}}\right ]^{1/2}\;. }$$
(10.75)

The uncertainty in θ G , \(\sigma _{\theta _{ G}}\), for the case \(\sigma _{\theta _{G}} \ll \theta _{G}\), will be

$$\displaystyle{ \sigma _{\theta _{G}} \simeq \frac{4\ln 2} {2\pi ^{2}} \sqrt{ \frac{5} {n_{d}}} \frac{\sigma } {S_{0}}\, \frac{1} {\theta _{G}q_{\mathrm{max}}^{2}}\;. }$$
(10.76)

The minimum source size that can actually be measured at the 1-sigma error level is about \(\theta _{\mathrm{min}} \sim \sigma _{\theta _{G}} \sim \theta _{G}\), which is

$$\displaystyle{ \theta _{\mathrm{min}} \simeq \frac{0.6} {\sqrt{\mathcal{R}_{\mathrm{sn }}}}\, \frac{\lambda } {D_{\mathrm{max}}}\;, }$$
(10.77)

where the signal-to-noise ratio \(\mathcal{R}_{\mathrm{sn}} = S_{0}\sqrt{n_{d}}/\sigma\) and D max = λ q max. A more precise and general analysis for various levels of statistical significance is given by Martí-Vidal et al. (2012).
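
A short numerical illustration of Eq. (10.77) follows (our own sketch, with purely illustrative numbers).

```python
import numpy as np

def theta_min(S0, sigma, n_d, wavelength, D_max):
    """Minimum measurable FWHM from Eq. (10.77), with R_sn = S0*sqrt(n_d)/sigma."""
    R_sn = S0 * np.sqrt(n_d) / sigma
    return 0.6 / np.sqrt(R_sn) * wavelength / D_max

# Illustrative values: 100 mJy source, 5 mJy rms per visibility, 1000 visibilities,
# 1 cm wavelength, 30 km maximum baseline.
rad_to_mas = 180.0 / np.pi * 3.6e6
print(theta_min(0.1, 0.005, 1000, 0.01, 3.0e4) * rad_to_mas, "mas")
```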

Table 10.2 Visibility functions for azimuthally symmetric source distributionsa

Note that position and angular parameters can be estimated to an accuracy limited only by the SNR and the confidence in the model. When the SNR is very high, the size can be determined even though it is much less than the nominal beam size. Model fitting should not be confused with super-resolution deconvolution.

10.4.3 Modeling Azimuthally Symmetric Sources

A very important class of models comprises those that have azimuthal symmetry, i.e., I(l, m) = I(r), where \(r = \sqrt{l^{2 } + m^{2}}\). For the following analysis, the position of the source is assumed to be known. In this case, the Fourier transform between the image and visibility becomes a Hankel transform [see Bracewell (1995, 2000), Baddour (2009)], i.e.,

$$\displaystyle\begin{array}{rcl} \mathcal{V}(q)& =& 2\pi \int _{0}^{\infty }I(r)\,J_{ 0}\,(2\pi rq)r\,dr{}\end{array}$$
(10.78)
$$\displaystyle\begin{array}{rcl} I(r) = 2\pi \int _{0}^{\infty }\mathcal{V}(q)\,J_{ 0}\,(2\pi rq)q\,dq\;,& &{}\end{array}$$
(10.79)

where \(q = \sqrt{u^{2 } + v^{2}}\). \(\mathcal{V}(q)\) is a real function, i.e., the visibility phase is zero.

A useful model is that of a uniformly bright circular source of intensity I 0 and radius a. Since ∫ J 0(x) x dx = xJ 1(x),

$$\displaystyle{ \mathcal{V}(q) =\pi a^{2}I_{ 0}\frac{J_{1}(2\pi aq)} {\pi aq} \;, }$$
(10.80)

where J 1(2π a q)∕π a q = 1 for q = 0 and π a 2 I 0 = S 0, the total flux density. The visibility of an annulus of inner and outer radii a 1 and a 2 can be represented as the difference of two disk visibility functions

$$\displaystyle{ \mathcal{V}(q) =\pi a_{2}^{2}I_{ 0}\frac{J_{1}(2\pi a_{2}q)} {\pi a_{2}q} -\pi a_{1}^{2}I_{ 0}\frac{J_{1}(2\pi a_{1}q)} {\pi a_{1}q} \;, }$$
(10.81)

i.e., the difference of two area-normalized jinc functions. The visibility functions for these and a number of other models are listed in Table 10.2 and shown in Fig. 10.11. An important lesson is that circularly symmetric models are very hard to distinguish from one another at short baselines, where the visibility of each model decreases quadratically at a rate set by a single size parameter. It is interesting to compare the visibility functions for a ring and a thin annular disk, as shown in Fig. 10.12. The visibilities become significantly different only when q reaches about the reciprocal of the ring thickness.

Fig. 10.11

Normalized visibility models, \(\vert \mathcal{V}\vert /\mathcal{V}_{0}\), vs. projected baseline length, q, for azimuthally symmetric source models described in Table 10.2.

Fig. 10.12

(thin line) Visibility amplitude for a ring source with radius 1. (thick line) Visibility amplitude for an annular source with inner and outer radii of 0.8 and 1.2, respectively. Adapted from Bracewell (2000).
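
The disk and annulus models of Eqs. (10.80) and (10.81) can be evaluated as in the following sketch (our own code, using scipy.special.j1; radii and q must be in consistent reciprocal units, and here are left in units of the disk radius, as in Fig. 10.12).

```python
import numpy as np
from scipy.special import j1

def disk_vis(q, a, S0):
    """Uniform disk of radius a and total flux density S0, Eq. (10.80).
    The factor J1(2*pi*a*q)/(pi*a*q) is written as 2*J1(x)/x with x = 2*pi*a*q."""
    x = 2.0 * np.pi * a * np.atleast_1d(np.asarray(q, dtype=float))
    out = np.full(x.shape, S0, dtype=float)
    nz = x != 0.0
    out[nz] = S0 * 2.0 * j1(x[nz]) / x[nz]
    return out

def annulus_vis(q, a1, a2, I0):
    """Annulus of uniform intensity I0, inner radius a1, outer radius a2, Eq. (10.81)."""
    return disk_vis(q, a2, np.pi * a2**2 * I0) - disk_vis(q, a1, np.pi * a1**2 * I0)

q = np.linspace(0.0, 3.0, 301)                 # baseline in units of 1/(disk radius)
V_disk = disk_vis(q, 1.0, np.pi)               # unit-radius disk of unit intensity
V_annulus = annulus_vis(q, 0.8, 1.2, 1.0)      # the thin annulus of Fig. 10.12
```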

A useful model for the analysis of an azimuthally symmetric source might be a superposition of annuli in image space with intensities I i and outer and inner radii of a i and a i−1. The inner radius of the innermost annulus is zero, so it is a disk. The visibility function is

$$\displaystyle\begin{array}{rcl} \mathcal{V}(q)& =& \pi I_{0}a_{0}^{2}\frac{J_{1}(2\pi a_{0}q)} {\pi a_{0}q} \\ & & +\pi I_{1}a_{1}^{2}\frac{J_{1}(2\pi a_{1}q)} {\pi a_{1}q} -\pi I_{1}a_{0}^{2}\frac{J_{1}(2\pi a_{0}q)} {\pi a_{0}q} \\ & & +\pi I_{2}a_{2}^{2}\frac{J_{1}(2\pi a_{2}q)} {\pi a_{2}q} -\pi I_{2}a_{1}^{2}\frac{J_{1}(2\pi a_{1}q)} {\pi a_{1}q}\\ &&\quad \quad \vdots \\ & & +\pi I_{n}a_{n}^{2}\frac{J_{1}(2\pi a_{n}q)} {\pi a_{n}q} -\pi I_{n}a_{n-1}^{2}\frac{J_{1}(2\pi a_{n-1}q)} {\pi a_{n-1}q} \;.{}\end{array}$$
(10.82)

For the case of a uniform disk, all the I i s are the same, i.e., I 0, and the visibility is that of a uniform disk of radius a n and intensity I 0,

$$\displaystyle{ \mathcal{V}(q) =\pi a_{n}^{2}\frac{I_{0}J_{1}(2\pi a_{n}q)} {\pi a_{n}q} \;, }$$
(10.83)

as expected. Equation (10.82) can be rearranged as

$$\displaystyle{ \mathcal{V}(q) =\pi \sum _{ i=0}^{n-1}(I_{ i} - I_{i+1})a_{i}^{2}\frac{J_{1}(2\pi a_{i}q)} {\pi a_{i}q} +\pi I_{n}a_{n}^{2}\frac{J_{1}(2\pi a_{n}q)} {\pi a_{n}q} \;. }$$
(10.84)

Equation (10.84) can be fitted to data from sources with elliptical symmetry by a simple change in coordinates.
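
A direct implementation of Eq. (10.84) is sketched below (our own code and naming; the radii must be supplied in increasing order, and equal intensities reduce to the uniform disk of Eq. (10.83)).

```python
import numpy as np
from scipy.special import j1

def norm_disk(q, a):
    """Area-normalized disk visibility J1(2*pi*a*q)/(pi*a*q), equal to 1 at q = 0."""
    x = 2.0 * np.pi * a * np.atleast_1d(np.asarray(q, dtype=float))
    out = np.ones(x.shape)
    nz = x != 0.0
    out[nz] = 2.0 * j1(x[nz]) / x[nz]
    return out

def annuli_vis(q, radii, intensities):
    """Eq. (10.84): annulus i has intensity I_i between radii a_(i-1) and a_i,
    with a_(-1) = 0 so that the innermost component is a disk."""
    a = np.asarray(radii, dtype=float)
    I = np.asarray(intensities, dtype=float)
    V = np.pi * I[-1] * a[-1]**2 * norm_disk(q, a[-1])
    for i in range(len(a) - 1):
        V = V + np.pi * (I[i] - I[i + 1]) * a[i]**2 * norm_disk(q, a[i])
    return V

q = np.linspace(0.0, 3.0, 301)
V = annuli_vis(q, [0.5, 1.0, 1.5], [3.0, 2.0, 1.0])     # centrally brightened source
# Equal intensities recover the uniform disk of Eq. (10.83):
assert np.allclose(annuli_vis(q, [0.5, 1.0], [1.0, 1.0]), np.pi * norm_disk(q, 1.0))
```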

10.4.4 Modeling of Very Extended Sources

The technique of visibility modeling can be of particular importance for diffuse symmetric sources. The models for these sources often do not have finite moments, although they can have well-defined visibility functions. In such cases, the Taylor expansion of the visibility function around q = 0 described in the previous section cannot be used. We discuss two important practical examples.

The first example is that of a radio source created by a fully ionized wind, i.e., thermal plasma at constant temperature T e , surrounding a star. If the wind has a constant velocity of expansion, the electron density will decrease as the inverse square of the distance from the star. It can be shown (Wright and Barlow 1975) that the intensity distribution for such a source can be written as

$$\displaystyle{ \begin{array}{rll} I(r)& = I_{0}[1 - e^{-(r/a)^{-3} }]\;, \\ & \simeq I_{0}\;, &\qquad r \ll a\;, \\ & \simeq I_{0}(r/a)^{-3}\;, &\qquad r \gg a\;, \end{array} }$$
(10.85)

where I 0 = 2kT e (ν∕c)2 (the Rayleigh–Jeans approximation to the Planck function), and a is the angular radius at which the optical depth is unity. The rather benign-looking intensity profile, shown in Fig. 10.13, has an FWHM of about 1.25a, and the intensity falls off as r −3 at large radii. The flux density is given by

$$\displaystyle{ S_{0} = \frac{2\pi ^{2}a^{2}I_{0}} {\sqrt{3}\Gamma (2/3)}\;, }$$
(10.86)

where \(\Gamma \) is the gamma function. S 0 is 1.3 times the flux density of a uniformly bright source of radius a. This source has the interesting characteristic that its angular size varies as ν −0.7 (because the free–free optical depth scales as ν −2.1, so that a varies as ν −0.7), and the flux density varies as ν 0.6 (see the example of MWC349A in Fig. 1.1). However, the second and higher moments of the intensity distribution are infinite. Nonetheless, the visibility function can be calculated from Eq. (10.78). It is shown in Fig. 10.13. It has the interesting characteristic that it decreases linearly (rather than quadratically) with q, that is,

$$\displaystyle{ \mathcal{V}(q) \simeq S_{0}(1 - bq)\;, }$$
(10.87)

where b is a constant proportional to the radius parameter a. This behavior can be understood intuitively from the fact that the source extends smoothly to infinity. Hence, the correlated flux density continues to increase as the baseline decreases to zero. Such a visibility curve has been observed [e.g., White and Becker (1982) and Contreras et al. (2000)] down to the shortest baselines used for the measurements. From the zero spacing flux, \(\mathcal{V}(0) = S_{0}\), and the slope of the normalized visibility curve, b, we can determine the electron density at a reference distance and the electron temperature (Escalante et al. 1989). A more realistic model is one with an ionization cutoff at some distance from the star, which truncates the radio emission. Making the source finite in extent makes all the moments finite, and the visibility function, shown in Fig. 10.13 (right), is dominated by a quadratic term at zero baseline. In this case, the outer radius of the source can be found from the visibility curvature at q = 0 as well as the density parameter and electron temperature.

Fig. 10.13

(left ) The intensity distribution defined by Eq. (10.85) for a stellar wind source where the radius is in units of a. The inset shows the intensity on a logarithmic scale. (right ) Visibility function for the intensity distribution. The inset shows the visibility function near q = 0 and also for the case in which the intensity distribution is truncated at r = 5a. Note that the visibility function departs from the untruncated distribution for \(q\lesssim 1\)/truncation radius and approaches q = 0 quadratically.
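
The visibility of the (truncated) stellar-wind model can be obtained by direct numerical evaluation of the Hankel transform, Eq. (10.78). The sketch below is our own illustration; the truncation radius and grid sizes are arbitrary choices.

```python
import numpy as np
from scipy.special import j0

def wind_intensity(r, a, I0=1.0):
    """Eq. (10.85): I(r) = I0 * (1 - exp(-(r/a)**(-3)))."""
    r = np.asarray(r, dtype=float)
    return I0 * (1.0 - np.exp(-(r / a) ** -3))

def hankel_vis(q, intensity, r_out, n_r=20000, **kwargs):
    """Eq. (10.78) integrated on a radial grid, truncating the source at r_out."""
    r = np.linspace(1e-6, r_out, n_r)
    dr = r[1] - r[0]
    I = intensity(r, **kwargs)
    q = np.atleast_1d(np.asarray(q, dtype=float))
    return np.array([2.0 * np.pi * np.sum(I * j0(2.0 * np.pi * r * qi) * r) * dr
                     for qi in q])

a = 1.0
q = np.linspace(0.0, 0.5, 11)
V = hankel_vis(q, wind_intensity, r_out=5.0 * a, a=a)   # source truncated at r = 5a
print(V / V[0])   # normalized visibility of the truncated model (cf. Fig. 10.13, right)
```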

The second example is useful in modeling the Sunyaev–Zeldovich effect. An isothermal spherical distribution of ionized gas in a cluster of galaxies causes a decrement in the cosmic microwave background. For many clusters, the profile of this decrement can be modeled as

$$\displaystyle{ I(r) = \frac{I_{0}} {\sqrt{1 + \left (\frac{r} {a}\right )^{2}}}\;, }$$
(10.88)

where I 0 is the decrement at the cluster center, and a is the cluster core angular radius. The visibility function for this distribution has the analytic form (Bracewell 2000)

$$\displaystyle{ \mathcal{V}(q) = 2\pi aI_{0}\frac{e^{-2\pi aq}} {2\pi aq} \;. }$$
(10.89)

The visibility increases very rapidly as q decreases, and synthesis images made with missing short spacings are likely to underestimate I 0. However, the parameters I 0 and a can be readily estimated by fitting Eq. (10.89) to the visibility data (Hasler et al. 2012; Carlstrom et al. 1996). As with the stellar wind source, an actual cluster source will be truncated at some radius, r c , which will keep the flux density finite and will give the visibility function a parabolic shape for baselines less than 1∕r c .
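
Evaluating Eq. (10.89) for a few minimum baselines (our own sketch, with illustrative numbers) shows how rapidly the visibility of the decrement rises toward short spacings, and hence how much is missed when the shortest spacings are absent.

```python
import numpy as np

def sz_vis(q, a, I0):
    """Eq. (10.89): V(q) = 2*pi*a*I0 * exp(-2*pi*a*q) / (2*pi*a*q)."""
    x = 2.0 * np.pi * a * np.asarray(q, dtype=float)
    return 2.0 * np.pi * a * I0 * np.exp(-x) / x

arcmin = np.pi / (180.0 * 60.0)
a = 1.0 * arcmin                          # cluster core radius
for q_min in (300.0, 1000.0, 3000.0):     # shortest baseline, in wavelengths
    print(q_min, sz_vis(q_min, a, -1.0))  # |V| falls steeply as q_min increases
```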

10.5 Spectral Line Observations

A basic requirement for observation of spectral lines is a receiving system that provides measurements of the signal intensity in a bandwidth less than, or comparable to, that of the expected spectral feature. Thus, a spectral line correlator produces separate visibility measurements at many points across the receiver passband, and the intensity distribution of the line features can be obtained. The data reduction involved is in principle the same as used in continuum imaging but differs in some practical details. The number of channels into which the received signal is divided is typically in the range 100–10,000. The discussion in this section is largely based on Ekers and van Gorkom (1984) and van Gorkom and Ekers (1989).

Calibration of the instrumental bandpass response is perhaps the most important step in obtaining accurate spectral line data. Generally, the channel-to-channel differences are relatively stable with time and need not be calibrated as frequently as the time-variable effects of the overall receiver gain. Except in very early systems, the channel filtering (see Sect. 8.8) is performed digitally and is not susceptible to ambient variations in temperature or voltage. The overall gain variations require periodic observation of a calibration source, as described for continuum observations. For this purpose, the summed response of the individual channels is often used, since a much longer observing time would be required to obtain a sufficient SNR in each narrow channel. For the bandpass calibration, a longer observation of a calibrator can be made to determine the relative gains of the spectral channels. Since the relative gains of the different channels into which the passband is divided change very little with time, the bandpass calibration need only be performed once or twice during, say, an 8-h observation. The bandpass calibration source should be unresolved and strong enough to provide good SNR in the spectral channels and should have a sufficiently flat spectrum. However, it need not be close in position to the source being observed.

Bandpass ripples resulting from standing waves between the antenna feed and the reflector, which pose a serious problem for single-antenna total-power systems, are much less important for interferometers. This is because the instrumental noise, including thermal noise picked up in the antenna sidelobes, is not correlated between antennas. On the other hand, for digital correlators, the Gibbs phenomenon ripples in the passband, which arise in Fourier transformation from the delay to the frequency domains, introduce a problem not found in autocorrelators. Because the cross-correlation of the signals from two antennas is real but not symmetrical as a function of delay, the cross power spectrum as a function of frequency is complex. (The autocorrelation function of the signal from a single antenna is real and symmetrical, and the power spectrum is real.) As explained in Sect. 8.8.8 (see Fig. 8.18), the imaginary part of the cross power spectrum changes sign at the origin, but the real part does not. Because of this large discontinuity at the frequency origin, ripples in the imaginary part of the frequency spectrum are of larger relative amplitude than those in the real part. The peak overshoot in the imaginary part is 18% (9% of the full step size); see also Bos (1984, 1985). Figure 10.14 shows a calculated example. The ratio of the real and imaginary parts depends on the instrumental phase (which is not calibrated out at this stage of the analysis) and on the position of the source of the radiation relative to the phase center of the field.

Fig. 10.14

(a ) The cross power spectrum resulting from a continuum source in which the phase is arbitrarily chosen such that the amplitudes of the real and imaginary parts are equal. (b ) Computed response of a cross-correlator with 16 channels to the spectrum in (a ). Note the difference in amplitude of the ripples in the real and imaginary parts. From D’Addario (1989), courtesy of and © the Astronomical Society of the Pacific.
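
The behavior illustrated in Fig. 10.14 can be reproduced with a short calculation: construct the complex cross power spectrum of a flat band whose instrumental phase makes the real and imaginary parts equal, transform it to a truncated set of lags, and transform back with uniform and with Hann lag weighting. The sketch below is our own illustration; the grid sizes, band edges, and number of retained lags are arbitrary choices.

```python
import numpy as np

n_fine = 4096                                  # fine grid spanning the full Nyquist range
nu = np.fft.fftfreq(n_fine)                    # frequency in units of the sample rate
S = np.zeros(n_fine, dtype=complex)
S[(nu > 0) & (nu < 0.4)] = (1.0 + 1.0j) / np.sqrt(2.0)    # flat band, 45 deg phase
S[(nu < 0) & (nu > -0.4)] = (1.0 - 1.0j) / np.sqrt(2.0)   # Hermitian image: Re even, Im odd

rho = np.fft.ifft(S).real                      # cross-correlation versus lag (real)
n_lag = 16                                     # lags retained by the correlator
lag = np.minimum(np.arange(n_fine), n_fine - np.arange(n_fine))
rect = (lag <= n_lag).astype(float)            # uniform (rectangular) lag weighting
hann = rect * 0.5 * (1.0 + np.cos(np.pi * lag / n_lag))   # Hann lag weighting

S_rect = np.fft.fft(rho * rect)                # measured spectrum, uniform weighting
S_hann = np.fft.fft(rho * hann)                # measured spectrum, Hann weighting

# Gibbs overshoot relative to the in-band level 1/sqrt(2): larger in the imaginary
# part, which has a sign change at the frequency origin; Hann weighting suppresses it.
inband = 1.0 / np.sqrt(2.0)
print("uniform:", S_rect.real.max() / inband, S_rect.imag.max() / inband)
print("Hann:   ", S_hann.real.max() / inband, S_hann.imag.max() / inband)
```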

Increasing the number of lags of a lag correlator, or the size of the FFT in an FX correlator, improves the spectral resolution and confines the Gibbs phenomenon ripples more closely to the bandpass edges. The data from the channels at the band edges are sometimes discarded because of the ripples and the roll-off of the frequency response. However, variations in the passband are less important in later systems in which the signals are in digital form and the passband is defined by digital filtering. An effective way to reduce the amplitude of the ripples is to taper the cross-correlation function and thus introduce smoothing into the cross power spectrum. For this smoothing, the Hann function (see Table 8.5) is often used. van Gorkom and Ekers (1989) draw attention to the following examples:

  1. 1.

    If the field contains a line source but no continuum, and the line is confined to the central part of the passband, then the spectrum has no discontinuity at the passband edges. This is the only case in which it is advisable to use different tapering of the cross-correlation function for the source and the continuum calibrator.

  2. 2.

    If, in addition to the line source, the field contains one continuum point source, and if both this source and the bandpass calibrator are at the centers of their respective fields, then an accurate calibration of the bandpass ripples is possible. The same weighting must be used for the source and calibrator.

  3. 3.

    In more complicated cases—for example, when there is both a line source and an extended continuum source within the field—the ripples will be different in the two cases, and exact calibration is not possible. Hann smoothing of the spectra of both the source and the calibrator is recommended.

10.5.1 VLBI Observations of Spectral Lines

Since VLBI observations are limited to sources of very high brightness temperature, spectral line measurements in VLBI are used mainly for the study of masers and absorption of emission from bright extragalactic sources by molecular clouds. Frequently observed maser lines include those arising from OH, H2O, CH3OH, and SiO. For absorption studies, many atomic and molecular species can be observed since the brightness temperature requirement is fulfilled by the background source. The formalism of spectral line signal processing is described in Sect. 9.3. Special considerations for astrometric measurements are given in Sect. 12.7. Here we discuss several practical issues related to the handling of spectroscopic data. The use of independent frequency standards at the antennas results in time-dependent timing errors, which introduce linear phase slopes across the basebands. The difference in Doppler shifts among the antennas can be large, and hence the residual fringe rates can also be large, which may necessitate short integration times for calibration. For masers, the phase calibration can usually be obtained from the use of the phase of a particular spectral feature as a reference. The amplitude calibration can be obtained from the measurement of the spectra derived from the data recorded at individual antennas. More details of procedures for handling spectral line data can be found in Reid (1995, 1999).

In spectral line VLBI, it is usual to observe a compact continuum calibrator several times an hour, preferably one strong enough to give an accurate fringe measurement in 1 or 2 min of integration. If a lag-type correlator is used to cross-correlate the signals, the output is a function of time and delay. Equation (9.21), in which Δ τ g and θ 21 are functions of time, shows cross-correlation as a function of time and delay. By Fourier transformation, the arguments t and τ can be changed to the corresponding conjugate variables, which are fringe frequency, ν f , and the frequency of the spectral feature, ν, respectively. Thus, the correlator output can be expressed as a function of (t, τ), (ν f , τ), (t, ν), or (ν f , ν) and can be interchanged between these domains by Fourier transformation. This is important because some steps in the calibration are best performed in particular domains. Note that the fringe frequency in VLBI observations results mainly from the difference between the true fringe frequency and the model fringe frequency used to stop the fringes. Consider first the data from the continuum calibrator. In fringe fitting for a continuum source, it is advantageous to use visibility data as a function of fringe frequency and delay, (ν f , τ), as shown in Fig. 9.7. In that domain, the visibility data are most compactly concentrated and therefore most easily identified in the presence of the noise. In the absence of errors, the visibility will be concentrated at the origin in the (ν f , τ) domain. A shift from the origin in the τ coordinate indicates timing errors resulting from clock offsets or baseline errors. The shift Δ τ represents the difference in the errors for the two antennas. Values of Δ τ determined from the continuum calibrator are used to apply corrections to the spectral line data. Variation of the Δ τ values over time requires interpolation to the times of the spectral line data. The continuum data can also be used for bandpass calibration, to determine the relative amplitude and phase characteristics of the spectral channels.

For fringe fitting to spectral line data, it is advantageous to transform to the (t, ν) domain since, in contrast with the continuum case, the spectral line data contain features that are narrow in frequency. The cross-correlation function is therefore correspondingly broad in the delay dimension and generally more compact in frequency. Note that in the τ-to-ν transformation, ν is not the frequency of the radiation as received at the antenna, since the frequency of a local oscillator (or a combination of more than one local oscillator), ν LO, has been subtracted. Thus, ν here represents the frequency within the intermediate-frequency (IF) band that is sampled and recorded for transmission to the correlator. The (t, ν) domain is also appropriate for inserting corrections for the timing errors, Δ τ, determined from the continuum data. These corrections are made by inserting phase offsets that are proportional to frequency. Thus, the data as a function of (t, ν) are multiplied by exp(j2π Δ τ ν). If the variation in the Δ τ values over time results from a clock rate error at one or both of the antennas, correction should be made for the associated error in the frequency ν LO at the antennas. The resulting phase error is corrected by multiplying the correlator output data by exp(j2π Δ τ ν LO).

Since Doppler shift corrections (see Appendix 10.2) are rarely made as local oscillator offsets at the antennas, these corrections must be made at the correlator or subsequently in the post-processing analysis. The diurnal Doppler shift is normally removed at the station level in the precorrelation fringe rotation, where the signals are delayed and frequency-shifted to a reference point at the center of the Earth. Correction for the Doppler shift due to the Earth’s orbital motion and the local standard of rest, as well as any other frequency offset, can conveniently be made on the post-correlation data by use of the shift theorem, that is, multiplication of the correlation functions by exp(j2π Δ ν τ), where Δ ν is the total frequency shift desired.

The visibility spectra can be calibrated in units of flux density by multiplication of the normalized visibility spectra by the geometric mean of the system equivalent flux densities (SEFDs) of the two antennas concerned, as discussed in Sect. 10.1.2. The SEFD is defined in Eq. (1.7). It can be determined from occasional supplemental measurements at the antennas, and the results interpolated in time. A better method for strong sources is to calculate the total-power spectrum of the source from the autocorrelation functions of the data from each antenna. These must be corrected for the bandpass response, which can be obtained from the autocorrelation functions on a continuum fringe calibrator. The amplitude of a specific spectral feature is proportional to the reciprocal of the SEFD. If greater sensitivity is required, then each measured spectrum can be matched to a spectral template obtained from a global average of all the single-antenna data or from a spectrum taken with the most sensitive antenna in the array. The difficulty with this method is that it is seldom convenient to acquire bandpass spectra often enough to ensure sufficiently accurate baseline subtraction on weak sources.

If the total frequency bandwidth in the measurements is covered by using two or more IF bands of the receiving system, it is necessary to correct for differences in their instrumental phase responses. This can be done using the continuum calibrator measurements, by averaging the phase values for the different channels in each IF band and subtracting these averages from the corresponding spectral line visibility data.

Finally, it is necessary to correct for remaining instrumental phases and for the different atmospheric and ionospheric phase shifts, which may be large for widely separated sites. In imaging strong continuum sources, this can be achieved by using phase closure, as described in Sect. 10.3. A similar approach can be used in imaging a distribution of maser point sources, by selecting a strong spectral component that is seen at all baselines and assuming that it represents a single point source. Then if the phase for this component at one arbitrarily chosen antenna is assumed to be zero, the relative phases for the other antennas can be deduced from the fringe phases. Since these phases are attributed to the atmosphere over each antenna, the correction can be applied to all frequency components within the measured spectrum. This method of using one maser component to provide a phase reference is discussed in more detail in Sect. 12.7, together with fringe frequency mapping, a technique that is useful in determining the positions of major components in a large field of masers.

10.5.2 Variation of Spatial Frequency Over the Bandwidth

The effect of using the center frequency of the receiver passband in calculating the values of u and v for all frequencies within the passband is discussed in Sect. 6.3.1. Consider, for example, a single discrete source for which the visibility function has a maximum centered on the (u, v) origin and decreases monotonically for a range of increasing u and v. If we use the frequency at the band center ν 0 to calculate u and v for a frequency at the high end of the band, that is, ν > ν 0, then the values of u and v will be underestimated. The measured visibility will fall off too quickly with u and v, and the central peak of the visibility function will be too narrow. Hence, the image in l and m will be too wide. Thus, if the source radiates a spectral line at the blueshifted side of the bandwidth, the angular dimensions may be overestimated and similarly underestimated at the redshifted side. This effect can be described as chromatic aberration.

As discussed in Sect. 6.3, for observations with a spectral line (multichannel) correlator, the visibility measured for each channel can be expressed as a function of the (u, v) values appropriate for the frequency of the channel. This corrects the chromatic aberration but causes the (u, v) range over which the visibility is measured to increase over the bandwidth in proportion to the frequency. Thus, the width of the synthesized beam (i.e., the angular resolution) and the angular scale of the sidelobes vary over the bandwidth. The variation of the resolution can, if necessary, be corrected by truncation or tapering of the visibility data to reduce the resolution to that of the lowest frequency within the passband.

10.5.3 Accuracy of Spectral Line Measurements

The spectral dynamic range of an image after final calibration is an estimate of the accuracy of the measurement of spectral features expressed as a fraction of the maximum signal amplitude. It can be defined as the variation in the response of different channels to a continuum signal divided by the maximum response, the variation being a result of noise and instrumental errors. When the amplitude of a spectral line is only a few percent of the continuum that is present, as in the case of a recombination line or a weak absorption line, the accuracy of spectral line features depends on the accuracy with which the response to the continuum can be separated from that to the line. In such a case, a dynamic range of order \(10^{3}\) is required to measure a line profile to an accuracy of 10%. Hence, we see the importance of accurate bandpass calibration and of correction for chromatic aberration.

Various techniques have been used to help subtract the continuum response from an image. It is necessary to choose the receiver bandwidth so as to include some channels that contain continuum only, at frequencies on either side of the line features. A straightforward method is to use an average of the line-free channel data to make a continuum image and subtract this image from each of the images derived for a channel with line emission. Unless the receiver bandwidth is sufficiently small compared with the center frequency, it is likely that a correction for chromatic aberration should be used in making the continuum image. If the continuum emanates from point sources, the positions and flux densities of the sources provide a convenient model. For the most precise subtraction, the continuum response should be calculated separately for each line channel, using the individual channel frequencies in determining the (u, v) values. The subtraction should be performed on the visibility data. Use of deconvolution algorithms in the continuum subtraction is briefly discussed in Sect. 11.8.1.

10.5.4 Presentation and Analysis of Spectral Line Observations

Spectral line data can be presented as three-dimensional distributions of pixels in (l, m, ν). For physical interpretation, the Doppler shift in the frequency dimension is often converted to radial velocity v r with respect to the rest frequency of the line. The relationship between frequency and velocity is given in Appendix 10.2. A model of such a three-dimensional distribution is shown in Fig. 10.15. Continuum sources are represented by cylindrical functions of constant cross section in l and m.

Fig. 10.15

Three-dimensional representation of spectral line data in right ascension, declination, and frequency. The frequency axis is calibrated in velocity corresponding to the Doppler shift of the rest frequency of the line. The flux density or intensity of the radiation is not shown but could be represented by color or shading. The indicated velocity has no physical meaning for continuum sources, which are represented by cylindrical forms of constant cross section normal to the velocity dimension. Spectral line emission is indicated by the variation of position or intensity with velocity. From Roelfsema (1989), courtesy of and © the Astronomical Society of the Pacific.

The three-dimensional data cube that contains the images for the individual channels can be thought of as representing a line profile for each pixel in two-dimensional (l, m) space. To simplify the ensemble of images, it is often useful to plot a single (l, m) image of some feature of the line profile. This feature might be the integrated intensity

$$\displaystyle{ \varDelta \nu \sum _{i}I_{i}(l,m)\;, }$$
(10.90)

where i indicates the range of spectral channels, which are spaced at intervals Δ ν in frequency. For an optically thin radiating medium such as neutral hydrogen, this is proportional to the column density of radiating atoms or molecules. The intensity-weighted mean velocity is an indicator of large-scale motion,

$$\displaystyle{ \langle v_{r}(l,m)\rangle = \frac{\sum _{i}I_{i}(l,m)v_{r_{i}}} {\sum _{i}I_{i}(l,m)} \;. }$$
(10.91)

The intensity-weighted velocity dispersion

$$\displaystyle{ \sqrt{\frac{\sum _{i } I_{i } (l, m)(v_{r_{i } } -\langle v_{r } \rangle )^{2 } } {\sum _{i}I_{i}(l,m)}} }$$
(10.92)

is an indicator of random motions within the source. The summation in the velocity dimension is performed separately for each (l, m) pixel of the images. In each of the three quantities in expressions (10.90)–(10.92), the intensity values correspond to the specific line of interest, continuum features having been subtracted out. In obtaining the best estimates for these three quantities, it should be noted that including ranges of (l, m, v r ) that contain no discernable emission only adds noise to the results.

Exploring the relationships between three-dimensional images in (l, m, v r ) and the three-dimensional distribution of the radiating material is an astronomical concern. As a simple example, consider a spherical shell of radiating material. If the material is at rest, it will appear in (l, m, v r ) space as a circular disk in the plane of zero velocity, with brightening at the outer edge. If the shell is expanding with the same velocity in all directions, it will appear in (l, m, v r ) space as a hollow ellipsoidal shell. Interpretation of observations of rotating spiral galaxies is more complex. An example of a model galaxy is given by Roelfsema (1989), and a more extensive discussion can be found in Burton (1988).

10.6 Miscellaneous Considerations

10.6.1 Interpretation of Measured Intensity

The quantity measured in a synthesized image is the radio intensity, but \(\mathcal{V}\) is usually calibrated in terms of the equivalent flux density of a point source, and the intensity in the resulting image is therefore expressed in units of flux density per beam area Ω 0, which is given by

$$\displaystyle{ \varOmega _{0} =\int \int _{\mathrm{{ main \atop lobe} }} \frac{b_{0}(l,m)\,dl\,dm} {\sqrt{1 - l^{2 } - m^{2}}}\;. }$$
(10.93)

The response to an extended source is the convolution of the sky intensity I(l, m) with the synthesized beam b 0(l, m). Note that since there is often no measured visibility value at the (u, v) origin, the integral of b 0(l, m) over all angles is zero; that is to say, there is no response to a uniform level of intensity. At any point on the extended source where the intensity varies slowly compared with the width of the synthesized beam, the convolution with b 0(l, m) results in a flux density that is approximately I Ω 0. Thus, the scale of the image can also be interpreted as intensity measured in units of flux density per beam area Ω 0. For a discussion of imaging wide sources and measuring the intensity of extended components of low spatial frequency, see Sects. 11.5 and 10.4.

10.6.2 Ghost Images

Figure 10.14 illustrates how bandpass ripples are introduced into the visibility as a function of frequency, as a result of the sharp edges in the cross power spectrum. A related effect discussed by Bos (1984) is the introduction of “ghost” images into the image derived from the observations. The ghost structure appears at a position that, relative to the true structure, is diametrically opposite with respect to the field center. For each spectral channel, the amplitude of the ghost structure is proportional to the amplitude of the ripple component. Thus, it is most serious for the channels at the edges of the receiver passband, as can be seen from Fig. 10.14b.

The ghost phenomenon is most easily explained by considering a simple example. Suppose there is a point source of unit amplitude at position (l, m) = (l 1, 0), where (0, 0) is the field center, and it is observed over a range of baselines u. The fringe visibility of a point source is the Fourier transform with respect to l of a delta function at l 1, which is

$$\displaystyle{ \mathcal{V}_{1}(u) = e^{-j2\pi ul_{1} } =\cos (2\pi ul_{1}) - j\sin (2\pi ul_{1})\;. }$$
(10.94)

Suppose that a multichannel spectral correlator is used and there is a visibility data set for each spectral channel. The ripples across the spectrum in Fig. 10.14 have the effect that the relative amplitudes of the sine and cosine components are no longer equal, as they are in Eq. (10.94), so we rewrite Eq. (10.94) as

$$\displaystyle{ \mathcal{V}_{1}(u) =\cos (2\pi ul_{1}) - j(1+\varDelta )\sin (2\pi ul_{1})\;. }$$
(10.95)

Here, a component of relative amplitude Δ has been added to the imaginary component, which has the most severe ripples. Δ is positive for a channel in which there is a peak in the imaginary-component ripple. To determine the effect of the term − j Δsin(2π u l 1) in the image, we take its Fourier transform with respect to u, which is Δ[δ(l + l 1) −δ(l − l 1)]∕2. Thus, the ripple adds to the image a delta function of amplitude Δ∕2 at − l 1, which is the ghost, and subtracts a delta function of the same amplitude from the true image at l 1. For a source at the field center, the ghost and the true image combine, providing a correct measure of the source intensity.

Since the visibility data are usually not calibrated prior to the spectral filtering, the relative amplitudes of the real and imaginary components in Eq. (10.94) result from the instrumental phases introduced by the receiving system as well as from the structure of the source. If these instrumental phase data are lost after calibration of the visibility, precise removal of the ghost is not possible. However, the effect of the ripples can be reduced by use of smoothing functions on the spectral data before creating the image, as discussed earlier. If the spectral data are averaged to provide a continuum result before assigning (u, v) values, the effect of the frequency difference of the channels with high amplitude ripples at the two edges of the passband may be sufficient to separate the ghost into two components, as shown by Bos (1985). This separation will not occur if the (u, v) values are individually assigned for each spectral channel.

Bos (1984) points out that the ghost can be removed, or substantially attenuated, by π∕2 switching of the relative phase between each signal pair before cross correlating, and restoring the phase before transformation of the visibility data to form an image. For the source considered in Eq. (10.94), the introduction of π∕2 into the differential phase for an antenna pair results in the visibility

$$\displaystyle{ \mathcal{V}_{2}(u) = je^{-j2\pi ul_{1} } = j\cos (2\pi ul_{1}) +\sin (2\pi ul_{1})\;. }$$
(10.96)

The imaginary part consists of the cosine components, which are the real part in Eq. (10.94). Adding the visibility term resulting from the ripples in the imaginary part of the spectrum, as in Eq. (10.95), we have

$$\displaystyle{ \mathcal{V}_{2}(u) = j(1+\varDelta )\cos (2\pi ul_{1}) +\sin (2\pi ul_{1})\;. }$$
(10.97)

To remove the effect of the quadrature phase switch, we multiply Eq. (10.97) by j. The visibility term introduced by the ripple then becomes −Δcos(2π u l 1), and taking the Fourier transform with respect to u, we find that the contribution of the ripple to the image is −Δ[δ(l + l 1) +δ(l − l 1)]∕2. Again, there are delta functions at ± l 1, but in this case, they both have the same sign. Thus, the result of averaging the images with the two positions of the phase switch is to cancel the ghost but double the amplitude loss of the true image. Note that we have assumed that the quadrature phase shift introduced by the switch can be represented by the factor j in Eq. (10.96): if the sign of the phase shift is such that the factor is − j, then the sign of the right side of Eq. (10.96) must be reversed. If the sign is wrong, the effect is to double the amplitude of the ghost but restore the amplitude of the image.

10.6.3 Errors in Images

A very useful technique for investigating suspicious or unusual features in any synthesized image, whether continuum or spectral line, is to compute an inverse Fourier transform (i.e., from intensity to visibility), including only the feature in question. A distribution in the (u, v) plane concentrated in a single baseline, or in a series of baselines with a common antenna, could indicate an instrumental problem. A distribution corresponding to a particular range of hour angle of the source could indicate the occurrence of sporadic interference.

An aid in identifying erroneous features is a familiarity with the behavior of functions under Fourier transformation; see, for example, Bracewell (2000) and the discussion by Ekers (1999). A persistent error in one antenna pair will, for an east–west spacing, be distributed along an elliptical ring centered on the (u, v) origin, and in the (l, m) plane will give rise to an elliptical feature with a radial profile in the form of the zero-order Bessel function. An error of short duration on one baseline introduces two delta functions representing the measurement and its conjugate. In the image, these produce a sinusoidal corrugation over the (l, m) plane. The amplitude of the corrugation may be small, since in an M × N visibility matrix, the effect of the two erroneous points is diluted by a factor of 2(MN)−1, which is usually of order 10−3–10−6. Thus, a single short-duration error could be acceptable if, in the image plane, it is small compared with the noise.
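
A minimal sketch of the inverse-transform diagnostic described above (hypothetical image contents): an image containing only the questionable feature is transformed back to the (u, v) plane, where the location of the power indicates the offending baselines or times.

```python
import numpy as np

# Sketch of the diagnostic described above (hypothetical image): transform
# an image containing only the questionable feature back to the (u, v)
# plane and see where its power is concentrated.
N = 256
image = np.zeros((N, N))

# Suppose the suspicious feature is a sinusoidal corrugation; here it is
# inserted directly as a stand-in for a feature cut out of a real map.
l = np.arange(N) - N // 2
m = np.arange(N) - N // 2
L, M = np.meshgrid(l, m, indexing="ij")
image += 0.01 * np.cos(2 * np.pi * (30 * L + 10 * M) / N)

# Inverse transform: intensity -> visibility.
vis = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))

# The corrugation maps to two conjugate points in the (u, v) plane,
# as expected for a short-duration error on a single baseline.
iu, iv = np.unravel_index(np.argmax(np.abs(vis)), vis.shape)
print("peak |V| at grid cell (u, v) =", (iu - N // 2, iv - N // 2))
```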

Errors of an additive nature combine by addition with the true visibility values. In the image, the Fourier transform of the error distribution ɛ add(u, v) is added to the intensity distribution, and we have

$$\displaystyle{ \mathcal{V}(u,v) +\varepsilon _{\mathrm{add}}(u,v)\longleftrightarrow I(l,m) + \overline{\varepsilon }_{\mathrm{add}}(l,m)\;. }$$
(10.98)

Noise is the most familiar error of this type; other additive errors result from interference, cross coupling of system noise between antennas, and correlator offset errors. The Sun is many orders of magnitude stronger than most radio sources and can produce interference of a different character from that of terrestrial sources because of its diurnal motion. The response to the Sun is governed mainly by the sidelobes of the primary beam, the difference in fringe frequencies for the Sun and the target source, and the bandwidth and visibility averaging effects. Solar interference is most severe for low-resolution arrays with narrow bandwidths. Cross coupling of noise (cross talk) occurs only between closely spaced antennas and is most severe for low elevation angles when shadowing of antennas may occur.

A second class of errors comprises those that combine with the visibility in a multiplicative manner, and for these, we can write

$$\displaystyle{ \mathcal{V}(u,v)\varepsilon _{\mathrm{mul}}(u,v)\longleftrightarrow I(l,m) {\ast}{\ast}\,\overline{\varepsilon }_{\mathrm{mul}}(l,m)\;. }$$
(10.99)

The Fourier transform of the error distribution is convolved with the intensity distribution, and the resulting distortion produces erroneous structure connected with the main features in the image. In contrast, the distribution of errors of the additive type is unrelated to the true intensity pattern. Multiplicative errors mainly involve the gain constants of the antennas and result from calibration errors, including antenna pointing and, in the case of VLBI systems, radio interference (see Sect. 16.4).
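
The contrast between Eqs. (10.98) and (10.99) can be illustrated with a short numerical sketch (hypothetical source and error values): the artifacts produced by an additive error are independent of the source, whereas those produced by a multiplicative error scale with it.

```python
import numpy as np

# Additive vs. multiplicative visibility errors, cf. Eqs. (10.98) and (10.99).
# Hypothetical example: a single point source of adjustable strength.
N = 128
rng = np.random.default_rng(1)
err = 0.05 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

def artifact_rms(flux):
    """Return rms of the error-induced structure for each error type."""
    image = np.zeros((N, N))
    image[N // 2, N // 2 + 10] = flux
    vis = np.fft.fft2(image)
    img_add = np.fft.ifft2(vis + err).real - image        # Eq. (10.98) residual
    img_mul = np.fft.ifft2(vis * (1 + err)).real - image  # Eq. (10.99) residual
    return img_add.std(), img_mul.std()

for flux in (1.0, 10.0):
    add_rms, mul_rms = artifact_rms(flux)
    print(f"flux {flux:5.1f}: additive artifact rms {add_rms:.3e}, "
          f"multiplicative artifact rms {mul_rms:.3e}")
# The additive artifacts are independent of the source strength, while the
# multiplicative artifacts scale with it (they are convolved with the source).
```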

Distortions that increase with distance from the center of the image constitute a third category of errors. These include the effects of noncoplanar baselines (Sect. 11.7), bandwidth (Sect. 6.3), and visibility averaging (Sect. 6.4), which are predictable and therefore somewhat different in nature from the other distortions mentioned above.

10.6.4 Hints on Planning and Reduction of Observations

Making the best use of synthesis arrays and similar instruments requires an empirical approach in some areas, and the best procedures for analyzing data are often learned through experience. Much helpful information exists in the handbooks on specific instruments, symposium proceedings, etc. [see, for example, Perley, Schwab, and Bridle (1989) and Taylor, Carilli, and Perley (1999)]. A few points are discussed below.

In choosing the observing bandwidth for continuum observations, the radial smearing effect should be considered, since the SNR for a point source near the edge of the field is not necessarily maximized by maximizing the bandwidth. The data-averaging time can then be chosen so that the resulting circumferential smearing is about equal to the radial effect. The required condition is obtained from Eqs. (6.75) and (6.80) and for high declinations is

$$\displaystyle{ \frac{\varDelta \nu } {\nu _{0}} \simeq \omega _{e}\tau _{a}\;. }$$
(10.100)

Here, ν 0 is the center frequency of the observing band, Δ ν is the bandwidth, ω e is the angular velocity of the Earth’s rotation, and τ a is the averaging time. When attempting to detect a weak source of measurable angular diameter, or extended emission, it is important not to choose an angular resolution that is too high. The SNR for an extended source is approximately proportional to I Ω 0, as discussed in the previous section. The observing time required to obtain a given SNR is proportional to Ω 0 −2, or to θ b −4, where θ b is the synthesized beamwidth.
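
A minimal sketch of the use of Eq. (10.100), with assumed values of ν 0 and Δ ν, to estimate the averaging time for which the circumferential smearing is comparable to the radial smearing:

```python
# Averaging time from Eq. (10.100): dnu / nu0 ~ omega_e * tau_a.
# The band parameters below are assumed example values.
OMEGA_E = 7.2921e-5           # Earth's rotation rate (rad/s)

def averaging_time(nu0_hz, dnu_hz):
    """Averaging time (s) for which circumferential ~ radial smearing."""
    return (dnu_hz / nu0_hz) / OMEGA_E

for nu0_hz, dnu_hz in [(1.4e9, 50e6), (5.0e9, 128e6)]:
    print(f"nu0 = {nu0_hz/1e9:.1f} GHz, dnu = {dnu_hz/1e6:.0f} MHz: "
          f"tau_a ~ {averaging_time(nu0_hz, dnu_hz):.0f} s")
```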

If the antenna beam contains a source that is much stronger than the features to be studied, the response to the strong source can be subtracted, provided it is a point source or one that can be accurately modeled. This is best done by subtracting the computed visibility before gridding the measurements for the FFT. The subtracted response will then accurately include the effect of the sidelobes of the synthesized beam. Nevertheless, the precision of the operation will be reduced if the source response is significantly affected by bandwidth, visibility averaging, and similar effects, so it may be best to place the source to be subtracted at the center of the field. When observing a very weak source, it may be advisable to place the source a few beamwidths away from the (l, m) origin to avoid confusion with residual errors from correlator offsets, etc.
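
A sketch of the subtraction step described above, assuming the flux density and position of the strong point source are known from a model; all data values here are fabricated for illustration.

```python
import numpy as np

# Sketch of point-source subtraction from ungridded visibilities, as
# described above.  The source parameters (flux, l0, m0) are assumed to be
# known from a model; the data below are fabricated.
def subtract_point_source(u, v, vis, flux, l0, m0):
    """Remove flux * exp(-2j*pi*(u*l0 + v*m0)) from each visibility sample."""
    return vis - flux * np.exp(-2j * np.pi * (u * l0 + v * m0))

rng = np.random.default_rng(0)
u = rng.uniform(-1e4, 1e4, 1000)                             # u, v in wavelengths
v = rng.uniform(-1e4, 1e4, 1000)
strong = 5.0 * np.exp(-2j * np.pi * (u * 1e-3 + v * 2e-3))   # 5 Jy source
weak = 0.01 * np.exp(-2j * np.pi * (u * -4e-4))              # 10 mJy source
vis = strong + weak

residual = subtract_point_source(u, v, vis, 5.0, 1e-3, 2e-3)
print(f"mean |V| before subtraction: {np.abs(vis).mean():.3f} Jy")
print(f"mean |V| after subtraction:  {np.abs(residual).mean():.3f} Jy")
```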

As part of the procedure in making any image, it may be useful also to make a low-resolution image covering the entire area of the primary antenna beam. For this image, the data can be heavily tapered in the (u, v) plane to reduce the resolution and thus also the computation. Such an image will reveal any sources outside the field of the final image that may introduce aliased responses in the FFT. Aliasing of these sources can be suppressed by subtraction of their visibility or use of a suitable convolving function. The sidelobe or ringlobe responses to such a source are also eliminated by subtraction of the source but not by convolution in the (u, v) plane. The low-resolution image will also emphasize any extended low-intensity features that might otherwise be overlooked.

10.7 Observations of Cosmological Fine Structure

10.7.1 Cosmic Microwave Background

The anisotropy of the cosmic microwave background (CMB), which is about 10−5 of the mean temperature of 2.7 K, was first detected by the COBE mission (Smoot et al. 1992), and its characteristics were explored in great detail by the WMAP mission (Bennett et al. 2003) and the Planck mission (Planck Collaboration 2016). The data from these missions, obtained using total-power beam-switching techniques, revealed a major peak in the angular spectrum of the background fluctuations at ∼ 1.6°. Interferometry offers advantages for the study of the higher-resolution peaks that, like the major peak, are attributed to acoustic waves in the early photon-baryon plasma at the surface of last scattering. Since interferometers do not respond to uncorrelated signals such as those generated within the Earth’s atmosphere, it is possible to use ground-based interferometers for investigation of the finer angular structure of the CMB. A number of special instruments have been developed specifically to cover structure of angular range ∼ 0.1° to ∼ 3°. These include the Degree Angular Scale Interferometer (DASI) (Leitch et al. 2002b; Pryke et al. 2002), located at the South Pole; the Cosmic Background Imager (CBI) (Padin et al. 2002; Readhead et al. 2004), located at Llano de Chajnantor, Chile; and the Very Small Array (VSA) (Watson et al. 2003; Scott et al. 2003), in Tenerife. Planar arrays, discussed in Sect. 5.6.5, were primarily used for this work.

In the study of the fluctuations in the CMB, it is the statistics of the temperature variations rather than images of specific fields on the sky that are of interest for comparison with theoretical models. Model power spectra are given in terms of spherical harmonics, that is, the amplitudes of multipole moments of the temperature variation. Measurements of the angular spectrum of the CMB in this form can be derived directly from the Fourier components measured by interferometry without forming images of the structure on the sky. It is assumed that the CMB spectrum can be expressed as a function with circular symmetry (rotational invariance), since there is no preferred direction in the structure on the sky. Thus, characteristics of the CMB lead to some design considerations that differ from those for general-purpose synthesis arrays. The individual antennas need to be large enough to allow accurate phase and amplitude calibration with observing times of a few minutes, using strong discrete sources. With regard to the antenna configuration, the main requirement is to obtain sampling in a radial coordinate, \(q = \sqrt{u^{2 } + v^{2}}\), in the (u, v) plane, rather than uniform sampling in two dimensions, as required for imaging. To obtain sufficiently fine sampling in q, the antennas were usually configured so that, considered pairwise, the spacing between centers from the closest to the most widely spaced increases in increments that are smaller than the diameter of an antenna. This can be achieved, for example, by the curved arm configuration shown for the CBI in Fig. 5.24.

In CMB measurements, it is also essential to be able to separate out the effects of all foreground sources. These signals can be identified by their spectral characteristics, which, for synchrotron or optically thin thermal emission, differ from the blackbody spectrum of the CMB. Another requirement for CMB interferometry is sufficient frequency coverage to allow the spectral characteristics of signals to be determined. All three of the systems mentioned above used a 10-GHz-wide receiving band covering 26–36 GHz, subdivided into channels. These frequencies were chosen to be high enough to take advantage of the increase of CMB flux density with frequency and also to avoid the H₂O and O₂ atmospheric absorption lines.

DASI was designed to provide measurements over a range of multipole moments ℓ = 100–900 and used 13 antennas of diameter 20 cm with baselines 0.25–1.21 m. For CBI, the range of ℓ is 400–4250, and 13 antennas of diameter 90 cm with a range of baselines 1–5.51 m were used. Each array was small enough to allow the antennas to be mounted on a mechanically rigid faceplate that could be pointed in azimuth and altitude so that the normal would track the center of the field under observation. The faceplate could also be rotated about its axis, to control the parallactic angle of the interferometer fringe patterns on the sky. No delay system or fringe rotation was needed, but phase switching was included to remove instrumental offsets. In CBI and DASI, the antennas were arranged in patterns with threefold symmetry, and thus, a rotation of the faceplate through 120° caused the configuration of the antennas to repeat relative to the sky (see Fig. 5.24). This property was very useful since the response to the sky remains unchanged after such a rotation, and variations in the signals resulting from unwanted effects such as residual cross talk between antennas could be identified and removed.
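
In the flat-sky approximation, a baseline of u wavelengths samples multipoles near ℓ ≈ 2π u. This relation is not derived in the text above, but the following sketch uses it to check that the quoted baseline ranges and the 26–36 GHz band roughly reproduce the quoted multipole ranges.

```python
import numpy as np

# Flat-sky relation between baseline and multipole: l ~ 2*pi*u, with
# u = D / lambda.  This standard relation is assumed here, not taken from
# the text; the check is only approximate.
C = 299792458.0

def multipole(baseline_m, freq_hz):
    return 2 * np.pi * baseline_m * freq_hz / C

for name, (dmin, dmax) in {"DASI": (0.25, 1.21), "CBI": (1.0, 5.51)}.items():
    lmin = multipole(dmin, 26e9)   # shortest baseline, low band edge
    lmax = multipole(dmax, 36e9)   # longest baseline, high band edge
    print(f"{name}: l ~ {lmin:.0f} to {lmax:.0f}")
```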

A further problem at the high levels of sensitivity required to observe the CMB structure results from thermal radiation from the ground and nearby objects, incident through the antenna sidelobes. This can introduce a serious unwanted contribution in the responses of the more closely spaced antenna pairs, but the effect decreases with increasing antenna spacing. For analysis of the results of observations of this type, see Hobson et al. (1995) and White et al. (1999). Further details of observations can be found in Leitch et al. (2002a,b) and Padin et al. (2002).

10.7.2 Epoch of Reionization

At redshifts corresponding to the period prior to the Epoch of Reionization (EoR), it should be possible to detect radiation of the neutral hydrogen line (1420 MHz rest frequency). As stars were formed in the early Universe, much of the hydrogen became ionized, and this period is referred to as the EoR. This probably occurred at a redshift no higher than about 7 or 8 (Morales and Wyithe 2010). Radiation at the frequency of the neutral hydrogen line should, in principle, be detectable at a redshift corresponding to the beginning of the EoR or earlier and should be detectable in all directions over the sky. However, there is also the cosmic background and the foreground noise from our Galaxy, and the level of these exceeds the distant hydrogen line signal by an estimated factor of 10^4. For detection of a broad, faint background of radiation, in contrast with detection of discrete sources, sensitivity can be increased by using a large number of small antennas, which maximizes the response to broad structural features. In the image domain, (l, m), the third variable added is the frequency, ν, and in the spatial frequency domain, (u, v), the corresponding conjugate variable represents time delay. A basic concern is how redundancy in the array configuration can be chosen to maximize the sensitivity to different angular scales in the search for the reionization signal. Further discussion of the challenges associated with EoR imaging can be found in Parsons et al. (2010, 2012, 2014), Zheng et al. (2013), and Dillon et al. (2015).

10.8 Appendix 10.1 The Edge of the Moon as a Calibration Source

During the test phase of bringing an interferometer into operation, it is useful to observe sources that produce fringes with high SNR. At frequencies above ∼ 100 GHz, there are not many such sources. The Sun, Moon, and planets, the disks of which are resolved by the interferometer fringes, can nevertheless provide significant correlated flux density because of their sharp edges. Consider the limb of the Moon and the case in which the primary beam of the interferometer elements is much smaller than 30′, the lunar diameter. When the antenna beam tracks the Moon’s limb, the apparent source distribution is the antenna pattern multiplied by a step function; it is assumed that the brightness temperature of the Moon is constant within the beam. Approximating the antenna pattern as a Gaussian function, assuming that the antennas track a fixed point on the west limb of the Moon, and ignoring the curvature of the lunar limb, we can express the effective source distribution as

$$\displaystyle{ \begin{array}{rll} I(x,y) =&\ I_{0}e^{-4\,(\ln 2)(x^{2}+y^{2})/\theta _{ b}^{2} }\qquad & x \geq 0\;, \\ =&\ 0 &x < 0\;,\end{array} }$$
(A10.1)

where x and y are angular coordinates centered on the beam axis, θ b is the full width of the beam at the half-power level, and in the Rayleigh–Jeans regime, I 0 = 2kT m ∕λ 2, where T m is the temperature of the Moon. The visibility function is then

$$\displaystyle\begin{array}{rcl} \mathcal{V}(u,v)& =& 2I_{0}\left [\int _{0}^{\infty }e^{-4\,(\ln 2)x^{2} /\theta _{b}^{2} }(\cos 2\pi ux - j\sin 2\pi ux)\,dx\right ] \\ & & \times \left [\int _{0}^{\infty }e^{-4\,(\ln 2)y^{2}/\theta _{ b}^{2} }\cos 2\pi vy\,dy\right ]\;. {}\end{array}$$
(A10.2)

The cosine integral is straightforward, and the sine integral can be written in terms of the degenerate hypergeometric function  1 F 1 (see Gradshteyn and Ryzhik 1994, Eq. 3.896.3). The result is

$$\displaystyle\begin{array}{rcl} \mathcal{V}(u,v) = S_{0}\,e^{-\pi ^{2}\theta _{b}^{2}(u^{2}+v^{2})/4\,\ln \,2 }\left [1 - j\sqrt{ \frac{\pi } {\ln 2}}\,(\theta _{b}u)\,_{1}F_{1}\left (\frac{1} {2}, \frac{3} {2}, \frac{\pi ^{2}\theta _{b}^{2}u^{2}} {4\,\ln \,2} \right )\right ]\;,& &{}\end{array}$$
(A10.3)

where

$$\displaystyle{ S_{0} = \frac{\pi kT_{m}\theta _{b}^{2}} {4\lambda ^{2}\ln 2} }$$
(A10.4)

is the flux density of the Moon in the half-Gaussian beam. In the limit (u, v) → (0, 0), the imaginary part of the visibility is zero, and \(\mathcal{V}(u,v) = S_{0}\), as expected. For T m  = 200 K and θ b  = 1.2λ∕d, where d is the diameter of the interferometer antennas in meters, S 0 ≃ 460,000∕d 2 Jy. The integral over x in Eq. (A10.2) can also be written in terms of the error function. For the limit where u ≫ d∕λ, the asymptotic expansion of the error function leads to the convenient approximation

$$\displaystyle{ \mathcal{V}(u,v = 0) = j\sqrt{\frac{4\ln 2} {\pi ^{3}}} \frac{S_{0}} {\theta _{b}u} \simeq j\,0.41\frac{kT_{m}} {dD} \;, }$$
(A10.5)

where D is the baseline length. Hence, we have the interesting situation that the visibility for a given baseline length increases as the antenna diameter decreases, as long as θ b  ≪ 30′. The approximation in Eq. (A10.5) is accurate to 2% for D > 2d. The full visibility function as a function of projected baseline length is shown in Fig. A10.1. Note that the visibility measured with an interferometer having an east–west baseline orientation and tracking the north or south limb of the Moon will be essentially zero. In the general case, the maximum fringe visibility is obtained by tracking the limb of the Moon that is perpendicular to the baseline.
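
The behavior described above can be reproduced by evaluating Eq. (A10.3) along v = 0 with a standard confluent hypergeometric routine and comparing with the asymptotic form of Eq. (A10.5); the antenna diameter, wavelength, and lunar temperature below are example values only.

```python
import numpy as np
from scipy.special import hyp1f1

# Evaluate the lunar-limb visibility of Eq. (A10.3) along v = 0 and compare
# with the asymptotic form of Eq. (A10.5).  Antenna diameter, wavelength,
# and temperature are assumed example values.
k = 1.380649e-23          # Boltzmann constant (J/K)
T_m = 200.0               # assumed mean lunar temperature (K)
d = 6.0                   # antenna diameter (m)
lam = 0.0013              # observing wavelength (m), ~230 GHz
theta_b = 1.2 * lam / d   # half-power beamwidth (rad)
S0 = np.pi * k * T_m * theta_b**2 / (4 * lam**2 * np.log(2))   # Eq. (A10.4)
print(f"S0 = {S0 / 1e-26:.0f} Jy  (zero-spacing flux density; cf. Fig. A10.1)")

def vis_exact(u):
    z = np.pi**2 * theta_b**2 * u**2 / (4 * np.log(2))
    return S0 * np.exp(-z) * (1 - 1j * np.sqrt(np.pi / np.log(2)) * theta_b * u
                              * hyp1f1(0.5, 1.5, z))

def vis_asymptotic(u):
    return 1j * np.sqrt(4 * np.log(2) / np.pi**3) * S0 / (theta_b * u)

# For D > 2d the two forms agree to about 2%, as stated in the text.
for D in (12.0, 18.0, 50.0):                 # baseline lengths (m)
    u = D / lam
    print(f"D = {D:5.1f} m: |V| exact {abs(vis_exact(u)) / 1e-26:8.1f} Jy, "
          f"asymptotic {abs(vis_asymptotic(u)) / 1e-26:8.1f} Jy")
```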

Fig. A10.1

Normalized fringe visibility for an interferometer with an east–west baseline observing the west limb of the Moon at transit (v = 0), vs. θ b u. θ b  ≃ 1.2λ∕d is the half-power beamwidth of the antenna, d is the antenna diameter, and u = D∕λ is the baseline in wavelengths. On the horizontal axis, θ b u is approximately equal to 1.2D∕d. The dotted line is the imaginary component of visibility, the dashed line is the real part, and the solid line is the magnitude. Since the portion of the curve for D∕d < 1 is not accessible, the measured visibility is almost purely imaginary. For d = 6 m and D∕d = 3, the zero-spacing flux density [see Eq. (A10.4)] is 12,700 Jy, and the visibility is about 1000 Jy [see Eq. (A10.5)]. Adapted from Gurwell (1998).

Although the Moon may produce strong fringes, it is not an ideal calibration source. First, libration may make it difficult to track the exact edge of the Moon. Second, because the apparent source distribution is determined by the antennas, tracking errors introduce amplitude and phase fluctuations. Third, because the temperature of the Moon depends on solar illumination, variations around the mean temperature of ∼ 200 K are significant, especially at short wavelengths. For accurate results, the lunar temperature variation should be incorporated into the brightness temperature model.

10.9 Appendix 10.2 Doppler Shift of Spectral Lines

The Doppler shift [e.g., Rybicki and Lightman (1979)] is given by the relation

$$\displaystyle{ \frac{\lambda } {\lambda _{0}} = \frac{\nu _{0}} {\nu } = \frac{1 + \frac{v} {c}\cos \theta } {\sqrt{1 - \left (\frac{v} {c}\right )^{2}}}\;, }$$
(A10.6)

where λ 0 and ν 0 are the rest wavelength and frequency as measured in the reference frame of the source, the corresponding unsubscripted variables are the wavelength and frequency in the observer’s frame, v is the magnitude of the relative velocity between the source and the observer, and θ is the angle between the velocity vector and the line-of-sight direction between source and observer in the observer’s frame (θ < 90° for a receding source). The numerator in Eq. (A10.6) is the classical Doppler shift caused by the change in distance between the source and the observer. The denominator is the relativistic time dilation factor, which takes account of the difference between the period of the radiated wave as measured in the rest frame of the source and the rest frame of the observer.

Because of the time dilation effect, there will be a second-order Doppler shift even if the motion is transverse to the line of sight. For the rest of this discussion, we consider only radial velocities; that is, θ = 0° or 180°. In this case, the Doppler shift equation is

$$\displaystyle{ \frac{\lambda } {\lambda _{0}} = \frac{\nu _{0}} {\nu } = \sqrt{\frac{1 + \frac{v_{r } } {c} } {1 -\frac{v_{r}} {c} }}\;, }$$
(A10.7)

where v r is the radial velocity (positive for recession). Solving for velocity, we obtain

$$\displaystyle{ \frac{v_{r}} {c} = \frac{\nu _{0}^{2} -\nu ^{2}} {\nu _{0}^{2} +\nu ^{2}}\;, }$$
(A10.8)

or

$$\displaystyle{ \frac{v_{r}} {c} = \frac{\lambda ^{2} -\lambda _{0}^{2}} {\lambda ^{2} +\lambda _{ 0}^{2}}\;. }$$
(A10.9)

Taylor expansions of Eqs. (A10.8) and (A10.9) yield

$$\displaystyle{ \frac{v_{r}} {c} \simeq -\frac{\varDelta \nu } {\nu _{0}} + \frac{1} {2} \frac{\varDelta \nu ^{2}} {\nu _{0}^{2}}\ \cdots }$$
(A10.10)

and

$$\displaystyle{ \frac{v_{r}} {c} \simeq \frac{\varDelta \lambda } {\lambda _{0}} -\frac{1} {2} \frac{\varDelta \lambda ^{2}} {\lambda _{0}^{2}}\ \cdots \;, }$$
(A10.11)

where Δ ν = νν 0 and Δ λ = λλ 0. For negative Δ ν, the velocity is positive and the signal is “redshifted.” Since Δ νν 0 ≃ −Δ λλ 0, the second-order terms have approximately the same magnitude but opposite signs in Eqs. (A10.10) and (A10.11).

Devices for spectroscopy at radio and optical frequencies usually produce data that are uniformly spaced in frequency and wavelength, respectively. Hence, to first order, the velocity axis can be calculated as a linear transformation of the frequency or wavelength axes. Unfortunately, this has led to two different approximations of the velocity:

$$\displaystyle{ \frac{v_{r}{}_{\mathrm{radio}}} {c} = -\frac{\varDelta \nu } {\nu _{0}} }$$
(A10.12)

and

$$\displaystyle{ \frac{v_{r}{}_{\mathrm{optical}}} {c} = \frac{\varDelta \lambda } {\lambda _{0}}\;. }$$
(A10.13)

The difference between these two approximations can be appreciated by noting that v r radio∕c = Δ λ∕λ. Each velocity scale produces a second-order error in its estimation of the true velocity; that is, the radio definition underestimates the velocity, and the optical definition overestimates the velocity by approximately the same amount. The difference in velocity between the scales as a function of velocity is

$$\displaystyle{ \delta v_{r} = v_{r}{}_{\mathrm{optical}} - v_{r}{}_{\mathrm{radio}} \simeq \frac{v_{r}^{2}} {c} \;. }$$
(A10.14)

Hence, the identification of the velocity scale used is very important for extragalactic sources. For example, if v r  = 10, 000 km s−1, δ v r  ≃ 330 km s−1. Failure to recognize the difference between the velocity conventions can cause considerable problems when observations are made with narrow bandwidth.
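
The three velocity scales can be compared directly. The following sketch (with an arbitrary example recession of roughly 10,000 km s−1) evaluates Eqs. (A10.8), (A10.12), and (A10.13) for the same observed frequency and prints the difference, which is close to v r 2∕c as in Eq. (A10.14).

```python
# Compare the relativistic radial velocity of Eq. (A10.8) with the radio
# and optical conventions of Eqs. (A10.12) and (A10.13) for an example
# observed frequency (values chosen for illustration).
C_KMS = 299792.458            # speed of light (km/s)

def velocities(nu_obs, nu0):
    """Return (relativistic, radio, optical) velocities in km/s."""
    v_rel = C_KMS * (nu0**2 - nu_obs**2) / (nu0**2 + nu_obs**2)
    v_radio = C_KMS * (nu0 - nu_obs) / nu0        # -c * dnu / nu0
    v_optical = C_KMS * (nu0 - nu_obs) / nu_obs   #  c * dlambda / lambda0
    return v_rel, v_radio, v_optical

nu0 = 1420.405751786e6                  # HI rest frequency (Hz)
nu_obs = nu0 / (1 + 10000.0 / C_KMS)    # roughly a 10,000 km/s recession
v_rel, v_radio, v_opt = velocities(nu_obs, nu0)
print(f"relativistic {v_rel:9.1f} km/s")
print(f"radio        {v_radio:9.1f} km/s")
print(f"optical      {v_opt:9.1f} km/s")
print(f"optical - radio = {v_opt - v_radio:6.1f} km/s  (~v^2/c = "
      f"{v_rel**2 / C_KMS:.1f} km/s)")
```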

To interpret the velocities of spectral lines, it is necessary to refer them to an appropriate inertial frame. The rotation velocity of an observer at the equator about the Earth’s center is about 0.5 km s−1; the velocity of the Earth around the Sun is about 30 km s−1; the velocity of the Sun with respect to the nearby stars is about 20 km s−1 [this defines the local standard of rest (LSR)]; the velocity of the LSR around the center of the Galaxy is about 220 km s−1; the velocity of our Galaxy with respect to the local group is about 310 km s−1; and the velocity of the local group with respect to the CMB radiation is about 630 km s−1. The most accurate reference frame beyond the solar system is defined with respect to the CMB. The velocity of the Sun with respect to the CMB has been determined from measurements of the dipole anisotropy of the CMB (v = cT dipole∕T CMB, where T dipole = 3364.3 ± 1.5 μK and T CMB = 2.7255 ± 0.0006 K), which yields the remarkably precise result of 370.1 ± 0.1 km s−1 toward ℓ = 263.91° ± 0.02° and b = 48.265° ± 0.002° (Planck Collaboration 2016). Information on these various reference frames is listed in Table A10.1. Most observations are reported with respect to either the solar system barycenter or the LSR. Velocities of stars and galaxies are usually given in the former frame, and observations of nonstellar Galactic objects (e.g., molecular clouds) are usually given in the latter frame. Accurate determination of the rotation speed of the Galaxy and its structure depends on precise knowledge of the LSR. Velocity corrections at many radio observatories are based on a program called DOP [Ball (1969); see also Gordon (1976)], which has an accuracy of only ∼ 0.01 km s−1 because it does not take planetary perturbations into account. Routines such as CVEL in AIPS are based on this code. Much higher accuracy can be obtained by more sophisticated programs such as the Planetary Ephemeris Program (Ash 1972) or the JPL Ephemeris (Standish and Newhall 1996). Precise comparison of velocity measurements from different observatories requires comparison of their dynamical calculations. Interpretation of pulsar timing measurements also requires precise velocity correction.

Table A10.1 Reference frames for spectroscopic observations

There is sometimes confusion in the conversion of baseband frequency to true observed frequency. In the calculation of the spectrum in the baseband by Fourier transformation of either the data stream or the correlation function with the FFT algorithm, the first channel corresponds to zero frequency, and the channel increment is Δ ν IFN, where Δ ν IF is the bandwidth (half the Nyquist sampling rate) and N is the total number of frequency channels. The Nth channel corresponds to frequency Δ ν IF(1 − 1∕N). If N is an even number (N is usually a power of two), channel N∕2 corresponds to the center frequency of the baseband. For a system with only upper-sideband conversions, the sky frequency of the first channel (zero frequency in the baseband) is the sum of the local oscillator frequencies. Note that the velocity axes run in opposite directions (v ∝ −ν and v ∝ ν) for systems with net upper- and lower-sideband conversion, respectively.
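
The bookkeeping described above can be summarized in a few lines of code; the bandwidth, channel count, and reference sky frequency below are placeholders.

```python
import numpy as np

# Baseband channel -> sky frequency bookkeeping, as described above.
# Channel 0 lies at zero baseband frequency and the increment is dnu_IF / N.
# f_ref is the sky frequency of channel 0 (for a purely upper-sideband
# system, the sum of the LO frequencies); all numbers are placeholders.
N = 1024                       # number of frequency channels
dnu_if = 2.0e9                 # baseband bandwidth (Hz)
f_ref = 230.0e9                # sky frequency of channel 0 (Hz), assumed

baseband = np.arange(N) * dnu_if / N       # 0 to dnu_IF * (1 - 1/N)
sky_usb = f_ref + baseband                 # net upper-sideband conversion
sky_lsb = f_ref - baseband                 # net lower-sideband conversion

for ch in (0, N // 2, N - 1):
    print(f"channel {ch:4d}: USB {sky_usb[ch]/1e9:9.4f} GHz, "
          f"LSB {sky_lsb[ch]/1e9:9.4f} GHz")
# The velocity axis therefore runs in opposite directions for the two cases.
```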

There are several velocity shifts of non-Doppler origin that sometimes need to be taken into account. For spectral lines originating in deep potential wells—for example, close to black holes—there is an additional time dilation term

$$\displaystyle{ \gamma _{G} = \frac{1} {\sqrt{1 - \frac{r_{s } } {r}} }\;, }$$
(A10.15)

where r is the distance from the center of the black hole and r s is its Schwarzschild radius (r s  = 2GM∕c 2), which is valid for r ≫ r s . The total frequency shift [obtained by generalizing Eq. (A10.6)] is therefore

$$\displaystyle{ \frac{\nu _{0}} {\nu } = \left (1 + \frac{v_{r}} {c} \cos \theta \right )\gamma _{L}\gamma _{G}\;, }$$
(A10.16)

where \(\gamma _{L} = 1/\sqrt{1 - v_{r } ^{2 } /c^{2}}\) is known as the Lorentz factor. For example, the radiation from the water masers in NGC 4258 (see Fig. 1.23), which orbit a black hole at a radius of 40,000 r s , undergoes a velocity shift of about 4 km s−1.
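
As a check on the quoted figure, the gravitational term alone, expressed as an equivalent velocity shift c(γ G − 1), can be evaluated for r = 40,000 r s :

```python
import math

# Gravitational term of Eq. (A10.15) for an orbit at r = 40,000 r_s,
# expressed as an equivalent velocity shift c * (gamma_G - 1); this is
# consistent, to rounding, with the ~4 km/s figure quoted above.
C_KMS = 299792.458
r_over_rs = 40000.0

gamma_g = 1.0 / math.sqrt(1.0 - 1.0 / r_over_rs)
print(f"gamma_G - 1 = {gamma_g - 1:.2e}")
print(f"equivalent velocity shift ~ {C_KMS * (gamma_g - 1):.1f} km/s")
```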

The most important non-Doppler frequency shift for sources at cosmological distances is due to the expansion of the Universe. In the relatively nearby Universe, this velocity shift is

$$\displaystyle{ z = \frac{\lambda } {\lambda _{0}} - 1 \simeq \frac{H_{0}\,d} {c} \;, }$$
(A10.17)

where H 0 is the Hubble constant and d is the distance. H 0 is about 70 km s−1 Mpc−1 (Mould et al. 2000). For greater distances (z > 1), the relations between z and the distance and look-back time depend on the cosmological model used [e.g., Peebles (1993)]. However, given the definition of z, the correct frequency will always be related to it by

$$\displaystyle{ \nu = \frac{\nu _{0}} {z + 1}\;. }$$
(A10.18)

Other issues regarding observations of cosmologically distant spectral line sources are discussed by Gordon et al. (1992). An early example of spectroscopic interferometric observations of a molecular cloud at a cosmological distance (z = 3.9) can be found in Downes et al. (1999).

10.10 Appendix 10.3 Historical Notes

10.10.1 A10.3.1 Images from One-Dimensional Profiles

Early images of the Sun and a few other strong sources were made with linear arrays such as the grating array and compound interferometer shown in Fig. 1.13. The results were obtained in the form of fan-beam scans. With such an instrument, the visibility data sampled at any instant are located on a straight line through the origin in the (u, v) plane, as shown in Fig. 10.1. Fourier transformation of the visibility data sampled along such a line provides a corrugated surface with a profile given by the fan-beam scan, as shown in Fig. A10.2. This can be regarded as one component of a two-dimensional image. As the Earth rotates, the angle of the beam on the sky varies, so addition of these components builds up a two-dimensional image. However, in the fan-beam scans from such arrays, each pair of antennas contributes with equal weight to the profile, so an image built up from profiles in such a manner exhibits the undesirable characteristics of natural weighting. During the 1950s, before digital computers were generally available, the combination of such data to provide two-dimensional images with a desirable weighting was a laborious process. Christiansen and Warburton’s (1955) solar image involved Fourier transformation, weighting, and retransformation of the data by manual calculation. A method of combining fan-beam scans without Fourier transformation was later devised by Bracewell and Riddle (1967) using convolution to adjust the visibility weighting. Basic relationships between one- and two-dimensional responses (Bracewell 1956a) are discussed in Sect. 2.4.

Fig. A10.2

A surface in the (l, m) domain that is the Fourier transform of visibility data in the (u, v) plane measured along a line making an angle ϕ +π∕2 with the u axis, as shown by the broken line in Fig. 10.1.

10.10.2 A10.3.2 Analog Fourier Transformation

An optical lens can be used as an analog device for Fourier transformation. Analog systems for data processing based on optical, acoustic, or electron-beam processes were investigated in the early years but generally have not proved successful for synthesis imaging. They lacked flexibility, and a further problem was limitation of the dynamic range, which is the ratio of the highest intensity levels to the noise in the image. Maintaining image quality in any iterative process that involves successive Fourier transformation and retransformation of the same data, as occurs in some deconvolution processes (see Chap. 11), requires high precision. Analog possibilities for Fourier transformation were discussed by Cole (1979) but became irrelevant as more powerful computers became available.