Introduction

The measurement of surface energy flux components [net radiation (Rn), sensible heat flux (H), latent heat flux (LE), ground heat flux (G)] and other scalars has long been recognized as central to understanding many atmospheric and ecosystem functions and properties1,2. LE (or evapotranspiration (ET); LE ≡ λET, where λ is the latent heat of vaporization) is especially important as it is a major component of the water cycle in many ecosystems3,4, accounting for more than half of terrestrial insolation and recycling the majority of terrestrial precipitation to the atmosphere4,5, and is often directly and indirectly related to many other soil and canopy variables6,7,8.

Numerous methods have been developed for these measurements, and most have been used extensively in field studies9,10,11. There have, however, been few comparisons that attempt to quantify the differences between them or to provide guidance on best practices12,13,14,15,16,17,18. Nouri et al.19 reviewed several methods of estimating LE/ET; however, only three of them (lysimetry, Bowen ratio, and eddy covariance (EC)) were direct environmental measurements. Similarly, Tang et al.20 compared long-term (> 10 years) measurements from two of the methods considered here (EC and energy balance/Bowen ratio (EBBR)) and found that EBBR yielded higher LE than EC. Although the two instrument systems were within ~ 100 m of each other, their fetch footprints covered different vegetation (a mix of grass and crop, depending on prevailing wind direction), which was postulated to account, at least in part, for the EC vs EBBR differences. Barr et al.21 nevertheless reported similar findings in short-term measurements over a deciduous forest, with EBBR yielding higher LE than EC, supporting the postulate that some of the differences are due to the measurement methods themselves. This is further supported by Savage16, who found that EC did not underestimate H relative to Bowen ratio or surface layer scintillometry methods over a mixed grassland and inferred that the lack of energy balance closure in EC may be due in part to EC underestimating LE.

The generalizability of these studies beyond a narrow range of environmental conditions is limited due to the relatively short comparison periods (with the exceptions of, e.g., Amiro15 and Tang et al.20) and sampling of study specific climate and ecosystem conditions. Typically these comparisons were intended to validate part of a larger study, were only between a limited number of methods, and were not meant to be used as comprehensive evaluations or give a broader perspective of the overall measurement options available. Further, these analyses do not generally consider the comparisons in terms of both long-term and short-term agreement of the methods, with the specific temporal frame dictated by the underlying experiment (Table 1). To better facilitate guidance for new as well as seasoned investigators, a more focused study is desirable.

Table 1 Summary of previous methodology comparisons.

Billesbach and Arkebauer22 reported on direct measurements of evapotranspiration from a Nebraska SandHills grassland ecosystem using a novel approach to the EBBR method. They validated their approach by direct comparison to a co-located EC flux tower. In addition to validation comparison for their study, the co-located EC sensors’ (notably the high frequency sonic anemometer) raw data sets, in conjunction with the individual EBBR sensor measurements, allow pseudo-independent estimation of evapotranspiration using application of alternate methods. Data from the two instrument systems were available over a large portion of the calendar year, allowing a wide range of meteorological and phenological conditions to be sampled.

The experiment herein was designed to quantify H and LE estimated using the raw data sets of Billesbach and Arkebauer22 and four different methods over both short and long time periods. The methods represent a range of instrumental complexity and cost as well as diversity in maintenance and computational requirements. While ideally cost or effort should not be major factors in the design of field experiments, they are almost always considered in practice. Further, experimental design depends on the scientific objectives: e.g., deploying several low-cost, lower-precision instrument systems may be preferred over a single expensive, high-precision system to capture spatial variability; in specific or unique environments one of the approaches presented here may function more reliably, providing a more complete data record; or methods using Bowen ratio or aerodynamic approaches may be preferred to contextualize, evaluate, and improve numerical models that typically rely on flux-gradient based surface parameterizations. While it is recognized that no single-site experiment can provide insight applicable to all possible research scenarios, this work attempts to synthesize a set of existing measurements into a broad, long-term method intercomparison that will add to literature aiming to address some of these information gaps, and provide a common framework for assisting researchers in choosing a method appropriate for their temporal and spatial site needs, and budget constraints.

In addition to the above issues concerning new field experiments, many reanalysis and synthesis studies attempt to combine diverse data sets over both space and time23,24,25. One challenge that often arises in these efforts is reconciling associated uncertainties in measurements from different methodologies20,26. In this paper, measurements of H and LE from the energy balance/Bowen ratio (EBBR), modified Bowen ratio (MBR), and the residual energy (RES) techniques are examined relative to the eddy covariance (EC) method (“Measurement theory and techniques”), and their performances are compared with respect to several environmental and site parameters. The objective of this work is therefore to provide a focused, comprehensive, and long-term comparison of field measurement techniques. The ultimate goal is to provide researchers using archival H and LE data collected by different methods a basis on which comparisons may be evaluated and to provide guidance for the design of new field experiments.

Measurement theory and techniques

There are a number of methods to measure H and LE, each with its own advantages and drawbacks. In this work, four were examined. All involve (in some way) the balance of surface energy fluxes, which may be written as27:

$$H+LE={R}_{n}-G-\text{other components}$$
(1)

where H (W m−2) is sensible heat flux, LE (W m−2) is latent heat flux (or evapotranspiration, ET, where LE ≡ λET), Rn (W m−2) is net radiation, G (W m−2) is soil heat flux, and “other components” (W m−2) are other dissipative terms which may or may not be directly measured or estimated (e.g. photosynthesis, soil heat storage, canopy heat storage, etc.). This work will focus exclusively on obtaining the H and LE terms from field measurements, and a complete, thorough analysis of the right-hand side of Eq. (1) is beyond the scope of this paper. Nevertheless, because of the time scales involved and the various neglected terms, it is often unclear whether or not the ecosystems under consideration are “closed” thermodynamic systems and whether exact energy balance should be expected over the time frames of the measurements28. It is noteworthy that several experimental methods explicitly assume Eq. (1) holds exactly in all cases and at all time-scales which could lead to errors, biases, and/or misinterpretation of measurements29.

Energy and trace gases are assumed to be primarily transported by turbulent mixing. To fully capture the flux of any of these quantities, instruments must either sample fast enough to capture the smallest eddies, or use a technique that allows slower sensors to measure the average of the fast eddies. In addition, all measurements must be averaged over a long enough time period to capture the contributions of larger eddies, but not so long that non-stationary phenomena (e.g., fronts, orographic flow, advection, diel or other secondary circulations) affect the results. In general, measurements made between 10 and 20 Hz are sufficient for small eddies, and instruments with time responses of 30 s to 1 min are adequate for slower averaging methods. For all methods, averaging periods of 20 min to 30 min are usually adequate, but care must be taken in special circumstances30,31,32.

All of the methods to be discussed have unique advantages and one may be more appropriate than another for a given site, project, or budget. Some methods will require more expensive, complex, and power-hungry infrastructure, and some will require more intensive maintenance (Supplementary Table S6).

Eddy covariance (EC)

The eddy covariance (EC) method is widely considered to be the most accurate and precise method of estimating turbulent fluxes30,33,34,35. It is the only method (considered here) that makes direct and independent measurements of H and LE36,37. The EC method has been well documented and is perhaps the most commonly used (as of this writing) technology for measuring H and LE30,35 within several flux networks such as AmeriFlux38, Integrated Carbon Observation System (ICOS)39, National Ecological Observatory Network (NEON)40, and the Atmospheric Radiation Measurement (ARM) user facility23,41,42,43, amongst others1,44.

EC flux estimates require fast (10 Hz or more) measurements of the vertical component of wind speed and of the scalar of interest (air temperature and water vapor density in this case). The turbulent flux is then calculated as the mean covariance of the fluctuations of the vertical wind speed with the fluctuations of water vapor density, air temperature, or other scalar. H (W m−2) is calculated from the high frequency data as:

$$H={C}_{p}\rho \overline{w{\prime}T{\prime}}$$
(2)

where Cp is the specific heat of air at constant pressure (J kg−1 K−1), ρ is the density of air (kg m−3), w′ is the instantaneous fluctuation of the vertical wind speed component about the mean (m s−1), T′ is the instantaneous fluctuation of temperature about the mean (K), and the overbar represents a time average operator. LE (W m−2) is calculated from the high frequency data as:

$$LE=\lambda \overline{w{\prime}{\rho {\prime}}_{V}}$$
(3)

where λ is the latent heat of vaporization of water (or of sublimation of ice in frozen conditions) (J kg−1) and ρ′V is the instantaneous fluctuation of water vapor density about the mean (kg m−3).
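Equations (2) and (3) can be illustrated with a minimal sketch that computes the raw covariances for one averaging interval. The function name `ec_fluxes` and the constant default values for Cp, ρ, and λ are assumptions made for this illustration only, and none of the corrections applied in practice are included.

```python
import numpy as np

def ec_fluxes(w, T, rho_v, rho_air=1.2, cp=1005.0, lam=2.45e6):
    """Raw (uncorrected) H and LE in W m-2 for one averaging interval.

    w      : vertical wind speed samples (m s-1)
    T      : air temperature samples (K)
    rho_v  : water vapor density samples (kg m-3)
    rho_air, cp, lam : illustrative constants (assumed, not site values)
    """
    w = np.asarray(w, dtype=float)
    T = np.asarray(T, dtype=float)
    rho_v = np.asarray(rho_v, dtype=float)

    # Fluctuations about the interval mean (block averaging)
    w_prime = w - w.mean()
    cov_wT = np.mean(w_prime * (T - T.mean()))
    cov_wrho = np.mean(w_prime * (rho_v - rho_v.mean()))

    H = cp * rho_air * cov_wT   # Eq. (2)
    LE = lam * cov_wrho         # Eq. (3)
    return H, LE
```

In a real workflow the air density and latent heat of vaporization would be derived from the measured mean temperature and pressure rather than held constant.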

One reason this method is considered the most direct and accurate is that the basic relationships used by eddy covariance come directly from the continuity equation for hydrostatic systems, which states that the rates with which mass enters and leaves a system are equal in magnitude and of opposite sign30. Additionally, only a few other assumptions are needed, such as no air density fluctuations, zero mean vertical wind speed, steady-state conditions over the averaging interval, and well-developed turbulence. Another reason is that both H and LE are measured independently of any dissipative terms and energy inputs. Any errors or omissions from the right-hand side of Eq. (1) will thus have no influence on the values of H and LE.

EC instrumentation is, however, relatively expensive, requires significant investments of time and effort to operate and maintain, and the raw data post processing is more complex. These issues are mostly a direct consequence of the requirements for fast, precise wind speed, gas density, and air temperature measurements. Finally, besides the basic covariance, there are a number of additional terms and corrections that must be considered to produce what are considered the most accurate and reliable results30,45,46,47,48,49.

Energy balance/Bowen ratio (EBBR)

Another common method of measuring ET is the energy balance/Bowen ratio (EBBR) or Bowen ratio/energy balance (BREB) method12,34,42,50. This technique is one of several that are related to the gradient flux method, relying on the assumption that mixing of the atmosphere is not instantaneous, and that concentration and air temperature gradients are well characterized and can be parameterized in terms of atmospheric stability and the transfer of energy and trace gases. In the EBBR method, as in the gradient flux method34, it is assumed that turbulent exchange coefficients for air temperature and water vapor pressure (for sensible and latent heat fluxes, respectively) are identical.

The basic instrumentation for this method is relatively simple and inexpensive compared to EC systems. The main components typically consist of two thermometers, vertically separated by about 1 to 2 m and two humidity sensors collocated with the thermometers. Care must be exercised in locating the two measurement heights: if the separation between levels is too great, there is risk that the footprints51 “seen” by the two sensor pairs will differ significantly and spatial heterogeneity may become a problem23; if the separation is too small, the air temperature and humidity gradients may become too small to measure accurately (especially under low flux conditions). Because the EBBR method does not require fast measurements (averages of several minutes or more are often used), fast sensor response is not a requirement, allowing the use of simpler, less expensive measurement technology. While thermocouples and thermistors can be used for air temperature measurements, platinum resistance thermometers (PRT) are often used because of their high stability and linearity over wide air temperature ranges and are often regarded as primary temperature measurement standards. Many different types of humidity instruments have been used in this application including wet/dry bulb thermometry; however modern systems often use solid state relative humidity sensors or optical humidity sensors (krypton hygrometers, infrared gas analyzers, etc.).

Because the primary inputs to EBBR flux estimates are air temperature and humidity gradients, efforts must be made to either carefully characterize or avoid sensor biases. A common technique to average out slow bias or calibration drifts is to mount the air temperature and humidity sensors (or sample inlets) on a mechanism that exchanges their positions several times during a measurement period, with any sensor biases assumed to average out43. This extra complexity may be avoided in some situations by careful inter-calibration of the sensors (see "Energy balance/Bowen ratio (EBBR) calibration").

The air temperature and humidity gradient measurements allow direct calculation of the Bowen ratio (β):

$$\beta =\frac{{C}_{P}\rho }{\lambda }\frac{\overline{\Delta T} }{\overline{{\Delta \rho }_{V}}}$$
(4)

where Cp is the specific heat of air (J kg−1 K−1), ρ is the density of air (kg m−3), λ is the latent heat of vaporization of water (or the latent heat of sublimation for frozen conditions) (J kg−1), \(\overline{\Delta T }\) is the mean temperature difference between upper and lower sensors (K), and \(\overline{\Delta {\rho }_{V}}\) is the mean difference in water vapor densities between the upper and lower sensors (kg m−3).

To complete the flux relationships, the available energy (the right hand side of Eq. 1) is measured with a net radiometer, soil heat flux plates, and (often) soil thermometers and moisture sensors to estimate energy storage in the soil layer above the heat flux plates43. The Bowen ratio is defined as the ratio of sensible to latent heat flux:

$$\beta =\frac{H}{LE}$$
(5)

and Eqs. (1), (4), and (5) can be combined to solve for H and LE.
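As a worked step (a sketch, writing the available energy as A for brevity, a symbol introduced only for this illustration), substituting H = βLE from Eq. (5) into Eq. (1) gives:

$$A={R}_{n}-G-\text{other components},\qquad LE=\frac{A}{1+\beta },\qquad H=\frac{\beta A}{1+\beta }$$

These expressions become ill-conditioned as β approaches −1, where small errors in the measured gradients produce large errors in both fluxes, so such periods are commonly screened out of EBBR records.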

One of the important sources of uncertainty in these measurements is the available energy (net radiation – soil heat flux). Great care must be exercised in the placement of the net radiometer to ensure that its field of view is representative of the entire fetch. In addition to careful measurement of the soil heat flux, any other “dissipative” energy terms such as canopy energy storage must be accounted for. Errors in the estimation of these terms will have direct impacts on values of H and LE since the EBBR method assumes perfect energy balance.

Since the EBBR method does not require fast measurements, simpler instruments are often employed along with simpler data logging solutions. Thus, in addition to lower cost, EBBR systems usually have lower power requirements (compared to EC) and produce smaller raw data sets.

Modified Bowen ratio (MBR)

A third method is the Modified Bowen Ratio (MBR) method as described by Liu and Foken13. This scheme uses a fast response sonic anemometer/thermometer and a pair of air temperature and humidity sensors (deployed as in the EBBR method) as the primary instruments. H is independently calculated from sonic anemometer data, following the normal EC procedure. The Bowen ratio is calculated using data obtained from the pair of air temperature and humidity sensors (Eq. 4). From these measurements, Eq. (5) is directly solved for LE, eliminating the need to independently measure the available energy. This approach has the added benefit that it removes uncertainties associated with net radiation, soil heat flux, and residual dissipative terms. However, this comes at a cost in computational complexity, requiring both traditional EC processing to obtain H and traditional EBBR processing to obtain the Bowen ratio. Compared to the previously discussed methods, the expense, power, and data volume requirements fall between the EC and EBBR methods.
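The MBR calculation reduces to rearranging Eq. (5) for LE, as in the minimal sketch below. The function name `mbr_latent_heat` and the small-β guard `min_abs_beta` are hypothetical and are included only to show that the division is poorly conditioned when β is near zero; the threshold value is not taken from this study.

```python
def mbr_latent_heat(H, beta, min_abs_beta=0.05):
    """Latent heat flux (W m-2) from Eq. (5): LE = H / beta.

    H            : sensible heat flux from the sonic anemometer (W m-2)
    beta         : Bowen ratio from the gradient sensors (dimensionless)
    min_abs_beta : illustrative (assumed) threshold below which the
                   result is treated as unreliable and None is returned
    """
    if abs(beta) < min_abs_beta:
        return None
    return H / beta
```

For example, `mbr_latent_heat(100.0, 0.5)` yields an LE of 200 W m−2 without any reference to net radiation or soil heat flux.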

As previously mentioned for EBBR, because the air temperature and humidity sensors must be vertically separated to measure the relevant gradients, they will measure quantities from slightly different footprints34. These may also differ from the footprint seen by the sonic anemometer if it is also located at a different height. For large, homogeneous sites, this is only a minor concern, but at smaller or highly heterogeneous locations it could be problematic51.

Residual energy (RES)

The final method that was considered is the Residual Energy (RES) method52,53,54. As with MBR, in this technique, H is again measured with a sonic anemometer/thermometer. Unlike the MBR method, however, the Bowen ratio is not measured; rather, the available energy is calculated (as in the EBBR method) from net radiation, soil heat flux, and other dissipative energy flux terms (if available). Equation (1) is then solved for LE. As is the case for the EBBR method, errors in estimation of available energy will impact LE. The costs, infrastructure complexity, and data volume are similar to those of the MBR method.
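The residual calculation can be sketched as follows, which also makes explicit that any error in the available energy passes directly into LE. The function name `res_latent_heat` and the optional `other` argument are assumptions made for this illustration.

```python
def res_latent_heat(Rn, G, H, other=0.0):
    """Latent heat flux (W m-2) as the residual of Eq. (1).

    Rn    : net radiation (W m-2)
    G     : soil heat flux (W m-2)
    H     : sensible heat flux from the sonic anemometer (W m-2)
    other : sum of any additional dissipative terms, if available (W m-2)
    """
    return Rn - G - other - H


# A 10 W m-2 error in net radiation becomes a 10 W m-2 error in LE.
le_true = res_latent_heat(500.0, 50.0, 150.0)
le_biased = res_latent_heat(510.0, 50.0, 150.0)
le_error = le_biased - le_true
```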

Methods

Experiment

The data used in these analyses were collected during 2007 at the AmeriFlux US-Sdh (https://doi.org/10.17190/AMF/1246136) site in the Nebraska SandHills22. The flux tower is located at the University of Nebraska’s Gudmundsen SandHills Research Laboratory (GSRL) near Whitman, NE (42°04′ N, 101°28′ W, 1085 m elev.). This facility is a 5200 ha research ranch that encompasses several different ecosystems. The flux towers used in this study are located at what is termed a “dry valley”. In this ecosystem, the water table lies 1 to 10 m below the land surface, but the vegetation usually has ready access to moisture in the vadose zone. The towers both have a clear fetch of over 350 m in all directions with no appreciable slope. The average annual canopy height is between 100 and 200 mm, and consists predominantly of C3 and C4 prairie grasses. Instrumental, power, and site access issues prevented the full suite of measurements from starting until about Day of Year (DOY) 60. After this time, the systems operated nearly continuously until the end of the year. The data were post-processed and quality controlled with a suite of custom programs as detailed in Billesbach and Arkebauer22 (“Data processing and quality control (QC)”).

The high-speed (EC) instruments were mounted on one tower, and consisted of a sonic anemometer/thermometer (R3, Gill Instruments, Lymington, UK), an open-path CO2/H2O infrared gas analyzer (IRGA) (LI-7500, LiCor Biosciences, Lincoln, NE), and a small, single board computer (SBC) for data logging. The relevant low-speed (EBBR) instruments (located on an adjacent tower) included a pair of air temperature/relative humidity (T/RH) sensors (Humitter 50Y, Vaisala Oyj, Vantaa, Finland) which were housed in non-aspirated, 6-plate passive radiation shields (Radiation and Energy Balance Systems, Inc., Seattle, WA). In addition, there were a net radiometer (NR-Lite, Kipp & Zonen Inc., Delft, The Netherlands), a pair of soil heat flux plates (HFT, Radiation and Energy Balance Systems, Seattle, WA), and a barometer (PT101B, Vaisala Oyj, Vantaa, Finland). Data from these instruments were recorded on a small, stand-alone data logger (CR23X, Campbell Scientific Inc., Logan, UT). The eddy covariance instruments were controlled and data were logged by a custom program (HuskerFlux), designed and programmed at the University of Nebraska, and running under the Windows XP operating system. In addition, the slow response data logger was periodically downloaded to the same SBC. The data were retrieved at monthly or bi-monthly intervals, during regular site maintenance visits. The data were post-processed and quality controlled with a suite of custom programs (see “Data processing and quality control (QC)” for details). The eddy covariance flux systems and data workflows were identical to the ones used at our other AmeriFlux sites and have been validated against the AmeriFlux gold data files and by AmeriFlux intercalibration site visits22,55.

The LI-7500 IRGA was calibrated annually using dry N2 gas, a known mixing ratio of CO2 in air, and a LiCor LI-610 dew-point generator. At the same time, the T/RH sensors (Humitter 50Y) were swapped out with a freshly calibrated and cleaned pair (see "Energy balance/Bowen ratio (EBBR) calibration" for details).

The high-speed instruments (IRGA and sonic) were located 3.9 m above the ground, the two T/RH sensors were located at 2.5 m and 3.9 m, the net radiometer was at 2.2 m, and the soil heat flux plates were 50 mm below the soil surface. It was found that the relatively shallow installation depth for the soil heat flux plates provided optimum results in a homogeneous grassland environment. Attempts to correct the soil heat flux for energy storage above the installation depth showed no consistent improvement and only added substantial uncertainty to the available energy. The two towers were approximately 5 m apart on an east–west line.

Data processing and quality control (QC)

Eddy covariance (EC) data processing

The fast response data consisted of continuous 10 Hz measurements from the sonic anemometer and the IRGA that were recorded on the SBC. These were processed into half-hourly fluxes. First, the data were checked for gaps (more than 2% of an averaging interval) and spikes (values larger than 5 standard deviations from the mean). If a gap was found, the interval was not processed. Spikes were replaced by the mean value of the interval, but if more than 100 spikes were present, the interval was not processed. The wind data were then transformed by a 2-axis rotation to slipstream coordinates. This effectively removed the mean values of the cross and vertical wind components. The arithmetic means of the scalar quantities (over the half-hour flux periods) were then calculated and removed from the dataset (block averaging). During this step, the relevant covariances and other statistical moments were calculated after appropriate corrections for inter-instrument time lags (maximum covariance). After this was done, the variance of the covariances was calculated56,57 as an estimate of the covariance (and flux) uncertainties.
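Two of the steps described above, spike screening and the 2-axis coordinate rotation, can be sketched as shown below. This is a minimal illustration, not the processing code used in this study: the function names are hypothetical, and computing the interval mean and standard deviation before spike removal is an assumption.

```python
import numpy as np

def despike(x, n_sigma=5.0, max_spikes=100):
    """Replace spikes with the interval mean, or return None to reject.

    A spike is a value more than n_sigma standard deviations from the
    mean. Mean and standard deviation are taken over the raw interval
    (an assumption for this sketch).
    """
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    spikes = np.abs(x - mean) > n_sigma * std
    if spikes.sum() > max_spikes:
        return None  # too many spikes: interval is not processed
    out = x.copy()
    out[spikes] = mean
    return out

def rotate_2axis(u, v, w):
    """Rotate wind components so that mean cross and vertical wind are zero."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u, v, w))
    # First rotation (about the vertical axis): mean cross wind -> 0
    theta = np.arctan2(v.mean(), u.mean())
    u1 = u * np.cos(theta) + v * np.sin(theta)
    v1 = -u * np.sin(theta) + v * np.cos(theta)
    # Second rotation (about the new cross-wind axis): mean vertical wind -> 0
    phi = np.arctan2(w.mean(), u1.mean())
    u2 = u1 * np.cos(phi) + w * np.sin(phi)
    w2 = -u1 * np.sin(phi) + w * np.cos(phi)
    return u2, v1, w2
```

After rotation, the covariances in Eqs. (2) and (3) would be computed with the rotated vertical component in place of the raw one.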

Because the sonic anemometer measures virtual temperature, the covariance between sonic temperature and vertical wind speed must be corrected for the simultaneous transfer of water vapor46. This correction uses the covariance between vertical wind speed and water vapor density or alternatively, the latent heat flux. Finally, the Webb, Pearman, and Leuning terms45 were added to the raw LE and frequency corrections47 were applied to all terms to obtain fully corrected values for H and LE.

Energy balance/Bowen ratio (EBBR) data processing

Data from the slow response instruments were processed off-line to yield the EBBR fluxes. The raw data set consisted of 30 min instrument averages, generated by a CR23X data logger. Air temperature, relative humidity, atmospheric pressure, net radiation, and soil heat flux values were compared to maximum and minimum acceptable values and were flagged as either good or bad. Absolute values of water vapor density were then calculated from relative humidity, atmospheric pressure, and air temperature, then substituted into Eq. (4) to obtain the turbulent value of the Bowen ratio.
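The conversion from relative humidity to water vapor density and the evaluation of Eq. (4) can be sketched as follows. The Magnus-type saturation vapor pressure approximation, the ideal gas treatment of water vapor, the constant air density, and all function names are assumptions made for this illustration and are not necessarily the formulations used in the processing described here; the additional buoyancy terms are not included.

```python
import math

R_V = 461.5  # specific gas constant of water vapor (J kg-1 K-1)

def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure over liquid water (Pa), Magnus-type
    approximation (assumed form); t_c is air temperature in deg C."""
    return 610.94 * math.exp(17.625 * t_c / (t_c + 243.04))

def vapor_density(t_c, rh_percent):
    """Water vapor density (kg m-3) from temperature and relative humidity."""
    e = rh_percent / 100.0 * saturation_vapor_pressure(t_c)  # vapor pressure (Pa)
    return e / (R_V * (t_c + 273.15))                        # ideal gas law

def bowen_ratio(t_upper, t_lower, rho_v_upper, rho_v_lower,
                rho_air=1.2, cp=1005.0, lam=2.45e6):
    """Turbulent Bowen ratio from Eq. (4); rho_air, cp, and lam are
    illustrative constants (assumed, not site values)."""
    d_t = t_upper - t_lower
    d_rho_v = rho_v_upper - rho_v_lower
    return (cp * rho_air / lam) * d_t / d_rho_v
```

In practice the air density would be computed from the measured atmospheric pressure and air temperature rather than held constant.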

Because the Bowen ratio derives from the same basic relationships and measurements as gradient fluxes34, the corresponding Webb, Pearman, and Leuning terms45 also apply. These manifest themselves as small, additional buoyancy terms that must be added to the “turbulent” component to yield the total Bowen ratio. H and LE were then obtained as the simultaneous solution of the energy balance equation (Eq. 1), and the definition of the Bowen ratio (Eq. 5).

Modified Bowen ratio (MBR) data processing

In this method, the Bowen ratio was calculated in a manner identical to the EBBR method described above. H, however, was calculated from the sonic anemometer fast response data (Eq. 2) and frequency corrected as in the EC procedure. Processing deviated from the EC scheme at this point. Because the covariances between vertical wind speed and water vapor density were not available to calculate the Schotanus et al.46 correction, it was re-cast in terms of the Bowen ratio13 and mean values from the slow response sensors were used for the calculation. This fully corrected H was then substituted into the Bowen ratio relationship (Eq. 5) to calculate LE.

Residual energy (RES) data processing

The residual energy (RES) method is similar to the MBR method in that H was calculated from Eq. (2) and fully corrected as in the MBR method. The available energy (right-hand side of Eq. 1) was measured and then used with this H to solve Eq. (1) for LE as a residual.

Quality control (QC)

After calculating the energy budget terms (H and LE) and applying all corrections for all four methods, the data were put through a quality assurance/quality control (QA/QC) procedure that removed questionable data points before comparisons were made. The QA/QC conditions comprised several common restrictions and a number of conditions that were unique to each method. The common conditions, applied to all methods, were (1) LE must be less than 1000 W m−2 and greater than −200 W m−2, (2) the mean wind speed for the measurement interval must be greater than 2 m s−1, and (3) each sensor involved in the particular calculation must be present and operating correctly. The conditions unique to each method are listed in Table 2. Only data that met the common and unique conditions for both methods in a comparison were retained, while those that did not were excluded. Due to power and instrument failures throughout the year, the starting data set (before QA/QC) consisted of 6781 individual 30-min values. Table 3 lists how many (and what fraction of the total) points passed the QA/QC screening for each comparison.
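The three common conditions can be expressed as a simple screening function, sketched below. The function name `passes_common_qc` and its argument names are hypothetical, and the method-specific conditions of Table 2 are not represented.

```python
def passes_common_qc(le, wind_speed, sensors_ok,
                     le_min=-200.0, le_max=1000.0, min_wind=2.0):
    """Return True if one 30-min value meets the common QA/QC conditions.

    le         : latent heat flux (W m-2)
    wind_speed : mean wind speed for the interval (m s-1)
    sensors_ok : True if every sensor used in the calculation was
                 present and operating correctly
    """
    # Comparisons with NaN evaluate to False, so missing values are rejected.
    return bool(sensors_ok
                and (le_min < le < le_max)
                and (wind_speed > min_wind))
```

For a comparison between two methods, a point would be retained only if this check and the relevant method-specific checks pass for both.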

Table 2 Unique QA/QC conditions for the flux estimation methods.
Table 3 Number and fraction of points available for comparison after QA/QC for the flux estimation methods (6781 points were initially available before the QA/QC process).

The comparisons consisted of linear regressions over selected ranges of the data sets. The calculations resulted in offsets (b), slopes (m) (with standard errors), and coefficient of determination (R2) values as well as standard errors of regression for the annual fits.
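The regression statistics named above can be computed as in the following minimal sketch, which assumes ordinary least squares with the EC value as the independent variable. The function name `compare_methods` and the returned dictionary keys are hypothetical.

```python
import numpy as np

def compare_methods(x, y):
    """Ordinary least-squares fit y = m*x + b with summary statistics.

    x : reference fluxes (e.g., EC), W m-2
    y : fluxes from the method being compared, W m-2
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    m, b = np.polyfit(x, y, 1)            # slope, offset
    resid = y - (m * x + b)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)

    r2 = 1.0 - ss_res / ss_tot            # coefficient of determination
    se_reg = np.sqrt(ss_res / (n - 2))    # standard error of regression
    se_slope = se_reg / np.sqrt(np.sum((x - x.mean()) ** 2))
    return {"slope": m, "offset": b, "r2": r2,
            "se_regression": se_reg, "se_slope": se_slope}
```

The standard error of regression has the same units as the fluxes and gives a typical scatter of individual 30-min values about the fitted line.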

Energy balance/Bowen ratio (EBBR) calibration

In the above discussion, it was mentioned that the EBBR system used in this study did not include an exchange mechanism or aspirators for the T/RH sensors. This decision was driven by cost and available power constraints. Part of the motivation to collocate the EC instrumentation at this site was to validate the use of this particular EBBR configuration for long-term budgets. The data shown in Figs. 1b and 2a suggest that this EBBR implementation was indeed successful. Achieving this level of data compatibility required the extra attention and maintenance steps described here.

Figure 1 Energy balance and sensible heat flux comparisons. (a) EC energy balance. (b) EBBR vs EC sensible heat flux. (c) RES (and MBR) vs EC sensible heat flux.

Figure 2 Latent heat flux comparisons. (a) EBBR vs EC latent heat flux. (b) RES vs EC latent heat flux. (c) MBR vs EC latent heat flux.

Traditional EBBR instrument systems use an exchanging mechanism to remove biases between sensors (or sampling tubes) at the two levels. By swapping the sensors’ (or sampling tubes’) heights several times during each averaging period, it is assumed that the biases will average out. Aspirator fans are often used to circulate air around the sensors to promote uniformity with the free atmosphere during periods of low wind speed.

The Humitter 50Y (Vaisala Oyj, Vantaa, Finland) used in this system is a small, inexpensive T/RH sensor that fits well in 6-plate, unaspirated radiation shields. It uses a platinum resistance thermometer (PRT) as its basic temperature sensor and a solid state, capacitive RH sensor (HumiCap) for moisture. PRTs are one of the most intrinsically stable and precise temperature sensors available and, once the first order offsets (usually due to the driving and processing electronics) have been characterized, it is only necessary to make infrequent and minor adjustments to the raw data. The HumiCap sensors used in the Humitter 50Y units are identical to the ones used in the more precise (and more expensive) HMP155 units. The primary difference between these instruments is in the associated electronics and the final manufacturer’s calibration of the sensors. The HumiCap sensors themselves are relatively stable.

To minimize the inter-sensor biases, all of the Humitter 50Y sensors (usually in groups of six for three sites) were calibrated against an identical laboratory unit in a closed container. The temperature of this calibration chamber was varied using external ice packs and an internal heater. Humidity inside the closed chamber was controlled by circulating the air through a dew point generator (LI-610, LiCor Biosciences, Lincoln, NE). All T/RH sensors were strapped together to facilitate uniformity, and a fan was used to mix and circulate the air inside the sealed calibration chamber. Air temperatures and relative humidities from all sensors were recorded every second with a CR23X data logger. Before installation in the calibration chamber, the Teflon filters of each Humitter 50Y were cleaned to eliminate any potentially hygroscopic dust from the system. After recording data over a range of air temperatures and humidities, linear regressions (for T and RH) were run against the reference unit. The regressions mostly revealed constant offsets, while the slopes were usually near unity. These regression coefficients were subsequently used to adjust air temperatures and relative humidities in the EBBR data processing procedure. Calibrations were performed annually on the field sensors, and the reference Humitter 50Y was itself annually calibrated against a precision PRT (PR-10, Omega Engineering, Stamford, CT), read with a Keithley model 197 5 ½ digit DMM (Keithley Instruments Inc., Cleveland, OH) in 4-wire resistance mode, and with a precision dew point hygrometer (RHB-3, Omega Engineering, Stamford, CT). This procedure proved adequate to remove most of the sensor bias from the EBBR system.

Fortunately, this research site normally experienced moderate to high wind speeds both day and night throughout the year, eliminating the need to aspirate the radiation shields. While this calibration procedure should work for most sensors, the wind conditions at individual sites will determine if aspiration is needed or not.

Results

In this analysis and discussion only 30-min, measured data values were used. This avoided complications that could have been introduced by gap-filling procedures needed to generate daily or longer averages58,59. As noted in "Modified Bowen ratio (MBR) data processing", the method of obtaining fully corrected values of H is slightly different for the MBR and RES methods than for the EC method. Because of this, these were considered distinct from EC. The result is that there are three independent methods of measuring H and four methods of measuring LE.

Two approaches were used to analyze the energy flux data. In the first, the entire data set was considered. This provided insight into the performance of the various methods for constructing annual budgets of energy fluxes, as well as how the methods might perform at short time intervals. In the second approach, the data were binned according to five external factors: day of year (or season), friction velocity (u*), Bowen ratio (β), relative humidity (RH), and wind direction. These factors were chosen because they were either expected to directly impact sensor or method performance, or because they were typical factors used to assess fluxes from research sites.

Bulk fluxes

Energy balance

Energy closure has long been considered an important quality indicator for EC flux systems28,30,60,61. Over the year, the EC system fluxes underestimated available energy (approximated by Rn – G) by about 6% (Fig. 1a), or conversely, the energy available for H and LE was overestimated by 6% during this period, as the difference could be accounted for by unmeasured, dissipative terms (plant canopy, soil heat storage, etc.) in the available energy calculation. Analysis indicates that 67% of the individual available energy measurements are within 19% of the corresponding H + LE total (Fig. 3a), and the standard error of regression indicates that the average deviation of the H + LE sum from the available energy is about 40 W m−2 (Table 4). Together, these give confidence in the accuracy and representativeness of the EC values for H and LE over long and short terms. It also implies that any unmeasured components of the available energy were relatively small. Based on these results, and because the EC method is widely considered the most direct measurement of atmospheric fluxes36,37, the rest of the analysis will focus on comparisons of the other methods with EC.
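As a rough illustration of the closure check, the sketch below regresses the turbulent flux sum (H + LE) on the available energy (Rn – G) and reports the slope. The function names are hypothetical and the half-hourly values are synthetic, constructed so that H + LE equals 94% of Rn – G; they are not data from this site.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def energy_balance_closure(rn, g, h, le):
    """Slope and intercept of (H + LE) regressed on (Rn - G)."""
    available = [r - s for r, s in zip(rn, g)]   # available energy, Rn - G
    turbulent = [a + b for a, b in zip(h, le)]   # turbulent flux sum, H + LE
    return fit_line(available, turbulent)


# Synthetic half-hourly fluxes in W m-2 (illustrative only).
rn = [150.0, 260.0, 370.0, 480.0, 590.0]
g = [50.0, 60.0, 70.0, 80.0, 90.0]
h = [40.0, 80.0, 120.0, 160.0, 200.0]
le = [54.0, 108.0, 162.0, 216.0, 270.0]

slope, intercept = energy_balance_closure(rn, g, h, le)
```

A slope below one indicates that the turbulent fluxes account for less than the estimated available energy, as in the roughly 6% shortfall reported here.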

Figure 3

Deviations of energy balance and sensible heat flux values from the best fit line. The vertical axis is the fraction of the total number of points in the sample and the horizontal axis is the percentage deviation from the fit line. The horizontal line represents 67% of the data points. (a) EC energy balance. (b) EBBR vs EC sensible heat flux. (c) RES (and MBR) vs EC sensible heat flux.

Table 4 Regression results for flux comparisons in all conditions.

Long-term sensible and latent heat fluxes

Comparisons between the three independent methods of determining H showed good long-term agreement between all methods (Fig. 1b, c). The slopes are all within 3% of the 1:1 line, and the zero offsets are very small.

LE values from the EBBR and RES methods are almost identical to those determined by the EC technique (Fig. 2a, b). On an annual basis, both EBBR and RES yield LE fluxes that are about 2% larger than the corresponding EC flux. In contrast, the MBR method (Fig. 2c) gives fluxes that are about 8% smaller than EC.

Short-term sensible and latent heat fluxes

For short-term measurement comparisons, the absolute deviation of each (30-min) value from the annual regression line was calculated, and the cumulative frequency distributions of these deviations were plotted. This analysis indicates, on average, how far a given fraction of the measured, half-hour values will fall from the regression line (H and energy balance are shown in Fig. 3 and LE in Fig. 4), and provides a somewhat different perspective on data scatter than the standard error of regression. This measure of data scatter provides an indication of how well the given method performs (relative to the EC method) for short-term, process-level studies. In Figs. 3 and 4, the horizontal lines indicate where 67% of the total sample falls. Using the EBBR vs EC H comparison as an example, Fig. 3b indicates that 67% of the EBBR values fall within 46% of the regression line. In other words, two thirds of the time, the EBBR values will be within 46% of the ‘true’ values (as defined by the EC method). Results for all of the comparisons are listed in Table 3.
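The deviation analysis can be sketched as follows. The text does not state how the percentage deviation is normalized, so this example assumes each absolute deviation is expressed as a percentage of the fitted value; the function names and the flux values are hypothetical.

```python
import math


def percent_deviations(x, y, slope, intercept):
    """Absolute deviation of each y from the fit line, as % of the fitted value."""
    devs = []
    for xi, yi in zip(x, y):
        fitted = slope * xi + intercept
        if fitted != 0:  # skip points where the percentage is undefined
            devs.append(abs(yi - fitted) / abs(fitted) * 100.0)
    return devs


def deviation_at_fraction(devs, fraction=0.67):
    """Smallest deviation that contains at least `fraction` of the sample."""
    ordered = sorted(devs)
    # Index of the first sorted value whose cumulative fraction reaches `fraction`.
    k = math.ceil(fraction * len(ordered)) - 1
    return ordered[max(k, 0)]


# Hypothetical reference (x) and test-method (y) fluxes in W m-2, with a 1:1 fit line.
x = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
y = [105.0, 180.0, 330.0, 380.0, 650.0, 300.0]

devs = percent_deviations(x, y, slope=1.0, intercept=0.0)
d67 = deviation_at_fraction(devs)
```

Here `d67` plays the role of the value read from the horizontal 67% line in the cumulative frequency plots.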

Figure 4

Deviations of latent heat flux values from the best fit line. The vertical axis is the fraction of the total number of points in the sample and the horizontal axis is the percentage deviation from the fit line. The horizontal line represents 67% of the data points. (a) EBBR vs EC latent heat flux. (b) RES vs EC latent heat flux. (c) MBR vs EC latent heat flux.

In addition to the above analysis, the standard error of regression (Table 4) indicates the average departure of the half-hourly, measured flux values from the regression line. Except for the cases of RES H and MBR LE, these are all similar and predict that, on average, energy flux values will lie within 40 to 50 W m−2 of the predicted EC value. This deviation is significantly smaller for the RES H comparison (27 W m−2), and is significantly larger for MBR LE (80 W m−2).
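For reference, a minimal sketch of the standard error of regression is given below. It assumes the conventional definition for a two-parameter linear fit (root of the residual sum of squares divided by n − 2); the text does not state which definition was used, and the values are hypothetical.

```python
import math


def standard_error_of_regression(x, y, slope, intercept):
    """Root-mean-square residual about the fit line, with n - 2 degrees of freedom."""
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    return math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))


# Hypothetical fluxes in W m-2 scattered by +/- 10 W m-2 about a 1:1 line.
x = [0.0, 100.0, 200.0, 300.0]
y = [10.0, 90.0, 210.0, 290.0]

ser = standard_error_of_regression(x, y, slope=1.0, intercept=0.0)
```

The result has the same units as the fluxes, which is why it can be read directly as a typical departure in W m−2.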

Binned fluxes

To help understand how external factors may affect the different measurement methods, the data were also segregated into bins (representing ranges of these factors) and further linear regressions were calculated. The bins were determined using the best available data. The friction velocity (u*), Bowen ratio (β), and wind direction bins were derived from EC data and the relative humidity bins were derived from the upper relative humidity data stream. For brevity, full results of the fits (slopes and intercepts with corresponding standard errors) are listed in the Supplementary Information Tables S1–S5 and summarized here. Except for instances where the regression slopes deviated significantly from 1, the intercepts and standard errors of the slopes were small. Likewise, the standard errors of the intercepts were small compared to the range of the data, indicating little significant offset. Because of this, the majority of the analysis will focus on the slopes and R2 regression values.
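The binning step can be sketched as below: each pair of flux values is assigned to a bin according to an external factor, and a separate regression is computed per bin. The bin edges, the minimum sample size, the function names, and the data are illustrative assumptions; the bins actually used are those listed in the Supplementary Information.

```python
from collections import defaultdict


def fit_line(x, y):
    """Least-squares slope, intercept, and R2 for y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy * sxy / (sxx * syy)


def binned_fits(factor, x, y, edges, min_points=3):
    """Fit y against x separately within each [edges[i], edges[i+1]) bin of `factor`."""
    groups = defaultdict(lambda: ([], []))
    for f, xi, yi in zip(factor, x, y):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= f < hi:
                groups[(lo, hi)][0].append(xi)
                groups[(lo, hi)][1].append(yi)
                break
    # Skip bins with too few points for a meaningful fit.
    return {b: fit_line(xs, ys) for b, (xs, ys) in groups.items() if len(xs) >= min_points}


# Hypothetical friction velocities (m s-1) with reference (x) and test (y) fluxes (W m-2).
ustar = [0.10, 0.20, 0.30, 0.50, 0.60, 0.70]
x = [100.0, 200.0, 300.0, 100.0, 200.0, 300.0]
y = [80.0, 160.0, 240.0, 100.0, 200.0, 300.0]

fits = binned_fits(ustar, x, y, edges=[0.0, 0.4, 0.8])
```

Each entry of `fits` holds the slope, intercept, and R2 for one bin, mirroring the layout of one column in the supplementary tables.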

Seasonal dependence

For the seasonal analysis, the year was divided into five periods of about 60 days each (the last, DOY 301–365, being slightly longer), beginning on DOY 61 when the full instrument system became active, to give a sufficient sample size for statistical relevance.

The EC energy balance over the course of a year was mostly constant (Supplementary Table S1, column 2) except during the third period (DOY 181–240) where the slope was somewhat smaller than the rest of the year and the fifth period (DOY 301–365) where the slope was slightly larger, nearing unity and indicating closer closure on the energy budget.

H values measured using the EBBR and EC methods were also compared over different seasons (Supplementary Table S1, column 3). In contrast to the seasonal energy balance relations, the regression slopes are closest to unity during the fourth period (DOY 241–DOY 300) while the other periods show deviations from 1:1 between + 15% and − 20%, and significantly higher scatter about the regression line (lower R2 value) during the second and third periods (DOY 121–DOY 240). Conversely, there is little seasonal change in the comparison of H measured by the RES and EC methods (Supplementary Table S1, column 4), except for higher scatter (lower R2 value) during the second period (DOY 121–DOY 180).

With the exception of the first and last periods (DOY 61–DOY 120 and DOY 301–DOY 365), the regression slopes for LE from the EBBR and EC methods were all similar and close to one (Supplementary Table S1, column 5). While there is seasonal variation in the R2 values, no clear pattern emerges. The comparison between RES and EC (Supplementary Table S1, column 6) yielded regression slopes very near unity for all periods except the last (DOY 301–DOY 365) and R2 values similar to those in the EBBR vs EC comparison. Finally, the comparison between the MBR and EC methods (Supplementary Table S1, column 7) resulted in regression slopes that were noticeably smaller in the first and third periods (DOY 61–DOY 120 and DOY 181–DOY 240) than those for the other methods, while for the fifth period (DOY 301–365) the regression slope was much larger. The scatter of points around the regression line was also significantly larger (smaller R2 values) for all periods than it was for all other methods.

Friction velocity (u*) dependence

Regression slopes for energy balance (Supplementary Table S2, column 2) increased towards unity with increasing friction velocity; however, the scatter of data around the regression line remained fairly constant, as shown by the relatively constant R2 values.

Similar to energy balance, the comparison of H between the EBBR and EC methods (Supplementary Table S2, column 3) showed regression slopes decreasing towards unity until u* values reached 0.6 m s−1, after which H from EBBR was smaller than H from EC. Conversely, the scatter of values about the regression line increased (decreasing R2 values) with increasing friction velocity. Comparison of H between the RES and EC methods (Supplementary Table S2, column 4) showed little change in regression slope over the range of friction velocities while the scatter about the regression line increased (decreasing R2 values) over the range.

The comparison of LE values between the EBBR and EC methods (Supplementary Table S2, column 5) shows little clear change in regression slope or scatter (R2 values) over the range of friction velocities. The LE comparison between RES and EC methods (Supplementary Table S2, column 6) shows improvement in the regression slopes above u* = 0.4 m s−1. Conversely, the scatter (R2 value) is slightly improved at lower values of u*. Comparison between the MBR and EC methods (Supplementary Table S2, column 7) shows substantially lower regression slopes for u* values less than 0.4 m s−1, and like the comparison of H between EBBR and EC, an increasing amount of scatter with increasing u*.

Bowen ratio dependence

It should be noted that while there are bins labeled “negative” and “0–1” for the Bowen ratio comparisons, the QC criteria applied to the individual methods (Table 2) produced gaps around zero and limited data for negative values.

Regression slopes for energy balance (Supplementary Table S3, column 2) were relatively constant as a function of Bowen ratio until values greater than 5, after which the slope deviated considerably from 1. Scatter about the regression line decreased with increasing Bowen ratios.

Regression slopes for H comparisons between the EBBR and EC methods (Supplementary Table S3, column 3) were close to 1 and relatively constant for Bowen ratios between 0 and 5. Outside these limits, the slopes were lower. The scatter of points about the regression line uniformly decreased (increasing R2) with increasing Bowen ratio. Regression slopes for the comparison of H values between the RES and EC methods (Supplementary Table S3, column 4) were near unity for all values of the Bowen ratio; however, scatter was larger (smaller R2 values) for Bowen ratios less than 1.

Regression slopes for the LE comparison between the EBBR and EC methods (Supplementary Table S3, column 5) behaved like those for H, showing good agreement between Bowen ratios of 0 and 5, with much worse agreement outside those limits. The scatter of LE values (indicated by R2 values) for this comparison increased with increasing Bowen ratio until a value of 5. Comparison of LE between the RES and EC methods (Supplementary Table S3, column 6) showed regression slopes near unity between Bowen ratio values of 0 and 3 with negative slope and particularly low correlation above a value of 5. The scatter of points for this comparison decreased uniformly with increasing Bowen ratio. Regression slopes of LE for the MBR and EC methods (Supplementary Table S3, column 7) were near 0 for negative values of the Bowen ratio and > 2 for values over 5. Between Bowen ratios of 0 and 5 though, the slopes were closer to unity, particularly between values of 1 and 5. Scatter was particularly large for this comparison across the range of Bowen ratios.

Relative humidity dependence

The regression slopes for energy balance (Supplementary Table S4, column 2) were close (within ~ 10%) to unity across relative humidity values, and no significant differences in scatter were observed.

The regression slopes for H between the EBBR and EC methods (Supplementary Table S4, column 3) were within 10% of unity except for relative humidity values greater than 80%. For relative humidity above 20%, the scatter of points about the regression line decreased (increasing R2 values) with increasing relative humidity. Regression slopes for the H comparison between the RES and EC methods (Supplementary Table S4, column 4) were all close to unity, and the scatter of points about the regression line roughly increased (decreasing R2) with increasing relative humidity.

Regression slopes for the LE comparison between the EBBR and EC methods (Supplementary Table S4, column 5) were all near unity, and the scatter about the regression line was smallest at low relative humidity and largest at higher relative humidity. The LE comparison between the RES and EC methods (Supplementary Table S4, column 6) also showed near unity regression slopes, and the scatter of points around the regression line was relatively constant except for relative humidities greater than 80%. Regression slopes for the LE comparison between the MBR and EC methods (Supplementary Table S4, column 7) were within 10% of unity except for relative humidities between 20 and 40%. The scatter about the regression line was fairly high and showed no obvious pattern.

Wind direction dependence

The regression slopes for energy balance (Supplementary Table S5, column 2) indicate that winds from the south and west resulted in somewhat better energy balance closure than winds from the north and east. The degree of scatter about the regression line showed no discernable dependence on wind direction.

Regression slopes for the H comparison between the EBBR and EC methods (Supplementary Table S5, column 3) indicate that better agreement between the methods occurred for east and west winds. Similarly, the amount of scatter about the regression line was greater for north and south winds. Regression slopes for the comparison between the RES and EC methods (Supplementary Table S5, column 4) showed no significant dependence on wind direction; however, there did appear to be more scatter about the regression line for south winds.

Regression slopes for LE from the EBBR and RES methods compared with the EC method (Supplementary Table S5, columns 5–6) showed no significant dependence on wind direction and neither did the corresponding scatter about the regression lines. Finally, the regression slopes for LE between the MBR and EC methods (Supplementary Table S5, column 7) were distinctly improved for winds from the south and from the west, and scatter about the regression line was reduced for west winds.

Discussion

To interpret these results in the context of our objective (providing a basis for comparison of existing data and guidance in experimental design over multiple time frames), a sense of what constitutes “good” and “bad” comparisons must be established. Because offsets (b) (and their standard errors) were mostly small compared to the range of data in the linear regressions, two parameters are the foci of the comparisons: regression slope (m) and coefficient of determination (R2).

In defining good and bad comparisons for slope values, it is useful to compare the differences between calculated slopes and unity with the typical, expected uncertainties in fluxes. For quantities derived from the EC method, uncertainties may be estimated by several methods57, and following Cook and Sullivan42 we use ± 10% of the flux value (errors propagating from, e.g., calibration, biases, and sensor alignment). Uncertainties for fluxes derived from gradients and slow-response instruments may also be calculated by propagating individual measurement uncertainties through the corresponding flux relationships. Again, ± 10% of the flux value is used43. It is thus reasonable to conclude that if regression slopes deviate by less than 10% from unity, the measurement uncertainty introduced by a particular method will not dominate, and the methods can be considered comparable. The choice of criteria for R2 is not as clear cut. Whether or not a technique is useful depends on the time scale studied and the amount of averaging used. Averaging 30-min fluxes to longer time scales is expected to reduce the scatter and increase the value of R2. Nevertheless, a value of 0.9 will be used to define an arbitrary threshold between what is considered “good” and “not good” for R2 values.
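The two criteria can be written as a short check, sketched below. The function name is hypothetical, and treating an R2 exactly equal to 0.9 as "good" is an assumption, since the text does not specify how the boundary case is handled.

```python
def rate_comparison(slope, r2, slope_tol=0.10, r2_min=0.9):
    """Return (slope_ok, scatter_ok) under the thresholds described in the text."""
    slope_ok = abs(slope - 1.0) < slope_tol   # slope within 10% of unity
    scatter_ok = r2 >= r2_min                 # assumption: the boundary counts as "good"
    return slope_ok, scatter_ok


# Illustrative inputs only; these are not values from Table 4.
near_unity_but_scattered = rate_comparison(1.02, 0.85)
biased_but_tight = rate_comparison(1.15, 0.95)
```

Keeping the two results separate reflects the way slope and scatter are discussed independently in the remainder of this section.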

The results can be interpreted within two broad contexts: first, that the measurements will be used to produce long-term or annual budgets of energy and/or water, and second, that the measurements will be used to parameterize, support, or inform shorter-term (daily to seasonal) process studies. One indicator is the standard error of the regression (Table 4). These values are mostly between 40 and 50 W m−2. For mid-day energy fluxes in temperate climates, this average deviation amounts to between 5 and 15% of the mean values (as typically determined by EC). This range is comparable to the measurement uncertainty expected for the individual methods. On the other hand, this difference can be relatively larger during nights and in low-energy environments (Arctic or Boreal ecosystems) when fluxes are of smaller magnitude. The exceptions to this are H measured by the RES method and LE measured by the MBR method. The RES H measurements are (except for correction terms) essentially identical to EC, so the improved agreement is to be expected. LE measurements from the MBR method have the potential to accumulate instrumental uncertainties and biases that are common to both EC and EBBR, and might therefore be expected to be somewhat worse.

The external factors that were examined (seasonality, friction velocity, Bowen ratio, relative humidity, and wind direction) also bear some consideration. As previously noted, some of these factors describe mostly site-dependent effects (wind direction and seasonality) while others are more directly connected with instrument and method performance (Bowen ratio, relative humidity, friction velocity). It should also be noted that these factors are not independent of each other. For example, wind direction, relative humidity, and friction velocity will usually depend on season.

On an annual basis, the near-unity regression slopes for H and LE and the relatively small intercepts (Figs. 1 and 2) suggest that all methods will (under the constraints of this site) perform similarly. The possible exception is the MBR method for LE, which has a somewhat lower regression slope than the others. It is likely that any long-term influence attributable to the five external factors examined was averaged out over the course of the experiment. Over periods of 60 days (Supplementary Table S1), some variation in regression slopes was noted, which could be due to changing fetch directions and seasonal effects on moisture and canopy phenology.

For shorter-term process studies, the EC method is, by default, the preferred measurement method. If the time scale of the study extends to daily or weekly lengths, it is expected that all but the MBR method will yield similar results as the R2 values would be expected to increase after time averaging. A possible source of the data scatter may be the precision and accuracy with which gradients of air temperature and water vapor pressure can be obtained, and the accuracy and spatial mismatch of net radiation vs flux measurements. While efforts were made to minimize errors and uncertainties in gradient quantities used in this study (see "Energy balance/Bowen ratio (EBBR) calibration"), the relative size of the uncertainties will grow as the gradients decrease and as the ambient conditions approach the operational limits of the instruments. In methods that rely on measurements of net radiation (EBBR and RES), the quality of the net radiometer measurements will also be critical. The instrument used for this study was only of medium quality, and it was expected to have a precision and accuracy < 10%. Often, the dissipative energy terms (soil heat flux and others) are small compared to net radiation and will not be the dominant contributors to overall uncertainty. During certain times of the year, however, these terms can grow large enough to be significant in the measurement uncertainty, and should be considered and the appropriate steps taken to minimize them.

While seasonality (or time) cannot itself be a controlling factor in these comparisons, it can be an important surrogate for other real factors (or combinations of factors) such as temperature, net radiation, wind speed or direction, phenology, etc. The seasonal differences in regression slopes and R2 values that appeared in the analysis (Supplementary Table S1) could be attributable to site-specific factors that were revealed under different conditions; e.g., varying seasonal wind directions caused the instrument systems to sample different areas of the region with varying surface conditions (water content, plant canopy composition and height, etc.) and varying wind speeds (and thus friction velocity) with season. Similar observations can be made regarding the results presented in Supplementary Table S5 for wind direction.

The comparison of fluxes as a function of friction velocity (u*), shown in Supplementary Table S2, provides a more informative picture of the differences between methods. All of the methods examined rely on turbulent mixing of the atmosphere to transport energy and matter; mixing can be roughly quantified by friction velocity. The relative differences between regression slopes over the suite of methods are small for a large range of u* values. The largest regression slope differences from unity are observed at small values of u* where mixing is minimal and the assumptions behind the methods are no longer valid. This behavior is expected and helps demonstrate the validity of the individual methods.

As an external factor, the Bowen ratio (Supplementary Table S3) comparison combines both instrumental and site characteristics. Different ecosystems will experience different annual distributions of Bowen ratios. For example, sites similar to this one (semi-arid) expect more frequent high values of the Bowen ratio while tropical, wet sites expect more frequent low values. Both of these extremes are characterized by one of the two gradients in Eq. (4) being small. Small gradient values will usually have large relative uncertainties, leading to higher data scatter and potential biases in comparison of regression slopes. This behavior is reflected in the data of Supplementary Table S3. Steps can be taken to minimize these problems such as choosing instrumentation specifically suited to dry or wet ecosystems, and by choosing study sites that allow large vertical sensor separations (large homogeneous fetch, etc.). It is also worth noting that sites where radiational cooling is important (negative Bowen ratios) will be difficult to characterize with methods utilizing the Bowen ratio (EBBR and MBR).
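The growth of relative uncertainty at small gradients can be illustrated with a simple propagation sketch. It assumes the Bowen ratio is proportional to the ratio of the temperature and vapor pressure differences between the two measurement heights (the usual gradient form), and that the uncertainties in the two differences are independent. The sensor uncertainties and gradient values below are hypothetical, not specifications of the instruments used here.

```python
import math


def bowen_ratio_relative_uncertainty(delta_t, delta_e, sigma_t, sigma_e):
    """Relative uncertainty of a ratio delta_t / delta_e for independent errors.

    delta_t, delta_e: vertical differences in temperature and vapor pressure.
    sigma_t, sigma_e: absolute uncertainties of those differences (same units).
    Both differences must be non-zero, otherwise the ratio is undefined.
    """
    return math.hypot(sigma_t / delta_t, sigma_e / delta_e)


# Hypothetical uncertainties of the measured differences: 0.05 K and 0.01 kPa.
sigma_t, sigma_e = 0.05, 0.01

# Well-resolved gradients versus a small vapor pressure gradient (dry conditions).
rel_large = bowen_ratio_relative_uncertainty(1.0, 0.20, sigma_t, sigma_e)
rel_small = bowen_ratio_relative_uncertainty(1.0, 0.02, sigma_t, sigma_e)
```

With fixed instrument uncertainty, shrinking either gradient inflates the relative uncertainty of the ratio, which is consistent with the larger scatter seen at the extremes of the Bowen ratio range.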

Conclusions

This study examined three independent methods of measuring sensible heat flux and four methods of measuring latent heat flux via atmospheric observations. The methods were evaluated for performance in long-term and short-term uses by comparison against the eddy covariance technique as reference.

It was found that all methods had good performance (regression slopes between 0.9 and 1.1) for both H and LE measurements over the course of a year when compared to the eddy covariance method. For short-term observations (30-min averages), H measured with the RES method compared best with eddy covariance (27 W m−2 average deviation) while the EBBR method deviated by about 41 W m−2. The close RES agreement was not unexpected since there are only minor differences between EC and RES for this use. The larger EBBR deviation suggests that EBBR may be better suited to long-term (daily, weekly, or monthly) observations. For short-term observations, LE from EBBR and RES showed similar behavior to H from EBBR, while LE from MBR had considerably more scatter when compared to EC.

Besides these objective observations, other, subjective criteria should be considered when choosing a measurement technique for any given experiment. These can include factors such as instrumentation cost, maintenance effort required, power usage, or complexity of data processing (Supplementary Table S6).

In addition to informing decisions related to experimental measurement method selection, these results may also be useful in analyzing data sets for synthesis studies. This is especially true when the data sets combine sites using different measurement methods. While there have been many works that detail and report results from each of the measurement methods presented here, there are few that have made measurements with all four methods at a single site for an extended period as done herein. This provides non-anecdotal data for decision making when planning and budgeting for new field studies, and gives a basis that large synthesis studies can use when comparing data sets derived from these different measurement methods. These results can give insight into the significance of ecosystem state differences as determined by different measurement methods. While the results presented here are valid at the semi-arid grassland site where the data were collected, they are not to be construed as universally applicable. Experimenters are encouraged to carefully consider site differences when using these results to inform instrument choice. Those conducting multi-site analyses that consist of several measurement methods are likewise advised to use care in applying these results, especially in estimating measurement uncertainties.