Introduction

Above ground forest carbon storage assessments are important for strategic forest management (Graham et al. 2017). Sustainable management of forest resources requires the assessment of biomass and its changes over time (Canga et al. 2013). Estimation of above ground biomass (AGB), particularly stem wood (SW) biomass, is important for commercial purposes to assess the availability of, e.g., fibre, lumber and firewood (Parresol 1999). Biomass estimates allow for comparison between different sites, species and ecosystems and are essential to determine the productivity of forest ecosystems (Parresol 1999). The assessment of AGB is essential for ecosystem modelling on a local to global scale. AGB increment of forests is one of the most important means of decreasing CO2 in the atmosphere (Sandoval et al. 2021). Assessing and understanding the spatial variability in AGB allows for improved silvicultural management of forest stands and also plays a role in the mitigation of climate change (Bayen et al. 2020; Espinosa et al. 2005). Over the last three decades, pines have been grown over shorter rotation lengths in the Southern Hemisphere, mainly due to genetic breeding and silvicultural improvements, economic pressures and attempts to minimise risk (Wessels et al. 2015; Erasmus and Wessels 2020). It is questionable whether existing allometric equations developed in older, heavily thinned stands of longer rotation (e.g. van Laar and van Lille 1978; van Laar 1982) are also accurate for shorter rotation crops. There is also a need to understand biomass allocation and C sequestration dynamics following thinning.

Biomass can be assessed by remote sensing (Li et al. 2020), direct measurements (Parresol 1999) and by applying data obtained from volume tables (Picard et al. 2012). Remote sensing and volume table estimates require ground truthing (Gerber 2000). Biomass can be measured directly through non-destructive methods such as terrestrial laser scanning (Menéndez-Miguélez et al. 2023) or through the application of biomass expansion factors (Magalhães and Seifert 2015). Trees can also be measured destructively through tree felling (Sandoval et al. 2021). Furthermore, tree biomass can be measured using semi-destructive methods where a combination of branch harvesting (for branch and foliage biomass modelling), standing stem volume measurement and wood core sampling is used (Mensah et al. 2016). Various destructive sampling methods are used in the development of biomass models, including bulk harvesting of more than one tree and individual tree harvesting. The most accurate approach is individual tree harvesting (Seifert and Seifert 2014). Individual trees for biomass assessment are selected based on the species, stand conditions and management regime (Beets and Pollock 1987; Chave et al. 2014; Luo et al. 2020). The selected trees are then cut and separated into their biomass components (e.g. stem, branches, leaves and cones). The dry weight of each component is determined separately by weighing (Parresol 1999). Through regression, the weight of the tree components is related to one or more dimensions of the tree (Seifert and Seifert 2014).

Biomass sampling inventories are often conducted through a two-step upscaling procedure (Seifert and Seifert 2014). The dimensions of the stand are determined during the first stage and a regression-based sampling approach is then used to upscale from the individual component to the tree and the stand level. Variables commonly collected during forest inventories, such as diameter at breast height (DBH) and height (H), are often used for upscaling biomass to the stand level. In a study by Canga et al. (2013) on allometric equations developed for Radiata pine across five sites in Asturias, Spain, the best independent variable for predicting the total SW and bark mass was DBH²H. A study by van Laar and van Lille (1978) on a single site in the Jonkershoek valley in South Africa found that the best independent variable for determining the SW and total mass was DBH. The inclusion of height in addition to DBH can account for the variation in AGB amongst trees with the same DBH and offset the site effect (Mensah et al. 2016; Chave et al. 2005). Crown dimensions in addition to H are often included in biomass equations (Dimobe et al. 2018; Jucker et al. 2017) to provide more accurate biomass assessment, especially amongst the largest trees (Goodman et al. 2014; Ploton et al. 2016).

In regression analysis it is desirable that the estimates for the various biomass components add up to the estimate for the total tree (Parresol 2001). Biomass equations are often fitted separately for each component, which gives inconsistent results: the sum of the predicted component biomasses does not add up to the value predicted by the total biomass equation. When equations are fitted separately, collinearity between the variables of the same sample trees is often ignored (Zhao et al. 2019). Various authors proposed fitting a system of aggregative biomass equations taking collinearity into account (Parresol 1999). Non-linear joint fitting approaches are used, including seemingly unrelated regression (SUR) and maximum likelihood estimation. An alternative approach to ensure additivity of biomass models is Dirichlet Regression (DR). The DR modelling approach is useful to model proportion data where individual components need to add up to the total (Poudel and Temesgen 2016). The DR approach models the biomass proportions of each component directly and then disaggregates the total biomass estimated by the total biomass equation into the biomass of each component, thereby ensuring additivity amongst tree components and total biomass (Zhao et al. 2019).

Radiata pine is the most widely planted conifer species in the world (Mead 2013) and is commonly planted in a number of Southern Hemisphere and other temperate climate countries including New Zealand, Chile, Spain, Australia and South Africa (Lavery and Mead 1998). These countries account for 90% of the Radiata pine planted globally (Mead 2013). In South Africa, Radiata pine is almost exclusively planted in the Western Cape and covers an area of more than 38 000 ha (Forestry Economics Services CC 2018/2019). Longer rotation Radiata pine stands destined for the sawtimber market commonly receive thinnings up to 15 or 20 years of age with clear felling at 25–30 years (Kotze and du Toit 2012; Zwolinski & Hinze 2000; van Zyl 2015, unpublished). Short-rotation Radiata pine stands in South Africa are often grown with a single mid-rotation thinning at age 11–13. Short rotation Radiata pine crops are established (depending on region and country) for the pulp and pole markets; rotation lengths vary from 15 to 20 years for pulpwood, with no recommended thinning, and from 20 years upwards for poles (Wessels 1987). Several biomass studies have been conducted since the 1970s to determine the AGB of this species. Biomass studies conducted on this species in South Africa include the work of van Laar and van Lille (1978), van Laar (1982) and van Zyl (2015).

The published study by van Laar and van Lille (1978) in the Jonkershoek valley focused on trees 29 years of age, while the follow-up study by van Laar (1982), also in the Jonkershoek valley, focused on trees varying in age from 39 to 41 years. The biomass study by van Zyl (2015) focused on allometric equations developed across 4 sites along the Garden Route, Western Cape, with stand ages ranging from 25 to 28 years (Fig. 2). However, no studies on individual tree biomass for short rotation (< 20 years) crops have been conducted yet.

Biomass allocation to above and belowground components is dependent on site quality, silvicultural treatment, crop genetics, soil fertility and soil depth (Gonzalez-Benecke et al. 2021; Fernandez et al. 2017; Turner and Lambert 1986; Wang et al. 2017; Albaugh et al. 1998) and caution needs to be exercised when applying developed allometric equations to sites with different site and climatic conditions, silvicultural treatments and size/age classes. A study by Alvarez et al. (2012) found that the water availability of sites during the growing season affects the productivity and biomass allocation of Radiata pine crops, even when best silvicultural practices like soil preparation, genetic improvement, weed control and fertilisation are implemented. Radiata pine crops allocate more biomass to SW components on high productivity sites, whereas increased fractions of biomass are allocated to needles and twigs on less productive sites (Balboa-Murias et al. 2006). Partitioning to needle mass decreases with stand age while SW biomass increases with age (Beets and Pollock 1987).

The objective of this study was to develop a range of logarithmically transformed linear models to estimate total tree biomass and biomass components for Radiata pine growing on short rotation. The models have been developed from Radiata pine trees harvested across four sites in the Boland region, Western Cape province. The best selected models were compared with other published models and tested on two independent sites in the Garden Route, Western Cape. Estimated SW and total mass of the study conducted in the Jonkershoek valley were also compared to the estimates from the current study. The best models from the current study were used to estimate the component biomass of sites of similar silvicultural treatment, site quality and size classes in the Western Cape.

It is hypothesised that:

HI

The best models to estimate the SW, bark and total mass will include the independent variable DBH²H as it includes both diameter and height variables, improving the overall sensitivity of the model.

HII

The DR approach will yield statistically similar estimates for all component proportions when compared to the best models developed from diameter and height in the current study.

HIII

Allometric relationships developed for sites with high resource availabilities (nutrients and water) will yield inaccurate predictions when extrapolated to sites with low levels of resource availability.

This study set out to develop allometric equations that can accurately model biomass for shorter rotation Radiata pine stands and/or mid rotation thinnings from stands grown on longer sawtimber regimes and then to evaluate them using independent data.

Materials and methods

Developing allometric equations for biomass involves a number of steps during the data collection, processing and upscaling of the samples. Figure 1 sets out the process that was followed for this study, from the sample plot selection to the development of tree level biomass equations.

Fig. 1 Procedure followed to develop and apply the allometric biomass equations

Data collection

A total of 20 Radiata pine trees were felled for biomass determination across 4 sites in the Boland region (see Table 1 for dimensions of the variables measured). Trees to be sampled were selected randomly within 5 cm diameter classes. Five trees each were harvested from the Kluitjieskraal, Lebanon, Grabouw and Jonkershoek plantations (Fig. 2). Sample trees were selected to represent the diameter at breast height (DBH) and total height (H) distribution of the 4 sites and also the environmental and climatic conditions of Radiata pine stands in the Boland region. The SI20 ranges from 18 to 20 m and the MAP ranges from ± 782 to ± 1138 mm, while the age classes range from 13 to 20 years for the four sites (see Online Resource 1 for climatic and environmental conditions of the 4 sites where the sample trees were harvested).

Table 1 Dimensions and size class distribution of the various biomass components
Fig. 2 Location of the sample trees collected in (a) and the independent site in (b)

Branch whorls were identified and numbered along the length of the stem. Height of each branch whorl from ground base level was recorded and all branches from each whorl were measured for diameter 10 cm from the branch insertion point. All branches in each whorl were numbered and a random number was allocated to each whorl for branch sampling. An average of 12 sample branches per tree were collected to represent the full branch diameter range. Sample branches were kept aside for biomass determination.

The upscaling of the SW component usually consists of a measuring component where the basic density is determined at a disc level and a regression-based approach where the mass of the rest of the SW is estimated (Seifert and Seifert 2014). Felled trees were sampled and measured for stem disks (∼ 5 cm thickness) and under bark diameter at heights 0, 1.3, 3 m and every 3 m interval thereafter. The geometric formula for a conical frustum (Seifert and Seifert 2014, Eq. 1) was used to calculate the SW volume between measuring points:

$$V= \frac{\pi h}{3}({R}^{2}+Rr+{r}^{2})$$
(1)

where V is the volume between sections, h is the length of the section, and R and r are the large-end and small-end radii of the measured stem section, respectively.
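The sectional volume calculation in Eq. 1 can be sketched as follows. This is a minimal illustration only: the function names, the measuring heights and the under-bark diameters are hypothetical and are not taken from the study data, and the stem tip above the last measuring point is ignored.

```python
import math

def frustum_volume(length_m, r_large_m, r_small_m):
    """Volume (m^3) of a conical frustum, Eq. 1: V = pi*h/3 * (R^2 + R*r + r^2)."""
    return math.pi * length_m / 3.0 * (
        r_large_m ** 2 + r_large_m * r_small_m + r_small_m ** 2
    )

def stem_volume(heights_m, diameters_cm):
    """Sum the frustum volumes between successive measuring points along the stem."""
    total = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        # Convert under-bark diameter (cm) to radius (m)
        r_low = diameters_cm[i] / 200.0
        r_high = diameters_cm[i + 1] / 200.0
        total += frustum_volume(length, max(r_low, r_high), min(r_low, r_high))
    return total

# Hypothetical tree: measuring heights (m) and under-bark diameters (cm)
heights = [0.0, 1.3, 3.0, 6.0, 9.0]
diameters = [30.0, 26.0, 24.0, 20.0, 14.0]
volume = stem_volume(heights, diameters)
print(f"Stem wood volume up to 9 m: {volume:.4f} m^3")
```

When both radii are equal the formula reduces to the volume of a cylinder, and when the small-end radius is zero it reduces to the volume of a cone, which is a convenient check on any implementation.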

The collected disks were separated into SW and bark components. The green volume of the SW and the bark components was determined by water displacement. Bark and SW components were subsequently dried to determine the dry mass (Seifert and Seifert 2014).

The branch biomass was separated into needle, branch and cone material. Component biomass was determined by drying the components separately. The dry mass of all the biomass components was determined by drying samples at 105 °C until constant mass was attained (Picard et al. 2012).

The SW biomass was computed by the following equation:

$$BM=V\cdot BD$$
(2)

where BM is the dry SW biomass (kg D.M.), V is the SW volume (m³) and BD is the basic density of the SW disks, calculated as the dry weight divided by the green volume (kg/m³).
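As a small worked example of Eq. 2, the sketch below computes basic density from a disk and applies it to a section volume. All numbers and function names are hypothetical and serve only to show the units involved.

```python
def basic_density(dry_mass_kg, green_volume_m3):
    """Basic density (kg/m^3): oven-dry mass divided by green volume."""
    return dry_mass_kg / green_volume_m3

def stem_wood_biomass(volume_m3, density_kg_m3):
    """Dry stem wood biomass (kg), Eq. 2: BM = V * BD."""
    return volume_m3 * density_kg_m3

# Hypothetical disk: 0.45 kg oven-dry mass, 0.001 m^3 green volume
bd = basic_density(0.45, 0.001)      # approximately 450 kg/m^3
# Hypothetical stem section volume of 0.25 m^3
bm = stem_wood_biomass(0.25, bd)     # approximately 112.5 kg dry matter
print(f"BD = {bd:.1f} kg/m^3, BM = {bm:.1f} kg")
```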

The bark biomass of the stem sections was determined by calculating the percentage of dry bark volume to the volume of the wooden disks and multiplying the calculated percentage dry bark mass by the calculated SW volume of each respective stem section.

Determination of above ground biomass

A two-step procedure was used to upscale from the sample to the branch and from the branch to tree level (Seifert and Seifert 2014). Each step involved sampling and regression: first the branch-level biomass models were developed on a subset of the data of the larger population, where the weight of the components was regressed against easily measurable variables such as branch diameter and whorl height. During the second upscaling step, the total biomass of each component estimated during the first upscaling step was regressed against dimensions at the tree level, such as DBH or H.

Allometric equations in the form of a power function \(y=a{x}^{b}\), where a and b are the scaling parameters, were used for the upscaling. The power function was linearised by applying logarithmic transformations to both sides of the equation. Common assumptions of linear regression include normality, homoscedasticity and the absence of multicollinearity. Of these assumptions, normality is often considered to be the least important (Gelman and Hill 2007; Lumley et al. 2002). To determine the relationship between the residuals and the independent variables, diagnostic plots of the distribution of model residuals were inspected to determine whether there were any deviations from the assumptions of linear regression (Picard et al. 2012; Knief and Forstmeier 2021).

Logarithmically transformed simple (Eq. 3) and multiple (Eq. 4) linear regression equations were used to model the forest biomass.

$$\ln (Y) = {\beta }_{0} + {\beta }_{1}\ln (X) + \varepsilon$$
(3)
$$\ln (Y) = {\beta }_{0} + {\beta }_{1}\ln ({X}_{1}) + \cdots + {\beta }_{j}\ln ({X}_{j}) + \varepsilon$$
(4)

where Y is the dry mass of the various biomass components (SW, bark, needle, branch, cone and total); \({\beta }_{0}, {\beta }_{1}, \ldots , {\beta }_{j}\) are the unknown but estimable regression coefficients; \({X}_{1}, \ldots , {X}_{j}\) are the respective independent variables of the model and \(\varepsilon\) is the error term.
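As a brief reminder of why the log transformation yields a linear model (a standard identity, written here with the power-function parameters a and b introduced above), taking natural logarithms of both sides of the power function gives:

$$y=a{x}^{b}\;\Rightarrow\;\ln (y)=\ln (a)+b\ln (x)$$

so that the intercept of Eq. 3 corresponds to \({\beta }_{0}=\ln (a)\) and the slope to \({\beta }_{1}=b\).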

When logarithmically transformed models are transformed back to the arithmetic scale, there is usually a systematic error caused by the log-normal distribution. To correct for this error, the estimated values are multiplied by the following correction factor by Baskerville (1972):

$$CF = {e}^{\left( \frac{RSE^2}{2} \right)}$$
(5)

where the RSE is the residual standard error of the regression.
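The fitting and back-transformation steps described above can be sketched as follows. The study itself fitted models in R; this Python sketch is an illustrative equivalent for the simple model (Eq. 3) only, the function names are hypothetical, and the DBH and mass values are invented for demonstration.

```python
import math

def fit_loglog(x, y):
    """Ordinary least squares fit of ln(y) = b0 + b1*ln(x); returns (b0, b1, rse)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(lx, ly))
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters)
    rse = math.sqrt(rss / (n - 2))
    return b0, b1, rse

def predict_mass(x_new, b0, b1, rse):
    """Back-transform a log-scale prediction and apply the correction factor (Eq. 5)."""
    correction_factor = math.exp(rse ** 2 / 2.0)
    return correction_factor * math.exp(b0 + b1 * math.log(x_new))

# Hypothetical DBH (cm) and total dry mass (kg) values, for illustration only
dbh = [10.4, 15.2, 19.3, 20.5, 27.8, 38.9]
mass = [22.0, 61.0, 110.0, 128.0, 270.0, 610.0]
b0, b1, rse = fit_loglog(dbh, mass)
cf = math.exp(rse ** 2 / 2.0)
print(f"b0={b0:.3f}, b1={b1:.3f}, RSE={rse:.3f}, CF={cf:.4f}")
```

Because RSE² is never negative, the correction factor is always at least 1, so the corrected prediction is never smaller than the naive back-transformed value.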

Upscaling I: from the sample to the branch level

Equations 6 and 7 were used to fit the allometric relationship between the independent and dependent variables. The following models were used and compared to determine the branch and needle mass:

$$\text{lnBNM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnD}+\upvarepsilon$$
(6)
$$\text{lnBNM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnD}+{\upbeta }_{2}\text{lnWH}+\upvarepsilon$$
(7)

where BNM is the branch or needle mass (kg), \({\beta }_{0 }, {\beta }_{1}\) and \({\beta }_{2}\) are the model parameters, D is the branch diameter (cm), WH is the whorl height (m) and \(\varepsilon\) is the error term.

Once the models were fitted, needle and branch biomass were estimated for all the branches of the sample trees. The correction factor (Eq. 5) was then used to back transform the predicted values to the original arithmetic scale as discussed above. In a subsequent step, branch and needle biomass at tree level was computed by summing the estimates from every branch.
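The first upscaling step can be sketched as below: Eq. 7 is applied to every branch of a tree, the prediction is back-transformed with the correction factor, and the branch estimates are summed. The coefficients, the RSE and the branch measurements are hypothetical placeholders, not the fitted values reported in Table 2.

```python
import math

# Hypothetical parameters for Eq. 7 (NOT the fitted values from this study)
B0, B1, B2 = -3.5, 2.4, -0.3
RSE = 0.35

def branch_mass_kg(diameter_cm, whorl_height_m):
    """Predict branch (or needle) dry mass from Eq. 7, corrected with Eq. 5."""
    log_prediction = B0 + B1 * math.log(diameter_cm) + B2 * math.log(whorl_height_m)
    correction_factor = math.exp(RSE ** 2 / 2.0)
    return correction_factor * math.exp(log_prediction)

def tree_branch_mass_kg(branches):
    """Sum the predicted mass over all (diameter, whorl height) pairs of one tree."""
    return sum(branch_mass_kg(d, wh) for d, wh in branches)

# Hypothetical branch inventory of one tree: (branch diameter cm, whorl height m)
branches = [(2.1, 4.5), (3.0, 6.2), (1.4, 9.8)]
total = tree_branch_mass_kg(branches)
print(f"Tree-level branch mass: {total:.3f} kg")
```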

Upscaling II: From the branch to the tree level

The dry cone mass was summed for each individual sample tree and estimated at the tree level. To estimate the mass of the components at tree level (SW, bark, needle, branch and cone), various models were fitted and compared:

$$\text{lnTBM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnDBH}+\upvarepsilon$$
(8)
$$\text{lnTBM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnDBH}+{\upbeta }_{2}\text{lnH}+\upvarepsilon$$
(9)
$$\text{lnTBM}={\upbeta }_{0 }+{\upbeta }_{1}\text{ln}({\text{DBH}}^{2}\times \text{H})+\upvarepsilon$$
(10)
$$\text{lnTBM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnDBH}+{\upbeta }_{2}\text{lnCL}+\upvarepsilon$$
(11)
$$\text{lnTBM}={\upbeta }_{0 }+{\upbeta }_{1}\text{lnDBH}+{\upbeta }_{2}\text{lnH}+{\upbeta }_{3}\text{lnCL}+\upvarepsilon$$
(12)

where TBM is the total biomass of each component (kg dry matter per tree); \({\beta }_{0 }, {\beta }_{1}, {\beta }_{2}\) and \({\beta }_{3}\) are the model parameters, \(\varepsilon\) is the error term, DBH is the diameter at breast height (cm), H is the total tree height (m) and CL is the live crown length (m). All models were fitted with the lm function from the “stats” package (R Core Team 2023).
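The relationship between Eqs. 9 and 10 can be made explicit with a logarithm identity. This is a sketch for orientation, with \({\gamma }_{0}\) and \({\gamma }_{1}\) used here as illustrative symbols for the parameters of Eq. 10:

$${\gamma }_{0}+{\gamma }_{1}\ln ({\text{DBH}}^{2}\times \text{H})={\gamma }_{0}+2{\gamma }_{1}\ln \text{DBH}+{\gamma }_{1}\ln \text{H}$$

Eq. 10 can therefore be read as a special case of Eq. 9 in which the DBH coefficient is constrained to be twice the H coefficient, which saves one parameter.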

Model evaluation

Model residuals were tested for normality and homoscedasticity by applying the Shapiro–Wilk and Breusch–Pagan tests respectively. Cook’s distance was used to identify and remove outliers from the data set. Leave-one-out cross validation (LOOCV) was used to evaluate the goodness of fit (GOF) of the best models amongst the candidates. In this validation approach, one observation is held out as the test set while the model is fitted to the remaining observations, and the procedure is repeated until every observation has been held out once. The same model is thus fitted several times using a different training and testing set each time. The performance of the fitted models was evaluated by using the adjusted coefficient of determination (R²adj), root mean squared error (RMSE) and mean absolute error (MAE). Best models were selected based on the highest R²adj and lowest RMSE and MAE values. All statistical analyses were done using the software R 4.3.0 (R Core Team 2023).
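The leave-one-out procedure can be sketched as follows for a simple linear model on the log scale. This is an illustrative Python equivalent, not the R workflow used in the study; the function names and the data are hypothetical.

```python
import math

def ols(x, y):
    """Simple least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def loocv(x, y):
    """Hold out each observation once, refit on the rest, and return (RMSE, MAE)."""
    x, y = list(x), list(y)
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = ols(x_train, y_train)
        errors.append(y[i] - (b0 + b1 * x[i]))   # prediction error on the held-out point
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Hypothetical DBH (cm) and dry mass (kg), analysed on the log scale
dbh = [10.4, 15.2, 19.3, 20.5, 27.8, 38.9]
mass = [22.0, 61.0, 110.0, 128.0, 270.0, 610.0]
log_dbh = [math.log(v) for v in dbh]
log_mass = [math.log(v) for v in mass]
rmse, mae = loocv(log_dbh, log_mass)
print(f"LOOCV RMSE={rmse:.4f}, MAE={mae:.4f} (log scale)")
```

Note that the errors here are on the log scale; reporting them in kg would require back-transforming each held-out prediction first.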

Residual plots were produced for each biomass component, with the estimates from the best models plotted against the residuals to determine whether there were any visible patterns in the residuals. The performance of the single best total biomass model was tested against the sum total of the best selected models for each individual component (SW, bark, needle, branch and cone) to determine whether estimates from the single best total biomass model deviate from the sum total of the individual component models. The estimates from the total biomass model were plotted against the residuals (sum total of the observed biomass for the individual components minus estimates from the single total biomass model) to determine whether there was any visual pattern in the spread of residuals.

Comparison of the individually fitted component biomass models to the proportions estimated by the DR

The best selected models for the components (SW, bark, needles, branches and cones) (models 8–12) were fitted to determine the component biomass and its proportions as a percentage of the best selected total mass equation. Component biomass models are generally inferior to the individual model that fits total biomass (Eker et al. 2017).

Proportional data are often used when the relative proportions of two or more categories to the whole are biologically more meaningful than the absolute values. DR is commonly used to analyse a set of proportions that sum to a constant (Douma and Weedon 2018). When data exhibit skewness and heteroscedasticity, the DR can be used without having to transform the data (Maier 2014). In the DR modelling approach, the component biomass estimates were determined as proportions of the total mass equation, always adding up to 1. The Dirichlet distribution, a multivariate generalization of the beta distribution, is used to model data that represent components as a percentage of the total biomass (Poudel and Temesgen 2016). The DirichletReg package in R 4.3.0 (R Core Team 2023), which provides functions to analyse compositional data, was used to determine the estimated biomass proportions of the various biomass components. The alternative parameterization was used to model the biomass proportions. With this parameterization, the vector of expected values μ is modelled as a function of the covariates and a precision parameter is estimated. Multinomial regression is employed for the expected values: the regression coefficients of one category (the base category) are set to zero, b = 0, so that this category becomes the reference (Maier 2014). The parameterization by Maier (2014) and Ferrari and Cribari-Neto (2004) for the Dirichlet distribution is represented as follows:

$$f\left( {y;\mu , \phi } \right) = \frac{1}{{B\left( {\mu \phi } \right) }}\mathop \prod \limits_{c = 1}^{C} y_{c}^{{\left( {\mu_{c} \phi - 1} \right)}}$$
(13)

where 0 < \({\mu }_{c}\) < 1 and \(\phi\) > 0 are the mean and precision parameters respectively, with \({\mu }_{c}\) = \(\frac{{\alpha }_{c}}{\phi }\) and \(\phi\) = \({\alpha }_{0}= {\sum }_{c=1}^{C}{\alpha }_{c}\); \({\alpha }_{c}\) > 0, \(\forall\) c are the shape parameters for the components, \({y}_{c} \in \left(0,1\right)\), \({\sum }_{c=1}^{C}{y}_{c}\) = 1 and B (·) is the multinomial beta function. More information on the DR modelling approach and parameter estimation can be found in Maier (2014).
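To make the mean and precision parameterization of Eq. 13 concrete, the sketch below evaluates the Dirichlet log-density from μ and ϕ by converting them to shape parameters (α_c = μ_c ϕ). It is an illustration only and not part of the DirichletReg workflow; the function name and the example proportions are hypothetical.

```python
import math

def dirichlet_logpdf(y, mu, phi):
    """Log-density of Eq. 13 for proportions y, mean vector mu and precision phi."""
    alphas = [m * phi for m in mu]                      # shape parameters alpha_c = mu_c * phi
    # Log of the multinomial beta function: sum(lgamma(alpha_c)) - lgamma(sum(alpha_c))
    log_beta = sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))
    return -log_beta + sum((a - 1.0) * math.log(v) for a, v in zip(alphas, y))

# Hypothetical mean proportions: stem wood, bark, needles, branches, cones
mu = [0.63, 0.10, 0.09, 0.15, 0.03]
# Hypothetical observed proportions for one tree (must sum to 1)
y_obs = [0.60, 0.11, 0.10, 0.16, 0.03]

low_precision = dirichlet_logpdf(y_obs, mu, phi=20.0)
high_precision = dirichlet_logpdf(y_obs, mu, phi=200.0)
print(f"log-density: phi=20 -> {low_precision:.3f}, phi=200 -> {high_precision:.3f}")
```

A larger ϕ concentrates the distribution more tightly around μ, which is why it is referred to as the precision parameter.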

In this study the covariates of the candidate total mass models were tested, an ANOVA test was conducted and the model with the best describing independent variables was used to estimate the biomass proportions. Residual plots for the model selected by the ANOVA test were produced to determine where the DR over- or underestimates the biomass proportions and to determine whether there is any visible pattern in the spread of residuals.

The proportions estimated by the DR were compared to the proportions calculated by the best selected models developed in this study. Wilcoxon signed-rank tests were conducted on the estimated proportions of the best models and the proportions from the DR to determine whether they are significantly different from each other. The DR proportions of the various components of the sample trees were also plotted over the DBH range to determine the allocation of resources to individual tree components, and in relation to each other, with an increase in size.

Model evaluation and comparison

The predictive ability of the best selected biomass models was tested on an independent data set of 9 sample trees obtained from two sites of lower productivity (sites 3 and 4) in the Homtini Plantation, Garden Route study (van Zyl 2015). The trees from van Zyl’s (2015) lower productivity sites range in DBH from 11.4 to 37.2 cm. The SW and total mass models developed by van Laar and van Lille (1978) in the Jonkershoek valley for high productivity sites with a larger DBH range (29.3–58.3 cm) were also compared to the best SW and total mass models developed in the current study to determine whether the predictive performance of the models holds for lower productivity sites and different ages/dimensions. Environmental and climatic conditions of the Garden Route trials are not similar to those of the sites in the Boland, Western Cape, where the allometric equations for this study were developed. The mean annual precipitation (MAP) for the Garden Route trials is ± 843 mm and the region experiences all-year rainfall, whereas the rainfall at the sites in this study is seasonal and mostly occurs during winter. The SI20 for the Garden Route trials is 19.1 m, comparing well with the range of SI20 recorded for the four sites in this study (see Online Resource 1).

DBH–H and DBH–CL curves were compared for the sample trees from this study and the two independent sites to determine whether there are any differences in H and CL between the two data sets that might explain the biomass allocation. Unpaired t-tests were conducted to determine whether there are significant differences in the mean of H and CL between the data from this study and the sample trees from the independent data set.

To further investigate differences in the estimated means between the component biomass models applied to the independent site, paired t-tests were conducted to determine whether the means differ significantly from each other.

Application of the best selected models to estimate biomass and carbon sequestration in the Boland region

Component biomass of mid rotation compartments of other lower productivity sites with similar size classes in the Western Cape was estimated using the DR modelling approach. As discussed above, the predictors of models 8 to 12 were used as input variables in the DR, an ANOVA test was conducted and the models with the best describing variables were selected to determine the component proportions. The total biomass estimated by the best selected biomass model (models 8–12) was then apportioned according to the proportions calculated by the DR to determine the component biomass. The results were used to provide a range of component biomass and the above-ground carbon sequestration potential of Radiata pine compartments in the Western Cape.
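The disaggregation step can be sketched as below: a total biomass estimate is split according to component proportions, so the components add up to the total by construction. The total and the proportions are hypothetical values for illustration and are not results from this study.

```python
def apportion(total_kg, proportions):
    """Split a total biomass estimate into components according to their proportions."""
    # Normalise in case rounded proportions do not sum to exactly 1
    proportion_sum = sum(proportions.values())
    return {name: total_kg * p / proportion_sum for name, p in proportions.items()}

# Hypothetical DR proportions for one tree and a hypothetical total biomass estimate (kg)
proportions = {
    "stem_wood": 0.63,
    "bark": 0.10,
    "needles": 0.09,
    "branches": 0.15,
    "cones": 0.03,
}
total_biomass_kg = 200.0
components = apportion(total_biomass_kg, proportions)

for name, mass_kg in components.items():
    print(f"{name}: {mass_kg:.1f} kg")
print(f"sum of components: {sum(components.values()):.1f} kg")
```

This additivity is the main practical advantage of the DR approach over separately fitted component equations, whose predictions need not sum to the prediction of the total biomass equation.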

Results

Upscaling phase I: branch biomass allometry

Model 7 (Table 2), which includes branch diameter and whorl height as predictor variables, was selected as the best model to upscale the needle and branch mass to the tree level. The Breusch–Pagan tests indicated that both the needle and branch mass models met the assumption of homoscedasticity with p values above 0.05, indicating constant variance of residuals. The p values were below 0.05 in the Shapiro–Wilk test for both needle and branch mass, indicating non-normality. The diameter and whorl height coefficients for both the needle and branch mass were highly significant (p < 0.001). Diameter and whorl height were good predictors of needle and branch mass, explaining 74% and 79% of the variance of needle and branch mass respectively.

Table 2 Needle and branch mass models used to upscale to the tree level where D is the branch diameter (cm), WH is the whorl height (m), \({\beta }_{0 }, {\beta }_{1 },{\beta }_{2}\) are the model parameters, CF is the correction factor, RMSE is the Root Mean Square Error and MAE is the Mean Absolute Error

Upscaling phase II: tree biomass allometry

The best performing model for SW, bark and total mass at the tree level was Eq. 10 with the independent variable DBH²H (Table 3, Fig. 3).

Table 3 Total and component biomass used to upscale to the stand level, where DBH is the diameter at breast height (1.3 m), H is the height in meter, \( {\upbeta }_{\text{0} }, {\upbeta }_{\text{1} }, {\upbeta }_{\text {2}}\) are the model parameters, CL is the crown length in meter, RMSE is the Root Mean Square Error and MAE is the Mean Absolute Error
Fig. 3 Observed biomass of the best models selected for the various components. The inset plots show the log–log transformation of the various models

For needle, branch and cone biomass, Eq. 8 with independent variable DBH was the best performing model (Table 3 and Fig. 3). Equation 9 resulted in very similar performance to Eq. 8 for needle and branch biomass, but since Eq. 8 uses only DBH as independent variable, it was selected instead in keeping with the principle of parsimony. Online Resource 2 shows the residuals vs. the estimates by the best selected component models. No clear patterns in the spread of residuals were observed for the various biomass components.

The sum total of the best models for the individual biomass components (models 8 and 10) underestimates the total biomass estimated by the best single total biomass equation for trees of higher biomass. The underestimation ranged from 2.0 kg to 27.8 kg for trees with DBH of 20.5 cm and 38.9 cm respectively. Visually, no patterns of the residuals could be observed for the total mass model as shown in the diagnostic plot (Online Resource 3).

The best selected models had the highest R²adj and the lowest RMSE and MAE values. The Shapiro–Wilk test indicated that the residuals of the needle, branch and total mass models are not normally distributed as the p values were less than 0.05 (0.003 for both needle and branch mass and 0.045 for total mass). Two outliers were also removed from the needle and branch mass models, leaving a total of 18 sample trees. With the Breusch–Pagan test, the bark mass model showed signs of heteroscedasticity with a p value of 0.012. On close inspection of the diagnostic plots of the bark mass model, no clear patterns were observed. Collinearity between variables in the models was tested by inspecting the variance inflation factors (VIF) of model values on the arithmetic scale. All multiple regression models had VIFs below 5, indicating acceptable levels of collinearity.
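For a model with two predictors, the VIF of either predictor reduces to 1 / (1 − r²), where r is the Pearson correlation between the two predictors. The sketch below illustrates this special case only; the function names are hypothetical and the DBH and H values are invented for demonstration.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF for a two-predictor model: 1 / (1 - r^2). Undefined if |r| = 1."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical DBH (cm) and total height (m) of six trees
dbh = [10.4, 15.2, 19.3, 20.5, 27.8, 38.9]
height = [9.5, 13.0, 15.5, 14.8, 19.0, 22.5]
vif = vif_two_predictors(dbh, height)
print(f"VIF for DBH and H: {vif:.2f}")
```

With three or more predictors, r² is replaced by the R² obtained from regressing each predictor on all the others.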

Biomass estimation

Cross validation and Shapiro–Wilk tests were performed on the observed vs. predicted mass of the various components of the sample trees (Fig. 4). The observed vs. predicted values of all the best models, on the back-transformed scale (Fig. 4), satisfied the normality assumption (p value > 0.05). Paired t-tests were then performed on the observed vs. predicted values, with p values > 0.05 for all models, indicating that there is no significant difference between the means of the observed and predicted values. High R²adj values were obtained for the predictability of the SW and total mass values. The model with the lowest predictability was the cone mass model with an R²adj value of 0.22.

Fig. 4 Observed vs. Predicted biomass of the various tree components by using the best model for each component. The red lines indicate the 1:1 lines (Models 8 and 10, Table 4)

Estimated proportions by the best selected models and Dirichlet Regression

The component proportions (SW, bark, needle, branch and cone) were regressed against the variables in models 8–12, using the DR modelling approach. The ANOVA of the models indicated that the variables of models 8, 9, 11 and 12 fitted the data best, with highly significant p values. The variables of model 12 (DBH, H and CL) were then fitted to estimate the biomass proportions of the stem, bark, needle, branch and cone biomass (Table 4). The variables of model 12 were selected for the DR as they capture variation in H and CL for trees with similar DBH. For the SW component, the DR overestimated the proportions in the smaller size class range. The overestimation of the SW component ranged from 0.2% for a DBH of 19.3 cm to 4.8% for the sample tree with a DBH of 10.4 cm. A similar trend can be seen in the branch mass, where the DR overestimated the proportions of branch mass in the smaller size class range. For the cone proportions, the residuals are closely aligned to the zero line, meaning the estimates of the best selected model are a close fit to the estimates by the DR (see Online Resource 4).

Table 4 Models tested in the DR with parameter estimates, where DBH is the diameter in cm at breast height (1.3 m), H is total tree height in m and CL is live crown length in m. The component cone mass was used as the reference group

The best selected total mass model (model 10) was then apportioned into the different biomass component proportions as estimated by the DR (variables of model 12). The intercepts of all the biomass components are highly significant (p < 0.001), and none of the CL parameters are significant for any of the components, meaning that there is no evidence of an effect of CL on the component proportions (Table 4). The cone biomass was used as a reference in the precision model estimation, hence there are no parameter estimates for cone mass.
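The apportioning step described above can be sketched in a few lines. This is a minimal illustration only, assuming that the mean component proportions follow a multinomial-logit link with the cone component as the reference group (linear predictor fixed at zero). The coefficient values, the function names and the example tree are hypothetical and are not the Table 4 estimates.

```python
import math

# HYPOTHETICAL coefficients for illustration only; NOT the Table 4 estimates.
# Each non-reference component has a linear predictor b0 + b1*DBH + b2*H + b3*CL.
HYPOTHETICAL_COEFS = {
    "SW":     (2.0, 0.040, 0.020, 0.0),
    "bark":   (1.2, 0.005, 0.000, 0.0),
    "needle": (1.5, -0.010, 0.000, 0.0),
    "branch": (1.8, 0.000, 0.000, 0.0),
}
REFERENCE = "cone"  # reference group: its linear predictor is fixed at 0


def component_proportions(dbh_cm, h_m, cl_m, coefs=HYPOTHETICAL_COEFS):
    """Return mean proportions per component; they sum to 1 by construction."""
    eta = {
        name: b0 + b1 * dbh_cm + b2 * h_m + b3 * cl_m
        for name, (b0, b1, b2, b3) in coefs.items()
    }
    eta[REFERENCE] = 0.0
    denom = sum(math.exp(value) for value in eta.values())
    return {name: math.exp(value) / denom for name, value in eta.items()}


def apportion(total_mass_kg, proportions):
    """Split a total mass estimate into additive component masses."""
    return {name: total_mass_kg * p for name, p in proportions.items()}


if __name__ == "__main__":
    # Hypothetical tree: DBH 20 cm, H 15 m, CL 6 m, total mass 100 kg.
    props = component_proportions(20.0, 15.0, 6.0)
    for name, mass in apportion(100.0, props).items():
        print(f"{name}: {mass:.1f} kg")
```

Because the proportions are normalised by a common denominator, the component masses always add up to the total mass estimate, which is the additivity property the DR is used for.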

The mean estimated proportions by the best selected models and the mean estimated proportions by the DR with model parameters (Table 4) produced very similar results (Fig. 5). The SW contributed the most to the total AGB, followed by branches, bark, needles and cones. The DR slightly underestimated the stem biomass. The mean estimated SW mass proportion by the DR model was 63.4%, compared to 65.5% when estimated with the best selected model. The estimated branch and needle mass proportions were 14.6% versus 14.4% and 8.7% versus 8.6%, respectively, for the DR and the estimated proportions by the best selected models. The Wilcoxon signed-rank tests indicated that the differences between the proportions estimated by the best selected models and by the DR were not significant, with p values > 0.05 for all components, allowing us to accept hypothesis II.

Fig. 5

Mean estimated biomass proportions by the best selected models (Models 8 and 10) and the DR Proportions

From Fig. 6 it is evident that the SW proportion increases proportionately with an increase in DBH, while the cone, bark, needle and branch proportions decrease proportionately with an increase in DBH.

Fig. 6

Biomass allocation patterns across the DBH range of sample trees as modelled with the DR

Application of best selected models to independent sites

The best selected models developed in this study have been validated using the independent data set of van Zyl (2015) (see Fig. 7). From the SW and total mass graphs it is evident that the biomass components from our study are a close fit to the Garden Route lower productivity sites (sites 3 and 4 as labelled in van Zyl 2015). Stronger divergences between the modelled and observed values can be seen for the bark, needle and branch mass (Fig. 7). For the bark and the needle mass the observed values were higher than the modelled values, and for the branch mass the modelled values were higher than the observed values. The paired t-tests comparing the values obtained in this study to the two independent sites in the Garden Route (van Zyl 2015) showed that the means of the SW and needle mass were not significantly different from each other, with p values of 0.572 and 0.069 for SW and needle mass respectively. For the bark, branch and total mass the means were significantly different from each other, with p values < 0.05. Higher bark and needle mass was observed for the independent sites and lower total mass for the current study.

Fig. 7

Relationships of the various biomass components of the current study (modelled) in comparison to the biomass components measured by van Zyl (2015) (observed in independent datasets emanating from the Garden Route, South Africa)

From Fig. 8 it is evident that the slopes of H for a given DBH recorded by van Zyl (2015) were similar to those of this study and that the trendlines cross at a higher DBH, but that the CL of trees with larger DBH was slightly longer in the study of van Zyl (2015). For CL, a divergence can be seen from the small to the higher DBH range (Fig. 8). The results of the unpaired t-tests indicated that there were no significant differences between the two data sets in the means of the DBH and H and the DBH and CL relationships, with p values > 0.05.

Fig. 8

DBH and H and DBH and crown length relationships of the biomass components of this study (Boland sites) in comparison to the relationships of the low and medium productivity sites in the study of van Zyl (2015) (Garden Route)

The best SW and total mass models in this study have also been compared to the models of van Laar and van Lille (1978), which were developed for a stand with higher productivity, age and DBH range. The models developed by van Laar and van Lille (1978) estimated higher SW and total mass than the best models from this study (Fig. 9). Paired t-tests comparing the means of the SW and total mass found that the differences are statistically significant, with p values < 0.001 for SW and total mass.

Fig. 9

Comparison of the relationship between DBH and SW mass and DBH and total mass for the van Laar and van Lille (1978) models (Modelled) and the modelled values for the current study (Observed)

Biomass estimates extrapolated to typical site qualities in the Boland region

The DR modelling approach was applied to nine unrelated experimental plots situated in the Boland region (described in Chikumbu 2011 (unpublished); Pretzsch et al. 2021). These sites are representative of a range of productivities commonly found in the region, with similar DBH ranges and silvicultural treatments to the original sample tree sites, and could therefore be used to determine representative ranges of biomass component masses in typical stands of the region. The ANOVA test indicated that the variables of model 8 (DBH alone) and model 9 (DBH and H) best described the component proportions, with highly significant p values. Model 9 was selected for upscaling as it includes the variable H, which captures variability in H for trees with similar DBH. The total mass model (model 10) was then apportioned into the biomass components estimated by the DR (model 9) to provide additive biomass proportions for the various components. The proportions of the various biomass components within the total mass estimated by model 10 were then adjusted to the estimates by the DR. Experimental plots were upscaled to the ha level by applying plot expansion factors. The cone mass was used as reference group and omitted from the results in Table 5 below. Online Resource 5 shows the estimated range of biomass proportions across the sites in the Boland, using the DR.
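The upscaling of plot values to the ha level can be illustrated with a short sketch. It assumes the common definition of a plot expansion factor as 10 000 m2 divided by the plot area; the plot area, the tree masses and the function names below are hypothetical and are not taken from the experimental plots.

```python
SQUARE_METRES_PER_HECTARE = 10_000.0


def plot_expansion_factor(plot_area_m2):
    """Number of plots of this size that fit into one hectare (assumed definition)."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return SQUARE_METRES_PER_HECTARE / plot_area_m2


def per_hectare_mass(tree_masses_kg, plot_area_m2):
    """Sum the estimated tree masses in a plot and scale the sum to kg per ha."""
    return sum(tree_masses_kg) * plot_expansion_factor(plot_area_m2)


if __name__ == "__main__":
    # Hypothetical plot of 500 m2 containing two trees of 100 kg and 150 kg.
    print(per_hectare_mass([100.0, 150.0], 500.0))
```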

Table 5 Models tested in the DR with parameter estimates, where DBH is the diameter in cm at breast height (1.3 m). The component cone mass was used as the reference group and omitted from the results

The total mean estimated AGB ranged from 76 924 to 165 898 kg ha−1. Considering a carbon content of 50% for AGB at constant dry mass, the forest sites have sequestered 38 462 to 82 949 kg ha−1 carbon as shown in Online Resource 5.

Discussion

The best models for this study and predictive capability (HI)

The models explaining the most variance of the SW, bark and total tree biomass are those including DBH2H as predictor, explaining 91%, 51% and 89% of the variance respectively (Table 3), allowing us to accept hypothesis I. The DBH2H model coefficients of the best models for SW, bark and total biomass were all highly significant (p value < 0.001). Similar results have been obtained in a study by Bi et al. (2010), where the SW and total mass estimates were the most precise in comparison to the other biomass components. The single best total biomass model (Eq. 10, Table 3), incorporating DBH2 and H, provided a good estimate of the total biomass with a highly significant p value and strong GOF statistics. The individual component biomass equations estimate the biomass of their own component well, but do not always satisfy additivity criteria (see next section). For the total needle, branch and cone mass (Eq. 8) the models with DBH as single predictor were the best performing. The total branch and needle mass models exhibit a weaker fit than the SW and total mass models, explaining 70% and 69% of the variation (Table 3). A weaker fit for crown mass models was also recorded by Čihák and Vejpustková (2018) in an allometric study on Picea abies (L.) and in a study on various pine species in Mexico (Vargas-Larreta et al. 2017). This can be ascribed to higher variance of these values with an increase in DBH, probably due to local environmental conditions. Including H as a predictor makes it possible to use the biomass equations across different sites where H data are available (Vargas-Larreta et al. 2017; Chave et al. 2015; Feldpausch et al. 2011). Models including H as independent variable can also be used across sites where the stand density differs, as DBH increment is related to stand density but H is less affected (Hummel 2000).
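To make the role of the combined variable concrete, the sketch below fits and applies a log-log allometric model of the assumed form ln(mass) = a + b ln(DBH2H). The functional form is an assumption made for illustration, since Table 3 is not reproduced here, and the function names are hypothetical. Any correction for back-transformation bias is left out of this sketch.

```python
import math


def combined_variable(dbh_cm, h_m):
    """Combined variable DBH squared times H (cm2 m)."""
    return dbh_cm ** 2 * h_m


def fit_loglog(dbh_cm, h_m, mass_kg):
    """Ordinary least squares fit of ln(mass) = a + b * ln(DBH^2 * H).

    The log-log form is an ASSUMPTION for illustration, not the Table 3 model.
    """
    x = [math.log(combined_variable(d, h)) for d, h in zip(dbh_cm, h_m)]
    y = [math.log(m) for m in mass_kg]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_mass(a, b, dbh_cm, h_m):
    """Back-transform the log-scale prediction to kg (no bias correction applied)."""
    return math.exp(a + b * math.log(combined_variable(dbh_cm, h_m)))
```

Two trees with the same DBH but different H receive different mass estimates under this form, which is why a model with the combined variable can follow differences in site quality or stand density that a DBH-only model cannot.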

For the needle, branch and cone mass models, the variability increased with increasing DBH. The variability of needle and branch biomass can be ascribed to the effect that different silvicultural operations applied in individual plantations (timing and intensity of thinning and pruning) have on tree architecture. Needle and branch biomass are also affected by environmental stresses brought about by drought and stand density (Gonzalez-Benecke et al. 2021).

Estimated biomass proportions in relation to total mass (HII)

The DR modelling approach implemented in our study solved the additivity problem and the predicted proportions of the DR are a close fit to the proportions by the individual best component equations (Fig. 5). The results of the Wilcoxon signed-rank test indicated that there was no significant difference in component proportions (all components) estimated by the best models compared to the DR, allowing us to accept hypothesis II.

In previous studies the single equation estimate of total biomass has been recommended due to reduced assessment error (Cienciala et al. 2008; Vejpustková et al. 2015). To determine the AGB on an ecosystem level, the single best total mass model (Eq. 10) apportioned into the proportions estimated by the DR is therefore the best approach. Zhao et al. (2016) also found the DR to be superior to other modelling approaches in ensuring additivity. In that study, the DR slightly underestimated the bark, branch and cone biomass and slightly overestimated the needle biomass. In a study by Eker et al. (2017), where the DR modelling approach was used to model the biomass of Pinus brutia (Ten.), the DR overpredicted branch and SW biomass and underpredicted the foliage biomass. The DR used in a biomass study of Pinus elliottii slightly overestimated stem and bark biomass but largely overestimated branch and foliage biomass (Zhao et al. 2019).

The estimated percentage of SW mass as calculated by the DR ranges from 57% to 71%, and that of the crown mass (branch and needle mass) from 17% to 27%, with an average of 63.4% for SW mass and 23.5% for crown mass, depending on DBH. Hakkila and Parikka (2002) found that the SW and crown mass in conifers (average of 18 species) in British Columbia have proportions of 75.2% and 24.8% respectively in relation to total biomass, while the general biomass proportions for the genus Pinus grown on long rotations in South Africa are 65% SW, 10% bark, 20% needles and 5% branchwood (van Breda, personal communication in van Laar 1982). The average estimated SW mass proportion for this study is slightly lower than the average of van Laar (1982), but the average bark mass proportion is higher. In a study by Dovey et al. (2021), a logarithmic increase in SW mass was observed for pines, with a maximum between 70 and 80% at maximum harvestable volume.

In this study, by applying the best selected models, the SW component increased proportionately in relation to an increase in DBH, while the other components decreased. The SW mass ranged from 20.59 kg for the 10.4 cm DBH sample tree to 373.70 kg for a sample tree with DBH of 38.9 cm. Branch and needle mass fractions decreased proportionately with an increase in DBH, while the absolute branch and needle mass ranged from 4.71 kg to 74.32 kg and from 3.97 kg to 33.78 kg respectively. Similar results have been obtained for a biomass study on a range of pine species in Mexico, where the SW biomass increased proportionally with DBH and the bark and foliage biomass decreased (Vargas-Larreta et al. 2017). Dovey et al. (2021), working with the commercial forestry genera of Pinus, Eucalyptus and Acacia, found similar trends in relative biomass proportions as a function of tree volume. Fluctuations in environmental conditions such as light, temperature, moisture, nutrient availability and competition can also bring about variability in biomass allocation over time (Gower et al. 1994). The decrease in the needle mass fraction with an increase in DBH can be ascribed to trees competing for light and trees in dominant positions generally having lower needle mass fractions (Aquino-Ramírez et al. 2015). Trees accumulate mass in the SW with an increase in DBH, as they have to maintain stability as they mature and achieve dominance in the canopy. Tree branches and needles turn over and are more adaptable to optimal positioning to ensure that adequate light is captured during the rotation.

The DBH of the 23 sample trees in the study by van Laar and van Lille (1978) ranges from 29.3 cm to 58.3 cm, while that of the sample trees in this study ranges from 10.4 cm to 38.9 cm. The age of the sample trees of van Laar and van Lille (1978) is 29 years, while the age of the sample trees for this study varied from 13 years for the Jonkershoek site to 15 years for the Kluitjieskraal site and 18–20 years for the trees at the Grabouw and Lebanon sites. The estimated SW mass proportions of this study are lower than the 89% SW that van Laar and van Lille (1978) estimated, while the crown mass proportions are higher than their 10.8%. The higher SW proportions in the study of van Laar and van Lille (1978) can be ascribed to the larger DBH and higher age range of the sample trees and to tree dominance, resulting in larger stem mass and lower crown mass proportions.

Validation of best models from this study to the independent sites (HIII)

The SW and total mass components estimated in this study are a close fit to the smaller sized trees of the Garden Route (labelled as sites 3 and 4 as measured by van Zyl 2015), while much higher biomass has been estimated by the van Laar and van Lille (1978) models, allowing us to accept Hypothesis III. The high accuracy with which the estimated SW mass could be applied to an independent site makes the best selected SW and total biomass models for this site transferable to other lower productivity sites with high predictive capability. The models will be particularly useful to assess smaller tree sizes, such as commonly found in final thinnings of long rotations or in semi-mature stands of short rotation crops. Furthermore, they are useful to estimate biomass for short rotations and to determine the carbon sequestration potential. Considering the increasing interest in estimating and modelling forest carbon accurately, adequate estimation of forest biomass components under different stages of forest development and environmental conditions is important. The higher SW mass estimations in van Laar and van Lille’s (1978) study can be ascribed to the higher DBH range (thus a change in tree size), more mature age and lower stand densities. The traditional thinning prescription implemented historically left a remaining stand density of 200 to 250 trees per ha at age 18–20, which was then left to grow up to approximately 30 years (Hinze & Zwolinski 2000), but current silvicultural regimes typically leave higher stockings after final thinning (Kotze & du Toit 2012). The most variation can be seen between the bark, needle and branch mass components. The higher bark mass observed in the study by van Zyl (2015) could be ascribed to the more advanced age, where the proportion of outer bark in the lower part of the SW increases with age (Beets and Garrett 2018), and to the dominance of trees of higher DBH.
The higher DBH range recorded in the independent sites could be ascribed to greater resource availability due to constant rainfall for that region. Higher branch mass was observed for this study compared to the study by van Zyl (2015). Similar variability in results was obtained in a study by Vorster et al. (2020), showing that there was greater variability in the branch mass compared to the total mass. In comparison to the study by van Zyl (2015), the lower needle mass observed in the current study, as illustrated in Fig. 7, could be ascribed to the longer crown lengths recorded in the study of van Zyl (2015) (Fig. 8), which could be a result of different pruning regimes or a difference in environmental conditions (the constant rainfall season experienced in the Garden Route compared to the very seasonal winter rainfall of the Boland region). In a study by Scheepers and du Toit (2020), using a rudimentary water balance approach, larger soil water deficits were recorded for the Boland region (seasonal rainfall) compared to sites in the Garden Route and the Tsitsikamma (all-year rainfall). The lower soil storage capacity, high soil water deficit and very seasonal rainfall experienced in the Boland region may explain the variation in branch, needle and bark mass. The sample trees might have allocated more resources to branch formation post thinning to fill the canopy gaps created by the thinning operation. Branch mass increases with stand growth, but the variability depends on the site index and stand uniformity (Dovey et al. 2021). Lastly, the needle mass equation developed in this study can be used to determine the needle mass which, combined with the specific leaf area (SLA), can be used in the estimation of leaf area index (LAI) (Dovey & du Toit 2006; Lopes et al. 2016). It is likely to have a fair degree of accuracy on lower productivity Radiata pine sites, as it was developed from comparable site types.

Application of the DR to other lower productivity sites in the Western Cape

The AGB of the eight experimental plots situated in the Boland region ranges from 76 924 to 165 898 kg ha−1 for an age range of 16 to 23 years. This is lower than the 184 860 kg ha−1 that was reported by van Laar and van Lille (1978) at an age of 29 years. The higher biomass reported by van Laar and van Lille (1978) could be ascribed to the higher DBH range and age of the trees and a higher productivity site. The study by van Zyl (2015) reported AGB of 63 234 and 76 286 kg ha−1 for the two lower productivity sites (sites 3 and 4) versus 191 292 and 255 193 kg ha−1 for the two higher productivity sites (sites 1 and 2). The Garden Route and Boland studies therefore produced comparable results, but the former covered a larger range of site productivities, with the Boland sites generally falling in the low to medium biomass categories. The absence of high biomass producing sites in the Boland region is attributable to the high water deficits (Scheepers & du Toit 2020), brought about by the seasonality of the rainfall, as explained before.

Conclusions

The SW and total mass equations with the combined variable (DBH2H) demonstrated the best predictability for this study and were a close fit to independent data of similar age and size [the lower productivity sites in the Garden Route]. We accept Hypothesis I and propose that the combined variable (DBH2H) should be included as an independent variable in allometric models where H data is available, in order to estimate the SW and total mass across different site qualities and stand densities. The inclusion of H in the total branch and needle mass models has only marginally improved the fit in terms of the GOF criteria. The use of DR with DBH, H and CL variables gave fairly similar results, taking into account differences in H (or CL) and tree architecture across variable site quality conditions.

The allometric equations developed in this study will be useful to apply to small to medium sized trees, as the equations developed for the high productivity site in Jonkershoek (Boland) do not accurately estimate the biomass of medium size tree classes of mid or short rotation age. Considering the shorter rotations, mid rotation thinnings and the lack of published equations to estimate the biomass of such stands, the allometric equations developed in this study will bridge this gap in the literature.

Differences can be observed in the slopes of the needle and branch mass relationships between the winter and constant rainfall zones. The higher slope of CL for a given DBH recorded in the Garden Route, compared to the allometry of the current study, might explain the larger needle mass. The crowns of the independent site (Garden Route) might also be denser (more needles per unit area of land) than those of the sample trees used in the current study (Boland region with seasonal climate).

The total needle mass equation in combination with the SLA can be used across lower productivity sites to directly determine the LAI. The determined LAI can then be used to calibrate and convert the so-called plant area index (PAI, obtained through optical measurements) to a corrected LAI value. These results may help forest managers understand how the different management regimes may differentially affect the biomass and carbon storage potential of their forests.

The carbon sequestered in AGB of short-rotation Radiata pine sites in the Boland region ranges from 38.5 to 82.9 tons C per ha. Expressed as carbon dioxide mass, this is equivalent to a range of approximately 141 to 304 tons CO2 ha−1, across stands with MAIs ranging from 7.3 to 17.2 m3/ha/yr.
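The conversion from AGB to carbon and carbon dioxide can be reproduced with a short calculation. The sketch below uses the 50% carbon content stated in this study and assumes the approximate molar mass ratio of 44/12 between CO2 and C; the constant and function names are illustrative.

```python
CARBON_FRACTION = 0.50    # carbon content of dry AGB, as assumed in this study
CO2_PER_C = 44.0 / 12.0   # ASSUMED approximate molar mass ratio of CO2 to C
KG_PER_TON = 1000.0


def carbon_t_per_ha(agb_kg_per_ha):
    """Convert dry above ground biomass (kg per ha) to carbon (tons per ha)."""
    return agb_kg_per_ha * CARBON_FRACTION / KG_PER_TON


def co2_t_per_ha(agb_kg_per_ha):
    """Convert dry above ground biomass (kg per ha) to CO2 equivalent (tons per ha)."""
    return carbon_t_per_ha(agb_kg_per_ha) * CO2_PER_C


if __name__ == "__main__":
    # Lower and upper AGB estimates reported for the Boland plots (kg per ha).
    for agb in (76_924, 165_898):
        print(f"{agb} kg/ha: {carbon_t_per_ha(agb):.1f} t C, {co2_t_per_ha(agb):.1f} t CO2")
```

With the reported AGB range this gives about 38.5 to 82.9 tons C per ha and about 141 to 304 tons CO2 per ha, in line with the values stated above.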

All the hypotheses stated in this study have been accepted. The acceptance of hypothesis I highlights the importance of including H in combination with DBH2 when the model is extrapolated to sites with varying environmental conditions. The statistically similar results for all biomass component proportions (hypothesis II), when comparing the proportions estimated by the best selected models to the proportions estimated by the DR, indicate the robustness and accuracy of the best selected component models to estimate biomass and proportions. The results for hypothesis III clearly demonstrate that erroneous estimates are obtained when allometric equations developed for short rotation pine stands of lower productivity are applied to sites of higher resource availability. Caution should therefore be exercised when applying the equations developed in this study to sites with higher resource availability.