Introduction

Soil variability has been widely studied in geotechnics (Deng et al. 2022) because it has a significant impact on the analysis of geotechnical processes and hazards such as slope instability and liquefaction, and on the design of geotechnical structures such as piles and foundations (Chen et al. 2020, 2022; Zhang et al. 2021). Probability density functions and statistical analyses can be used to quantify uncertainties associated with variations in soil geotechnical properties and to inform risk-based analyses and design (Vanmarcke 2010; Wang and Li 2021). Lumb (1966, 1970) presented seminal papers on the statistical variation of soils from Hong Kong. Kulhawy and Mayne (1990) detailed many transformation models for prediction of key geotechnical parameters for foundation design. Phoon and Kulhawy (1999a, 1999b) suggested using the coefficient of variation (COV) to assess geotechnical variability. To reduce and quantify the uncertainty associated with data scarcity for different engineering tasks, geotechnical databases can be used alongside site-specific data (cf. Phoon 2020; Phoon et al. 2022a, 2022b).

However, to develop statistical models, sufficient datapoints are required (e.g., Montgomery et al. 2007). The construction of regional databases that systematically collate geotechnical information is a vital step in providing the basis for developing new transformation models or testing existing ones. For example, Voyagaki et al. (2022) present a new pile-load test database for UK soil deposits to support more efficient pile design. However, the scarcity of data in many countries and cities presents a significant challenge to the development of soil parameter variability models. The establishment of well-curated, well-documented, and open databases in such locations is an important step for enabling the quantification of geotechnical uncertainty and hence in improving the safety and serviceability of new constructions. Recent work by Gilder et al. (2020) presented a new geo-database, SAFER/GEO-591, for the Kathmandu Valley, Nepal, which is in a multi-hazard environment. This work has led to improved understanding of soil variability in the region, focussing particularly on the Vs30 parameter, which describes the average shear wave velocity over the first 30 m of soil (De Risi et al. 2021; Gilder et al. 2022). Such resources help geotechnical practitioners to improve design practice and safely reduce design conservatism due to parameter variability, although the issue of geo-system uncertainty remains (Bolton 1981; Vardanega and Bolton 2016).

Like Kathmandu, the city of Quito in Ecuador is a multi-hazard environment which may be considered data-poor from a geotechnical perspective. Much of the soil testing information has not previously been made available in open-source repositories or geotechnical databases. In this paper, the recent development of an open geodatabase for the Metropolitan District of Quito (DMQ) is summarized. This extended database, QUITO/GEO-299 (Othman et al. 2023), comprises data for multiple geotechnical parameters including those relating to slope stability and the seismic response of soils. A new and detailed statistical analysis of the data is then carried out. The main aims are the following:

  (i) Assess the fit of different probability distributions for key geotechnical variables using the Akaike information criterion (AIC), which accounts for the complexity of the distribution model and so penalizes overfitting (AIC is typically used in information theory and is less frequently employed in geotechnical engineering contexts).

  (ii) Develop and test transformation models for design parameters such as Vs30, which are relevant in earthquake-prone contexts.

  (iii) Investigate assumptions made by local engineers about differences between soils in north and south Quito, using parametric and non-parametric hypothesis testing to assess the statistical significance of potential differences.

The results of this analysis may be relevant for regional geotechnical design and hazard assessment.

The Quito case-study

Quito, the capital city of Ecuador, is located in the Andes at an altitude of 2850 m above sea level and is flanked by volcanoes including Pichincha and Cotopaxi (Zaaijer 1991). The city is in a mountainous multi-hazard environment and is at risk due to earthquakes (Nikolaou et al. 2016; Franco et al. 2018), volcanic eruptions, landslides, debris flows, and flash floods. The soils in the Andean zone of Ecuador are predominantly derived from deposits of volcanic ash mixed with volcanic sediments and are often referred to as “Cangahua”. The loess-type volcanoclastic succession is estimated to have been deposited under cold and dry periglacial conditions in interglacial periods between 140 and 24 ka (Clapperton and Vera 1986; Sánchez et al. 2013). Over time these ancient pyroclastic deposits became hardened due to drought rather than through cementation (Vera and López 1992). In general, any old and hardened soil of ancient pyroclastic origin is called “Cangahua” by local practitioners (Custode et al. 1992; Zebrowski 1997). In Quito, the most recent Upper Pleistocene rocks are composed of the Cangahua formation, which is widespread in the intermontane Quito-Guayllabamba basin in which the city is located. On top of the Cangahua are Holocene deposits composed of pyroclastic flows from the Pululahua volcano in the northern part of the city, with thicknesses of up to 8 m, as well as palustrine-lacustrine deposits toward the south center with thicknesses between 14 and 20 m, which are intercalated with volcanic ashes (Alvarado et al. 2013).

The soils and Cangahua deposits in Quito might be expected to have some similarities to those found across the Andean region, for example, in cities such as Bogota, Colombia, and in the northern Chilean territory. However, Bogota soils are mainly composed of lacustrine deposits (Mendoza et al. 2019; Caicedo et al. 2018). Such soils are observed to have extreme Atterberg limits and water content values. In the northern Chilean territory, the dominant deposits are saline soils which have high rigidity and shear strength (Foncea et al. 2005). This variety in soil composition across the Andean region means that the soils of Quito require their own in-depth analysis, supported by a dedicated geotechnical database of soil parameters.

To study the variability of key geotechnical parameters for assessing landslide hazard in Quito, Hen-Jones et al. (2022) compiled a preliminary version of the Quito geodatabase from soil testing data in published articles, dissertations, and reports (204 datapoints). The findings of Hen-Jones et al. (2022) showed that while a considerable amount of triaxial UU testing is available for the Quito soils, direct shear and drained triaxial testing is less common, which means that there is potentially a lack of high-quality soil data for assessing slope stability in the region. Stochastic slope stability modeling may therefore be needed to study the deeper uncertainties affecting landslide hazard assessments in the area (see, for example, the methodology presented in Almeida et al. (2017) and Bozzolan et al. (2020)).

Othman et al. (2023) expanded the Quito geodatabase to include new soil test data and geotechnical parameters. The original database developed by Hen-Jones et al. (2022) focused on geotechnical parameters relevant for slope stability assessments (i.e., soil friction angle and apparent cohesion). The expanded database includes a much wider set of geotechnical testing data, including shear wave velocity data, which are important for site response analyses in this earthquake-prone region (Schuster et al. 1996). The importance of Vs30 for the classification of the soils in Quito has long been recognized by the local committees developing seismic risk management guidance documents such as Municipio Q. (1994), Valverde et al. (2002), and ERN (2012). The soil classifications presented in Valverde et al. (2002) and ERN (2012) are based on shear wave velocity (Vs). Understanding soil stiffness via shear wave velocity (Atkinson 2000) is essential for site response studies in geotechnical earthquake engineering (Kramer 1996) and for probabilistic seismic hazard assessments (PSHA) (McGuire 2004). Previous analysis undertaken by the Metropolitan District of Quito (DMQ) divided the city into micro zones based on Vs (see Valverde et al. 2002), where the values used for the zonation were obtained from boreholes of various depths. For the purposes of the work by Othman et al. (2023), Vs30 values were derived using the extrapolation method of Boore (2004) where borehole depths did not reach 30 m. This expanded database is named QUITO/GEO-299 and is available for download from the University of Bristol Research Data Facility (Othman et al. 2023).
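As an illustration of the Vs30 parameter, the following minimal Python sketch computes the time-averaged shear wave velocity over the top 30 m of a layered profile, i.e., 30 m divided by the vertical shear wave travel time through those 30 m. The function name, the layer values, and the choice to raise an error for shallow profiles are assumptions made for this example; the extrapolation method of Boore (2004) used for boreholes shallower than 30 m is not reproduced here.

```python
def vs30_from_profile(thicknesses_m, velocities_ms):
    """Time-averaged shear wave velocity (m/s) over the top 30 m.

    Vs30 = 30 / sum(h_i / Vs_i), with the profile truncated at 30 m depth.
    Profiles shallower than 30 m are rejected: the extrapolation of
    Boore (2004) mentioned in the text is NOT implemented in this sketch.
    """
    if len(thicknesses_m) != len(velocities_ms):
        raise ValueError("one velocity is required per layer")
    depth = 0.0
    travel_time = 0.0
    for h, v in zip(thicknesses_m, velocities_ms):
        if h <= 0 or v <= 0:
            raise ValueError("thicknesses and velocities must be positive")
        used = min(h, 30.0 - depth)  # only the part of the layer above 30 m
        travel_time += used / v
        depth += used
        if depth >= 30.0:
            break
    if depth < 30.0 - 1e-9:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time


# Hypothetical three-layer profile (values invented for illustration only).
example_vs30 = vs30_from_profile([5.0, 10.0, 20.0], [150.0, 250.0, 400.0])
print(f"Vs30 = {example_vs30:.1f} m/s")
```

Because the average is taken over travel time rather than thickness, soft near-surface layers lower Vs30 more than a simple thickness-weighted mean would suggest.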

Soil seismic and geotechnical classification maps in Quito

This section provides a brief review of the sequence of soil classification frameworks developed for seismic risk management in Quito in 1994, 2002, and 2012. As noted above, these three soil classifications depend on shear wave velocity testing to classify the seismic behavior of soil in the area.

The first soil classification in Quito was introduced in 1994 as part of the “Seismic Risk Management for Quito-Ecuador” project, in which a seismic, geological, and soil mechanics study was carried out (Municipio Q. 1994). Soils were classified into three categories: the eastern flanks of the Pichincha volcano (f), comprising Holocene alluvial fans and laharite deposits on the rocks of the Cangahua formation; fluvial, marsh, and lacustrine deposits (l); and strata of the Cangahua formation with volcanic ash intercalation (q) (Municipio Q. 1994). Using these categories, combined with cross-hole shear wave velocity tests and the resulting vibration periods in the soil, the DMQ was divided into 20 zones. Each zone represented a unique soil composition and/or vibration period of the soil (Valverde et al. 2002). The spatial distribution of these seismic soil classes is shown in Fig. 1. This classification was used for seismic engineering design in Quito until 2002.

Fig. 1

Soil distribution of Quito based on the Municipio Q. (1994) classification, where (f) denotes the eastern flanks of the Pichincha volcano, (l) lacustrine deposits in the central depression of the city, and (q) volcanic ash with Cangahua formation on the eastern side of Quito (modified from Falconí 2017)

Using the data collected in the Municipio Q. (1994) report, Valverde et al. (2002) updated the seismic zoning of Quito in two stages: the city was first zoned based on topographic and geological information, and then based on soil characteristics obtained from an additional 11 cross-hole shear wave velocity tests for depths of around 20 m. Following the launch of the Ecuadorian Code of Practice (CPE 2001), part of the Ecuadorian Construction Code, the soil classification framework was updated and incorporated Vs measured at various depths (Fig. 2a), along with the vibration period obtained from the shear wave testing. According to this updated classification, three types of zones were defined for the city of Quito: zone S1 corresponds to good quality or slightly altered, highly cemented rock of the Cangahua formation with Vs greater than 750 m/s; zone S2 corresponds to less consolidated strata of the Cangahua formation, lake, and laharite deposits; zone S3 corresponds to soft soils such as terraces at the edge of streams and marshy deposits with Vs less than 200 m/s.

Fig. 2

a Soil distribution according to the CEC2000 Code where S1 (very hard soil, with shear wave speed greater than 750 m/s), S2 (hard ground) and S3 (soft soil, with vibration periods greater than 0.6 s) (modified from Falconí 2017). b Soil classification distribution based on ERN2012 results where C denotes soils with 760 m/s > Vs30 ≥ 360 m/s, D denotes soils with 360 m/s > Vs30 ≥ 180 m/s and E denotes soils with Vs30 < 180 m/s (Aguiar et al. 2017). The north-south boundary line (Northing of 9976470) is indicated as per Hen-Jones et al. (2022)

Since 2012 the soil classification in Quito has followed the specifications described in the International Building Code (IBC 2003), such that soils are classified into six categories (A to F, where A represents rock and F represents very weak soils) based on the shear wave velocity in the top 30 m (Vs30). To improve on the previous classification data, the new IBC-based classification benefitted from 25 down-hole shear wave velocity tests performed at various locations in the city. This classification was the result of the Natural Hazard Assessment research (ERN 2012) and is shown in Fig. 2b. It should be noted that the DMQ contains soils classified as C, D, and E according to the IBC soil classification. However, according to this study, most of the soils in the city of Quito are class D soils with Vs between 180 and 340 m/s, which characterizes less consolidated intermediate soils (Falconí 2017).
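The class boundaries quoted in the caption of Fig. 2b can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, only the C, D, and E boundaries given in the text are encoded, and stiffer sites are returned under a combined "A/B" label because the text does not state how those classes are separated. Class F is not assigned from Vs30 here.

```python
def site_class_from_vs30(vs30_ms):
    """Return the Vs30-based site class using the boundaries quoted for Fig. 2b.

    E: Vs30 < 180 m/s
    D: 180 <= Vs30 < 360 m/s
    C: 360 <= Vs30 < 760 m/s
    Anything stiffer is labelled "A/B" (assumption: the A/B boundary is not
    given in the text, so the two classes are not separated in this sketch).
    """
    if vs30_ms <= 0:
        raise ValueError("Vs30 must be positive")
    if vs30_ms < 180.0:
        return "E"
    if vs30_ms < 360.0:
        return "D"
    if vs30_ms < 760.0:
        return "C"
    return "A/B"


# Hypothetical Vs30 values (m/s), invented for illustration.
for value in (150.0, 270.0, 420.0):
    print(value, site_class_from_vs30(value))
```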

Based on the current soil classification (Fig. 2b), there is an apparent difference between soils in the north and the south zones. Using this information, the design parameters and standards tend to differ between the zones, resulting in more conservative design standards for construction in the south zone. This difference between north and south holds for the current soil classification (ERN 2012); however, it requires verification for other design parameters that are not included in the current classification.

Data collection and building the database

The QUITO/GEO-299 database (Othman et al. 2023) is an open-source repository developed from the collation, curation, and analysis of geotechnical and geological information from the city of Quito. The database format is informed by the terminology, methodology, and file structure developed by the Association of Geotechnical and Geoenvironmental Engineering Specialists (AGS) and set out in its manual (AGS 2017). Some modifications were made to the original format of the AGS tables to adapt them for the Quito context. For further details see Othman et al. (2023).

The database consists of both in situ test and geotechnical laboratory test data from 299 locations in the DMQ, collected from open access sources including dissertations and reports. The database geospatial information is given in the projected coordinate system WGS_1984_UTM_Zone_17S. The database consists of one spreadsheet workbook with each internal worksheet or “tab” supplied as a separate .csv file (QUITO/GEO-299 is presented in a similar format to SAFER/GEO-591, see Gilder et al. (2019)). Table 1 presents a summary of the data from the QUITO/GEO-299 database, in which the data are first divided, based on the site condition from which they were collected, into boreholes (BH) and superficial samples (SS). Data collected from boreholes can be at various depths as indicated in the database, while superficial samples describe the top 1 m.

Table 1 Summary of data compiled in the QUITO/GEO-299 database from the sources shown in the second column (for details see Othman et al. 2023)

The database expands on the data collated and analyzed by Hen-Jones et al. (2022) by including more data sources (299 points in QUITO/GEO-299 compared to around 200 from Hen-Jones et al. 2022) and presents a different analysis of the triaxial and direct shear tests from the database. This study considered the dataset in terms of the total stress parameters to increase the sample size for the study of soil variability. For more information on the types of tests in the database and issues with their interpretation see Othman et al. (2023). QUITO/GEO-299 has been presented in a format similar to AGS to assist with data transfer between users of the database. This was also done for the Kathmandu soil database and the building of QUITO/GEO-299 followed a similar methodology to that presented in Gilder et al. (2019, 2020).

Data identification and spatial distribution in Quito

The locations of the 299 points from QUITO/GEO-299 were plotted on the available geological map of the DMQ (Municipio de Quito 2015) using Geographical Information System (GIS) software (Fig. 3), and the percentages of each geotechnical parameter set represented in each geological layer were calculated (Fig. 4 and Table 2).

Fig. 3

Geological layers in DMQ from Municipio de Quito (2015) with the locations of the data from QUITO/GEO-299 used in this study

Fig. 4

Sites for which specified soil parameters were acquired in DMQ: a sites with soil density data, b water content, c shear wave velocity at 30 m, d SPT tests, e plasticity index, f angle of friction from direct shear tests, g apparent cohesion obtained from triaxial tests, and h angle of friction from triaxial tests

Table 2 Symbols and descriptions for the geological layers shown in the maps (Figs. 3 and 4), and the distribution of the geotechnical parameters in each distinctive layer

Figure 3 shows that the highest density of geotechnical data available in the DMQ is located in the west which represents the most heavily populated part of the city along with the presence of an underground metro line where extensive ground investigation soil testing had been carried out prior to construction. Knowing the approximate location of each site in the database means that the variability in Quito’s soils can be studied and trends in soils with the same geological properties can be identified. This map will also be useful for future assessments of earthquake and landslide hazards in Quito.

Using the same map, the spatial distribution of each geotechnical parameter in the DMQ can be visualized (Fig. 4). In general, the distribution of all the parameters follows the same trend as the map in Fig. 3, except for the friction angle from the direct shear test (ϕDS), for which all datapoints are concentrated in the north of the city. Using Fig. 4, the geotechnical parameter distributions in each geological layer in the DMQ are obtained and summarized in Table 2. A north-south analysis can also be performed on all the geotechnical parameters in the database (plasticity index, water content, SPT blow count, Vs30, apparent triaxial cohesion, and friction angle) that have a sufficient distribution of data between north and south. From Table 2, most of the geotechnical data are concentrated in the Lacustrine (Qal) and Streams of volcanic sands (Qvi) layers. On the other hand, no geotechnical data were collected for the steep slopes and cornices (Qvlr), undifferentiated debris-lahars (Qvl), recent uncompacted sand (Q2), and fractured dacites (QVpl) layers.

Statistical analysis and parameter distributions

To quantify the variability in the soils in the DMQ, an initial statistical analysis of the database was performed and is summarized in Table 3. The SPT (N) shows the highest variability of all the geotechnical parameters (COV = 83%), while soil density (ρ) has the lowest variability. In general, the geotechnical parameters of Quito’s soils show a relatively high variability.

Table 3 Statistical analysis of key geotechnical parameters in the Quito geodatabase, QUITO/GEO-299
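The coefficient of variation used to rank parameter variability is the standard deviation divided by the mean, expressed as a percentage. The sketch below shows one way such a summary could be computed. The function name and the blow counts are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used for Table 3.

```python
import statistics


def summarize_parameter(values):
    """Return n, mean, standard deviation, and COV (%) for one parameter.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    if len(values) < 2:
        raise ValueError("at least two datapoints are needed")
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("COV is undefined for a zero mean")
    sd = statistics.stdev(values)
    return {"n": len(values), "mean": mean, "sd": sd,
            "cov_percent": 100.0 * sd / mean}


# Hypothetical SPT blow counts, invented for illustration only.
spt_demo = [4, 10, 15, 22, 38]
print(summarize_parameter(spt_demo))
```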

Five continuous probability distributions were tested to identify the best fit for the soil properties relevant to landslide and earthquake hazard assessments and geotechnical design (cf. Shepheard et al. 2019). These were the truncated normal (Eq. 1), lognormal (Eq. 2), Weibull (Eq. 3), exponential (Eq. 4), and truncated generalized extreme value (GEV) (Eq. 5). For the Weibull distribution, A represents the scale parameter and B represents the shape parameter (MathWorks 2022). The normal and GEV distributions were truncated to guarantee that they return only positive values.

$$f\left(x|{\mu}_N,{\sigma}_N\right)=\frac{1}{\sigma_N\sqrt{2\pi }}{e}^{\frac{-{\left(x-{\mu}_N\right)}^2}{2{\sigma_N}^2}},\kern0.5em x\ge 0$$
(1)
$$f\left(x|{\mu}_{LN},{\sigma}_{LN}\right)=\frac{1}{x{\sigma}_{LN}\sqrt{2\pi }}\exp \left\{\frac{-{\left(\ln x-{\mu}_{LN}\right)}^2}{2{\sigma_{LN}}^2}\right\},\kern0.5em x>0$$
(2)
$$f\left(x|A,B\right)=\left\{\begin{array}{ll}\frac{B}{A}{\left(\frac{x}{A}\right)}^{B-1}{e}^{-{\left(x/A\right)}^B}& \mathrm{if}\ x\ge 0\\ {}0& \mathrm{if}\ x<0\end{array}\right.$$
(3)
$$f\left(x|{\mu}_E\right)=\frac{1}{\mu_E}{e}^{\frac{-x}{\mu_E}}$$
(4)
$$f\left(x|k,{\mu}_{GEV},{\sigma}_{GEV}\right)=\frac{1}{\sigma_{GEV}}\exp \left(-{\left(1+k\frac{x-{\mu}_{GEV}}{\sigma_{GEV}}\right)}^{-\frac{1}{k}}\right){\left(1+k\frac{x-{\mu}_{GEV}}{\sigma_{GEV}}\right)}^{-1-\frac{1}{k}},\kern0.5em \mathrm{for}\kern0.5em 1+k\frac{x-{\mu}_{GEV}}{\sigma_{GEV}}>0$$
(5)
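As a rough numerical check on the parameterizations, the sketch below codes three of the candidate densities in their standard forms (lognormal, Weibull with scale A and shape B as defined in the text, and exponential) and confirms that each integrates to approximately one. The function names, parameter values, and the trapezoidal integrator are assumptions made for this example; the truncated normal and truncated GEV are omitted because the truncation and renormalization details are not given in the text.

```python
import math


def lognormal_pdf(x, mu_ln, sigma_ln):
    """Lognormal density; mu_ln and sigma_ln are the mean and SD of ln(x)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu_ln) / sigma_ln
    return math.exp(-0.5 * z * z) / (x * sigma_ln * math.sqrt(2.0 * math.pi))


def weibull_pdf(x, scale_a, shape_b):
    """Weibull density with scale A and shape B; zero for x < 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # The value at the origin depends on the shape parameter.
        if shape_b > 1:
            return 0.0
        return 1.0 / scale_a if shape_b == 1 else math.inf
    ratio = x / scale_a
    return (shape_b / scale_a) * ratio ** (shape_b - 1) * math.exp(-(ratio ** shape_b))


def exponential_pdf(x, mu_e):
    """Exponential density with mean mu_e; zero for x < 0."""
    if x < 0:
        return 0.0
    return math.exp(-x / mu_e) / mu_e


def trapezoid_area(pdf, upper, steps=20000):
    """Trapezoidal integral of pdf over [0, upper]."""
    h = upper / steps
    total = 0.5 * (pdf(0.0) + pdf(upper))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h


# Hypothetical parameter values, chosen only to exercise the functions.
print(trapezoid_area(lambda x: weibull_pdf(x, 20.0, 2.0), 200.0))
print(trapezoid_area(lambda x: lognormal_pdf(x, 3.0, 0.5), 2000.0))
```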

To select the optimum distribution that describes the data in this study, the Akaike information criterion (AIC) was used along with the corrected form of the method (AICc) (Akaike 1974). The AIC (Eq. 6) and AICc (Eq. 7) measure the goodness of fit of each candidate distribution to each geotechnical parameter while penalizing the number of parameters of the distribution used for the fit. The AIC method was selected because it rewards goodness of fit while penalizing model complexity. The corrected AIC (AICc) was used for selecting the best fit distribution where there were low numbers of samples. The AIC and AICc values for each distribution proposed in this study are summarized in Table A-1 and Fig. A-1 (Appendix I).

$$AIC=2k-2\ln \left(\hat{L}\right)$$
(6)
$$AICc= AIC+\frac{2{k}^2+2k}{n-k-1}$$
(7)

where k is the number of estimated parameters of the model, n is the sample size, and \(\hat{L}\) is the maximized value of the likelihood function for the model.
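Equations 6 and 7 can be applied directly once each candidate distribution has been fitted by maximum likelihood. The sketch below does this for two candidates whose maximum likelihood estimates have closed forms (exponential and lognormal). The function names and the Vs30 values are hypothetical and are not taken from the database; this is an illustration of the criterion, not a reproduction of the fitting workflow used for Appendix I.

```python
import math


def aic(log_likelihood, k):
    """Eq. 6: AIC = 2k - 2 ln(L_hat)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Eq. 7: small-sample correction added to the AIC."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * k + 2 * k) / (n - k - 1)


def fit_exponential(data):
    """Closed-form MLE for the exponential mean; returns (params, ln L_hat)."""
    mean = sum(data) / len(data)
    log_lik = sum(-math.log(mean) - x / mean for x in data)
    return {"mu_E": mean}, log_lik


def fit_lognormal(data):
    """Closed-form MLE for the lognormal; returns (params, ln L_hat)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE uses 1/n
    log_lik = sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )
    return {"mu_LN": mu, "sigma_LN": sigma}, log_lik


# Hypothetical Vs30 values (m/s), invented for illustration only.
vs30_demo = [210, 245, 260, 280, 300, 315, 330, 360, 390, 420]
_, ll_exp = fit_exponential(vs30_demo)
_, ll_logn = fit_lognormal(vs30_demo)
scores = {
    "exponential": aicc(ll_exp, k=1, n=len(vs30_demo)),
    "lognormal": aicc(ll_logn, k=2, n=len(vs30_demo)),
}
best = min(scores, key=scores.get)  # the lowest AICc is preferred
print(scores, best)
```

The lognormal carries one more parameter than the exponential, so it is preferred only if its gain in log-likelihood outweighs the larger penalty term.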

Based on the AICc selections provided in Appendix I, the best fit distributions are summarized in Table 4 and Fig. 5. The Kolmogorov-Smirnov test (KS test) was also used to verify the goodness of fit for the selected distribution based on the AIC (Fig. 5). The critical distance between the selected distribution and the fitted CDF for the data was calculated using Eq. (8) (Ang and Tang 2007) and compared to the D-statistic value of the fit. The selected significance level for this study was 0.05 and is represented by the two lines ± 5% in Fig. 5. The results of the KS test confirm that the selected distributions for the data in Table 4 are acceptable for the variability of the soils in the DMQ.

$$cv=\frac{1.36}{\sqrt{n}}$$
(8)
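The comparison between the D-statistic and the critical value of Eq. 8 can be sketched as follows. The D-statistic is taken here as the largest vertical distance between the empirical CDF of the data and the fitted CDF, and 1.36/√n is the usual large-sample critical value at the 0.05 significance level. The function names and the sample values are assumptions made for this example.

```python
import math


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_value(n):
    """Eq. 8: large-sample critical value for a 0.05 significance level."""
    return 1.36 / math.sqrt(n)


def lognormal_cdf(x, mu_ln, sigma_ln):
    """Lognormal CDF written with the error function."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu_ln) / (sigma_ln * math.sqrt(2.0))))


# Hypothetical sample and fitted parameters, invented for illustration only.
sample = [210, 245, 260, 280, 300, 315, 330, 360, 390, 420]
d_stat = ks_statistic(sample, lambda x: lognormal_cdf(x, 5.72, 0.20))
fit_is_acceptable = d_stat < ks_critical_value(len(sample))
print(d_stat, ks_critical_value(len(sample)), fit_is_acceptable)
```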
Table 4 Best fit distributions for each soil parameter. Par 1, Par 2, and Par 3 describe function fitting parameters: μN, σN, and μLN, σLN for normal and lognormal distributions, respectively, A, B for Weibull, and k, σGEV, and μGEV for GEV distributions [no zero values included]
Fig. 5

Best fitted distribution for each soil property analyzed in this study presented as probability plots (CDFs) for a density, b water content, c Vs30, d N, e plasticity index, f angle of friction from direct shear tests, g apparent cohesion obtained from triaxial tests, and h angle of friction from triaxial tests

The above analyses show that the Weibull distribution is the best representation for N, plasticity index, Vs30, direct shear friction angle, and triaxial cohesion, while soil density is best described by the lognormal distribution. Triaxial friction angle is best described using the truncated normal distribution and the water content is best described using the truncated GEV distribution. Using these probability density functions (PDFs) to describe the variability in Quito’s soils can help characterize the high variability noted in the initial statistical analysis. To further describe the variability of Quito’s soils, the same statistical analysis and distribution fitting were performed on the geotechnical parameters for each geological layer shown in Table 2. In this analysis, the geotechnical parameters were considered for a layer only if more than 20 different datapoints represented that layer, assuming that any dataset with fewer datapoints cannot be modeled to a sufficient level of statistical significance (Minitab 2017). The distributions fitted for each layer are shown in Appendix II as CDF plots, which confirm that the fitted distributions for each geological layer mainly agree with the fitted distributions for the general geotechnical parameters represented in the CDF plots of Fig. 5. This suggests that the variability of Quito’s soils does not depend on the geological layer of the soil and hence can be generalized for the study region.

Variation between north and south of Quito

The current soil classification in Quito (ERN 2012) suggests that the soil is different between the north and south of the city, and that can be seen clearly in Fig. 2. To investigate this observation, a line dividing the city into north and south zones (Fig. 6) was set (Northing of 9976470). This line approximately splits the database in two equal samples and is based on the geographic center of the city as proposed by Hen-Jones et al. (2022). To verify the feasibility of this line, two lines were set at ± 10% of the main line (Northing of 9978765 and 9974214) with the statistical analysis for the ± 10% lines summarized in Appendix III.

Fig. 6

North and south zones of Quito based on Hen-Jones et al. (2022), with Northing of 9976470 (middle line), the + 10% line of Northing 9978765 (top line), and the − 10% line with northing 9974214 (bottom line)

Statistical analyses were performed on the geotechnical parameters for each zone, and the results for north Quito and south Quito are shown in Table 5, while the CDF plots for the distributions described in the previous section are shown in Appendix IV. Comparing the results in Table 5 with those in Appendix III, it is apparent that changing the location of the line does not affect the statistical distribution of the soils. The additional analysis in Appendix III represents an assessment of the sensitivity of the conclusion to slight changes in the position of the north-south boundary line. The statistical analysis for ϕ obtained from direct shear tests was not included in this study as all the available data on this parameter are in north Quito (Fig. 4f). The statistical analysis of the data for north and south Quito shows some variability between these two zones, especially for N. However, because the sample sizes of the two zones differ for all parameters, a further test is required to verify the differences between them independently of the sample size for each zone. The Welch t-test (Welch 1938, 1947; Aspin 1948) was used to compare the variability for the parameters represented by truncated normal and lognormal distributions. The Welch t-test measures the difference between the means of two sets of data regardless of sample size, under the null hypothesis that there is no significant difference between the two means. The non-parametric Wilcoxon signed rank test (Woolson 2008) was used for the situations in which the population distribution was neither normal nor lognormal. The Wilcoxon signed rank test measures the difference between the two samples under the null hypothesis that the median difference between the two sets is equal to zero (i.e., there is no significant difference between the two samples). Being non-parametric, the outcome of the Wilcoxon test is independent of the best fit distribution selected.
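For reference, the Welch test statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) can be computed from the two sample means, variances, and sizes alone, which is why unequal sample sizes in the two zones are not an obstacle. The sketch below is a minimal illustration with hypothetical function names and invented values; it returns the statistic and degrees of freedom but does not compute a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two datapoints")
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_sq = var_a / n_a + var_b / n_b  # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, dof


# Hypothetical north and south samples of unequal size (illustration only).
north_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
south_demo = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_value, dof_value = welch_t(north_demo, south_demo)
print(t_value, dof_value)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.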

Table 5 Statistical analysis results between the north (NQ) and south (SQ) of Quito

The variabilities of the geotechnical parameters of north and south Quito, as determined using the Welch t-test and the Wilcoxon signed rank test, are summarized in Tables 6 and 7. In these two tables, the acceptance and rejection of the null hypothesis for all the geotechnical parameters in Quito are similar for the Welch t-test and the Wilcoxon signed rank test. This indicates that the parametric assumptions of the Welch t-test are viable. The results in Tables 6 and 7 show significant differences between the north and south Quito sample populations in terms of density, water content, triaxial angle of friction, and shear wave velocity at 30 m. This indicates that the soils in the north and south differ not only in terms of Vs30 but also in terms of other geotechnical parameters. The results obtained for Vs30 also agree with those found in the current soil classification in Quito (ERN 2012). In Hen-Jones et al. (2022), direct shear and triaxial data were considered together; in this study, however, the direct shear and triaxial data were separated by shear mode.

Table 6 Welch t-test results between north and south for the geotechnical parameters
Table 7 Wilcoxon signed rank test results between north and south for the geotechnical parameters

Using the ± 10% north-south dividing lines described above, further analysis of the data was performed using the Welch t-test and the Wilcoxon signed rank test. The results of this analysis are summarized in Appendix III. Comparing the results for the + 10% line with those in Table 5, there is no difference in the acceptance of the null hypothesis between the two lines except for SPT (N), which shows a difference in the Wilcoxon signed rank test. Similarly, the results for the − 10% line are similar to those presented in Tables 6 and 7, with differences in the Wilcoxon signed rank test for SPT (N) and ρ. The difference in the outcome for SPT (N) and ρ for both lines (± 10%) requires further investigation supported by additional SPT (N) and ρ data. However, the variation in the SPT (N) data may be related to inconsistencies in the testing procedures used. Finally, it should be noted that despite the similar result obtained in the Welch t-test, the variation noted in the Wilcoxon signed rank test is more significant due to the higher reliability of this non-parametric test.

Geotechnical transformation models

Transformation models are widely used to present the geotechnical parameter relationships with each other for a given soil type (Phoon and Kulhawy 1999a, 1999b). Such models can be used to estimate hard-to-measure fundamental soil properties by using soil properties that are more readily measured in field or laboratory tests (Kulhawy and Mayne 1990). Currently, regression analysis of geodatabases is widely adopted to develop transformation models that can estimate the fundamental soil properties for various soil types (Löfman and Korkiala-Tanttu 2022; Knuuti and Länsivaara 2019; Van der Krogt et al. 2019; Phoon 2020).

In this study, regression analysis was performed between the geotechnical parameters to develop viable models that can estimate more complex parameters. However, before starting the regression analysis, it was important to identify which soil properties could be correlated with other soil properties. To identify the possible linear correlations between soil properties, the goodness of fit was assessed using the correlation coefficient (r) and the probability of the null hypothesis that the r value is 0 (p) (i.e., the probability of no correlation) (e.g., Montgomery et al. 2007). The results showed correlations between the triaxial cohesion (cTX) and both SPT (N) and density (ρ), as well as between the triaxial friction angle (ϕTX) and SPT (N), plasticity index (Ip), and density (ρ). Another correlation was found between the shear wave velocity at 30 m (Vs30) and SPT (N). In this study, SPT (N) was used instead of SPT (N60) based on the findings of Wair et al. (2012), which suggest that there is no notable difference between using the SPT blow count (N) and the corrected SPT blow count (N60) in estimating the shear wave velocity (Vs30).
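The screening step described above can be illustrated with a short sketch that computes the Pearson correlation coefficient and the associated t statistic, t = r·sqrt((n − 2)/(1 − r²)), which is compared with a Student t distribution with n − 2 degrees of freedom to obtain p. The function names and the paired values are hypothetical; the p-value lookup itself is left out of this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic_for_r(r, n):
    """t statistic for testing the null hypothesis r = 0 (n - 2 dof)."""
    if abs(r) >= 1.0:
        raise ValueError("t is unbounded for a perfect correlation")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired observations, invented for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [1.0, 3.0, 2.0, 5.0, 4.0]
r_demo = pearson_r(x_demo, y_demo)
print(r_demo, t_statistic_for_r(r_demo, len(x_demo)))
```

For the values reported with Eq. 11 (n = 37, r = 0.87), this statistic is roughly 10.4, which is consistent with the reported p < 0.001.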

In Hen-Jones et al. (2022), the variabilities of the peak friction angle and apparent cohesion were studied along with variability of slope geometry parameters relevant to slope stability assessment. An offset between north and south Quito was found for many of the geotechnical parameters studied and the following transformation models were presented (Eqs. 9 and 10):

$${\displaystyle \begin{array}{cc}{\phi}_{TX/ DS}^{\prime }=53.276{w}^{-0.147}& \left[n=44,r=0.42,p=0.03,z=0\right]\end{array}}$$
(9)
$${\displaystyle \begin{array}{cc}{\phi}_{TX/ DS}^{\prime }=52.396-18.417(e)& \left[n=44,r=0.67,p<0.001,z=0\right]\end{array}}$$
(10)

where ϕTX/DS is the effective peak friction angle from drained triaxial, or direct shear tests, w is the soil water content, and e is the soil void ratio.
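For reference, Eqs. 9 and 10 can be evaluated directly, as in the short sketch below. The units are assumptions not stated in this section: w is taken in percent and the friction angle in degrees. The input values w = 30 and e = 0.8 are hypothetical.

```python
def phi_from_water_content(w):
    """Eq. 9: effective peak friction angle from water content w (assumed to be in %)."""
    return 53.276 * w ** (-0.147)


def phi_from_void_ratio(e):
    """Eq. 10: effective peak friction angle from void ratio e (dimensionless)."""
    return 52.396 - 18.417 * e


# Hypothetical inputs for illustration only
print(f"Eq. 9,  w = 30:  {phi_from_water_content(30.0):.1f}")
print(f"Eq. 10, e = 0.8: {phi_from_void_ratio(0.8):.1f}")
```

Both models predict a lower friction angle as the water content or void ratio increases, and the higher r value of Eq. 10 (0.67 against 0.42) indicates the stronger of the two relationships.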

The results of the new regression analysis in this study are presented in Table 8, which summarizes all the transformation models along with their r, R2, SE, and p-values, and the percentage of data falling within the ± 50% and ± 25% prediction bounds. The detailed regression analyses for all transformation models are provided in Appendix V. The resulting equation relating Vs30 to SPT (N) is a logarithmic model as follows:

$${\displaystyle \begin{array}{cc}{Vs}_{30}=122.6\ \mathit{\ln}(N)+59.88& \left[n=37,r=0.87,p<0.001\right]\end{array}}$$
(11)

Table 8 Transformation models obtained for the geotechnical parameters in Quito and corresponding regression coefficients for each model
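A logarithmic model of the form of Eq. 11 can be fitted by regressing Vs30 on ln(N). The sketch below assumes ordinary least squares, which this section does not state explicitly. The function name `fit_log_model` is illustrative, and the demonstration points are synthetic: they are generated exactly on the Eq. 11 curve, so the fit should simply recover its coefficients.

```python
import math


def fit_log_model(n_values, vs30_values):
    """Least-squares fit of Vs30 = a * ln(N) + b; returns (a, b)."""
    x = [math.log(n) for n in n_values]
    k = len(x)
    x_mean = sum(x) / k
    y_mean = sum(vs30_values) / k
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, vs30_values))
    a = sxy / sxx            # slope on ln(N)
    b = y_mean - a * x_mean  # intercept
    return a, b


# Synthetic points lying exactly on Eq. 11 (NOT measured data)
n_demo = [5, 10, 20, 40, 60]
vs30_demo = [122.6 * math.log(n) + 59.88 for n in n_demo]
a_fit, b_fit = fit_log_model(n_demo, vs30_demo)
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}")
```

Because the model is linear in ln(N), the closed-form least-squares solution applies and no iterative optimizer is needed.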

To further evaluate the goodness of fit of each model in Table 8, the measured data were plotted against the predicted values for each model; the plots are shown in Appendix V. In each figure, the data are checked against ± 50% and ± 25% prediction bounds. Using the ± 25% prediction bounds, only the Vs30 against SPT (N) model has a high percentage of data falling within these bounds (78.4%), as shown in Fig. 7.

Fig. 7 Measured vs. predicted Vs30 plotted with ± 25% prediction bounds for all the available data
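The percentage of data within a prediction bound can be computed as in the sketch below. It assumes that a point lies within the ± 25% bound when the measured value is between 0.75 and 1.25 times the predicted value; this definition is an assumption, since the bound is not defined explicitly here. The measured and predicted values are hypothetical.

```python
def fraction_within_bounds(measured, predicted, tolerance=0.25):
    """Share of points whose measured value lies within +/- tolerance of the prediction."""
    inside = 0
    for m, p in zip(measured, predicted):
        # Assumed bound definition: (1 - tol) * predicted <= measured <= (1 + tol) * predicted
        if (1.0 - tolerance) * p <= m <= (1.0 + tolerance) * p:
            inside += 1
    return inside / len(measured)


# Hypothetical measured and predicted Vs30 values, for illustration only
measured_demo = [300.0, 410.0, 250.0, 520.0]
predicted_demo = [320.0, 400.0, 350.0, 500.0]
share_25 = fraction_within_bounds(measured_demo, predicted_demo)        # three of the four points
share_50 = fraction_within_bounds(measured_demo, predicted_demo, 0.50)  # all four points
print(f"within 25%: {share_25:.0%}, within 50%: {share_50:.0%}")
```

For the n = 37 points of Eq. 11, the reported 78.4% is consistent with 29 of the 37 points falling inside the ± 25% bounds (29/37 ≈ 0.784).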

Figure 8 shows three logarithmic models: the regression using all the available Vs30 and SPT (N) data (Eq. 11); the regression using only the data in the north zone, i.e., the data located above the north-south boundary line (Eq. 12); and the regression using only the data in the south zone (Eq. 13).

$${\displaystyle \begin{array}{cc}{Vs}_{30}=65.33\ \mathit{\ln}(N)+238.50& \left[n=8,r=0.88,p=0.004\right]\end{array}}$$
(12)
$${\displaystyle \begin{array}{cc}{Vs}_{30}=114.3\ \mathit{\ln}(N)+61.17& \left[n=29,r=0.87,p<0.001\right]\end{array}}$$
(13)

Fig. 8 Regression analysis results between Vs30 and SPT (N) for all the data, the data only in the north zone, and the data only in the south zone of Quito
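To make the difference between the three fits concrete, the sketch below evaluates Eqs. 11 to 13 at a few blow counts. The coefficients are those reported in the equations; the blow counts (10, 30, 50) are arbitrary illustrative inputs, and the names `MODELS` and `predict_vs30` are hypothetical.

```python
import math

# (slope, intercept) pairs as reported in Eqs. 11-13
MODELS = {
    "all data (Eq. 11)": (122.6, 59.88),
    "north zone (Eq. 12)": (65.33, 238.50),
    "south zone (Eq. 13)": (114.3, 61.17),
}


def predict_vs30(n_spt, model):
    """Vs30 predicted from the SPT blow count N by the chosen logarithmic model."""
    slope, intercept = MODELS[model]
    return slope * math.log(n_spt) + intercept


# Arbitrary blow counts chosen for illustration
for n_spt in (10, 30, 50):
    row = ", ".join(f"{name}: {predict_vs30(n_spt, name):.0f}" for name in MODELS)
    print(f"N = {n_spt} -> {row}")
```

The all-data and south-zone coefficients are close to one another, whereas the north-zone fit has roughly half the slope and a much larger intercept, so the north-zone curve diverges most from the other two at low blow counts.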

Based on the results shown in Fig. 8, together with the measured against predicted values for the north and south data (Fig. 9, with ± 25% bounds), it can be noted that the regression model obtained in Eq. (11) is more suitable for use in the south zone. This is due to the low number of paired Vs30 and SPT (N) datapoints in the north (only 8), despite the larger number of Vs30 data in the north zone. This suggests that more paired Vs30 and SPT (N) testing is needed in the north zone of Quito.

Fig. 9 Measured vs. predicted Vs30 plotted with ± 25% prediction bounds for all the available data in (a) the north zone and (b) the south zone of Quito

Summary and conclusions

In this study, the variabilities of key soil geotechnical properties in Quito have been quantified in terms of their basic statistical characteristics and probability distributions. New transformation models have also been developed for the estimation of Vs30 (an important parameter for the assessment of the seismic response of soils and for earthquake resilient design of buildings). The data used for the analysis were obtained from the newly extended open access geotechnical database for the soils of Quito, QUITO/GEO-299 (Othman et al. 2023), which allowed a systematic assessment of soil variability that was not possible before.

The locations of all geotechnical datapoints in the database (i.e., the sites of soil samples, boreholes, and trial pits) were plotted on the geological map of Quito. The results of the variability analysis of Quito’s soils indicated that N, Ip, Vs30, ϕDS, and cTX are best fitted by Weibull distributions, while a truncated normal distribution is the best fit for ϕTX. Finally, ρ and w are best characterized by lognormal and truncated GEV distributions, respectively. The best-fit model for each parameter was selected using the Akaike information criterion, which accounts for model complexity and helps to avoid overfitting; this tool is typically employed in information theory and is seldom used in geotechnical engineering.
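The selection rule can be illustrated with the standard definition AIC = 2k − 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below is deliberately simplified: it compares only normal and lognormal fits, chosen because their maximum-likelihood estimates have closed forms, whereas the study also considered Weibull, truncated normal, and truncated GEV distributions. The sample values are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate the preferred model."""
    return 2 * n_params - 2 * log_likelihood


def normal_loglik(data):
    """Maximized log-likelihood of a normal distribution fitted by maximum likelihood."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE variance (divides by n)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(data):
    """Maximized log-likelihood of a lognormal distribution (data must be positive)."""
    logs = [math.log(x) for x in data]
    # Normal likelihood of ln(x) plus the Jacobian term -sum(ln x)
    return normal_loglik(logs) - sum(logs)


# Hypothetical, positively skewed sample for illustration only (not database values)
sample = [1.2, 1.5, 1.4, 1.9, 2.8, 1.3, 1.6, 4.1]
candidates = {
    "normal": aic(normal_loglik(sample), 2),
    "lognormal": aic(lognormal_loglik(sample), 2),
}
best = min(candidates, key=candidates.get)
print(candidates, "->", best)
```

Both candidates here have two parameters, so the penalty term is identical; the penalty starts to matter when a three-parameter family such as the GEV is compared with two-parameter alternatives.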

The study also provided a quantitative assessment of local engineering practice, in which soils in north and south Quito are considered to have different characteristics. To investigate the soil variability between these two zones, a statistical analysis was performed by dividing the database into two sub-samples according to the locations of the original soil samples and field tests. The variability of geotechnical properties in the two zones was assessed using the Welch t-test and the Wilcoxon signed rank test to account for parametric and non-parametric assumptions. Results indicated that there is a statistically significant difference between north and south Quito soils in terms of their density, water content, triaxial angle of friction, and shear wave velocity at 30 m. The results agree with Quito’s current seismic soil classification, which is based on average shear wave velocity, is embedded in the seismic code, and differentiates between soils in the north and south. The new quantitative study presented here extends the significance of the difference to other stiffness-related parameters, thus providing a more complete picture of the soils in the metropolitan region of Quito.

The final section of this study presented a novel transformation model relating N and Vs30 with an R2 value of 0.76 (for n = 37) with a high percentage of points falling within ± 25% prediction bounds. This transformation model was found to be more suitable for predicting Vs30 values in south Quito because this is where the majority of N and Vs30 data are available. The model provides a new and usable relationship specifically for the metropolitan region of Quito. While the statistical assessment of the geotechnical variability still presents some limitations due to data scarcity, this work provides some conclusive results for practitioners and a valuable indication of where to concentrate further investigations which could add to the robustness of the conclusions discussed in this work.