1 Introduction

In the context of agriculture, soil moisture (SM) is critical. SM refers to the amount of water held within the topsoil, typically the uppermost few centimeters. It significantly impacts plant growth, agricultural practices, water availability, and infiltration rates. By monitoring SM, informed decisions can be made regarding irrigation strategies, crop management, and water conservation efforts (Ali et al. 2015; Kornelsen and Coulibaly 2013; Santi et al. 2016).

SM levels can be determined from two primary data sources: on-site measurements and remote sensing (RS) technology, specifically in the microwave range. While on-site data collection provides precise information at specific locations, RS allows for broad area coverage. Traditionally, one limitation of RS-derived SM was its lower temporal resolution compared to in-situ methods, which depended on the specific sensor used (Ghaderizadeh et al. 2022; Kosari et al. 2020; Mohammadi et al. 2021; Tariq et al. 2022; Zamani et al. 2022). However, the emergence of radar constellations has significantly improved the temporal resolution of Synthetic Aperture Radar (SAR) systems, making them a viable option for continuous SM monitoring (Sharifi and Amini 2015; Sharifi et al. 2015).

SM content can be measured accurately and reliably using microwave RS techniques. This is supported by several studies (Ben Abbes et al. 2019; Jarray et al. 2022; Paloscia et al. 2013; Santi et al. 2016). The growing number of active and passive microwave RS satellites, such as RADARSAT-2, SMOS (Soil Moisture and Ocean Salinity), ASCAT (Advanced Scatterometer), AMSR-E (Advanced Microwave Scanning Radiometer on Earth Observing System), TerraSAR-X, ALOS-2, Sentinel-1, and SMAP (Soil Moisture Active and Passive), provides a wealth of SM-related data. However, exploiting this data for SM retrieval presents both opportunities and challenges (Abbes et al. 2019; Jarray et al. 2022).

Microwave remote sensing offers two main methods for retrieving soil moisture data: passive and active (Arab et al. 2024; Wang et al. 2024; Zhu et al. 2024). Passive sensors operate at lower frequencies with high temporal resolution but have a coarse spatial resolution, while active sensors provide detailed mapping but with longer revisit times and sensitivity to other factors (Abbes et al. 2019; Inoubli et al. 2022). Combining data from both approaches can leverage their strengths, aiming to achieve high spatial and temporal resolution for more accurate soil moisture retrieval.

SM estimation methods traditionally include ground-based, empirical, physical, data assimilation, and machine learning (ML) techniques. Ground-based methods involve direct measurements but are limited spatially, while empirical methods rely on statistical relationships but may lack accuracy under diverse conditions (Ali et al. 2015). Physical models use microwave-soil interaction principles but demand detailed data. Data assimilation blends simulation and observation, requiring complex frameworks.

Physical models are valuable tools for understanding and estimating SM, but it is important to consider the limitations and potential deviations from reality. Assumptions about soil and vegetation properties may not accurately reflect real-world conditions, introducing errors and reducing reliability. Uncertainties in measuring input parameters and the effects of vegetation on RS signals can impact the accuracy of physical models (Ali et al. 2015; Kolassa et al. 2018; Prasad 2009; Pasolli et al. 2011). Additionally, models that are tailored to specific soil types or conditions may not be applicable in other contexts. Therefore, it is necessary to carefully consider these limitations and choose appropriate models for accurate SM estimation.

The accurate inversion of SM is dependent on knowledge of vegetation and surface roughness, both of which are highly variable. ML has proven to be an effective empirical method for establishing statistical relationships between input data and reference datasets, with recent studies showing its potential for retrieving SM accurately. This approach can help improve the understanding and prediction of soil moisture levels over different terrains and conditions (Ali et al. 2015; Kolassa et al. 2018; Pasolli et al. 2011; Prasad 2009).

For example, Goswami and Kalita (2015) provided a method for SM monitoring using satellite X-band synthetic aperture radar (SAR) data acquired over India in July 2013. Yadav et al. (2022) introduced a vegetation scattering model based on radiative transfer theory, using X-band and C-band data from an indigenously designed far-field bistatic specular (bi-spec) scatterometer acquired over the entire rice crop phenology. Yadav et al. (2020) explored new measurement techniques in the bistatic direction for crop monitoring and crop vegetation parameter retrieval, and identified the optimum frequency, polarization, and specular angle for vegetation parameter retrieval. Yadav et al. (2018) performed a regression analysis, including RF, between bistatic specular scattering coefficients and crop biophysical parameters at X, C, and L bands for HH and VV polarizations at several incidence angles in order to determine the optimal parameters of a bistatic scatterometer system. Yadav et al. (2022) designed a C-band fully polarimetric bistatic scatterometer (BiSCAT) system to measure the scattering response of vegetated terrain in the forward specular plane. The system measured the bistatic scattering coefficient and in-situ soil/vegetation properties. Herold et al. (2001) investigated the problems of SM retrieval using radar RS in agricultural areas. The analysis was based on multi-parameter E-SAR data and intensive field measurements of SM and surface roughness. The results showed the dominance of the roughness signal compared to the variations in SM. Kim et al. (2014) applied the algorithm for retrieving SM within the top 5 cm of the soil from the L-band multi-polarized radar data of the then-upcoming SM Active and Passive (SMAP) mission to the data sets obtained by an aircraft field campaign in Canada. The data were collected over fields with diverse crops and a wide range of moisture and vegetation conditions, and the algorithm performance was evaluated. Quast et al. (2023) presented a framework for high-resolution (1 km) SM retrieval from Sentinel-1 C-band Synthetic Aperture Radar (SAR) backscatter measurements using a bistatic radiative transfer modeling framework (RT1) previously tested only on scatterometer data. The framework was applied over a diverse set of land cover types across the entire Po Valley in Italy over a 4-year period from 2016 to 2019.

While a significant number of observations are necessary for ML applications, various ML approaches have been used in SM retrieval methodologies, based on Artificial Neural Networks, Support Vector Machines, Random Forest, and Deep Learning, among others. A review conducted by Ali et al. (2015) focused on ML retrievals of biomass and soil moisture but did not include investigations using recent satellites such as SMAP and Sentinel-1. Additionally, radar data have been predominantly used for SM retrieval, with only limited studies focusing on passive microwave data. Further research is needed to explore the potential of passive microwave data in SM monitoring.

The primary goal of the paper is to review and discuss the advancements in RS techniques for SM estimation, highlighting the potential benefits of combining ML and physical models to enhance accuracy and scalability in large-scale SM retrieval applications. The paper also highlights challenges in integrating these approaches and suggests future research directions for improving the robustness of RS-based SM retrieval methods.

The remainder of this paper is organized as follows. Section 2 presents the literature selection. Section 3 describes the SM retrieval applications. The SM retrieval methods, scales, evaluation datasets, and metrics are presented in Sect. 4. Section 5 discusses the current status, the challenges, and future directions. Finally, Sect. 6 presents the main conclusions and perspectives.

2 Literature selection

We conducted the literature search in key search engines and indexing databases, such as Google Scholar, PubMed, and Scopus. We used the search criteria ’(Soil Moisture OR Machine Learning OR Deep Learning) AND (Soil Moisture AND RS) AND (Soil Moisture OR Agriculture OR Climate Change OR Hydrology OR Food Security)’. We screened all the collected papers by title and abstract to ensure that the content was relevant. The collected articles were mainly published in journals of IEEE, Springer, Elsevier, etc., and in the conference proceedings of IEEE and others (Rank A, B and C).

3 SM applications

This section discusses the main applications related to SM retrieval. The four applications receiving the most attention in the literature are reviewed: agriculture, food security, hydrology, and climate change monitoring.

3.1 Agriculture

SM retrieval is a valuable tool in the field of agriculture as it helps monitor and manage water resources effectively. By accurately measuring the amount of moisture in the soil, farmers can optimize irrigation practices, ensure proper crop health, and prevent over-watering or under-watering. This information aids in decision-making processes related to planting schedules, fertilization, and overall field management, ultimately leading to better yields and resource conservation (Hamze et al. 2023; Sadri et al. 2020). Table 1 shows the SM retrieval works conducted in the agriculture context.

Table 1 SM retrieval works in the Agriculture context

3.2 Food security

SM retrieval is crucial in the context of food security as it provides important information about the availability of water for crops. By accurately measuring and monitoring SM levels, farmers can make informed decisions regarding irrigation, planting, and fertilization, ultimately improving crop yield and efficiency. This data can help address water scarcity issues, optimize water usage, and ultimately contribute to ensuring global food security (Deléglise et al. 2022; Kazemi Garajeh et al. 2023). Table 2 presents the SM retrieval works conducted in the food security context.

Table 2 SM retrieval works in the food security context

3.3 Hydrology

SM retrieval is a technique used in hydrology to estimate the amount of water present in the soil. It involves using measurements from various sources such as RS satellites or ground-based sensors to determine the moisture content of the soil. This information is crucial for understanding the water cycle, predicting droughts and floods, and managing water resources effectively (Joshi et al. 2023; Zhu et al. 2023). Table 3 presents the SM retrieval works conducted in the hydrology context.

Table 3 SM retrieval works in the hydrology field

3.4 Climate change monitoring

SM retrieval is an important component of climate change monitoring as it provides crucial information about the availability of water in the soil, which directly affects agricultural productivity, hydrological processes, and overall ecosystem health. By accurately measuring and monitoring SM levels, scientists and policymakers can better understand and mitigate the impacts of climate change, such as droughts and floods, and make informed decisions to ensure food security and sustainable water management (Chakraborty et al. 2023; Leonarduzzi et al. 2022). Table 4 presents the SM retrieval works conducted in the climate change monitoring context.

Table 4 SM retrieval works in climate change monitoring

4 SM retrieval methods

Several methods are used for SM estimation. Figure 1 shows the classification of SM retrieval methods. Table 5 presents SM retrieval works from RS data using several methods.

Fig. 1 SM retrieval methods classification

Table 5 SM retrieval works from RS data using proposed methods

4.1 Ground based methods

Traditionally, SM has been observed and analyzed using point ground measurements (e.g., gravimetric sampling and time-domain reflectometry, TDR). However, it is very challenging to capture large-scale SM dynamics from point measurements, and there is no robust approach to predicting them. The gravimetric technique is the traditional, direct, and most accurate method for measuring SM. However, highly repetitive measurements are not possible, because the sample must be removed from the soil for laboratory work, which prevents continuous SM records at any given site.
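
As a minimal illustration of the gravimetric calculation, the sketch below derives the water content from the wet and oven-dried masses of a sample and converts it to a volumetric value. The function names, the sample masses, and the bulk density of 1.3 g/cm3 are hypothetical values chosen for illustration; they are not taken from the reviewed studies.

```python
def gravimetric_water_content(wet_mass_g: float, dry_mass_g: float) -> float:
    """Mass of water per mass of oven-dried soil (g/g)."""
    if dry_mass_g <= 0 or wet_mass_g < dry_mass_g:
        raise ValueError("expected wet_mass_g >= dry_mass_g > 0")
    return (wet_mass_g - dry_mass_g) / dry_mass_g


def volumetric_water_content(theta_g: float, bulk_density_g_cm3: float,
                             water_density_g_cm3: float = 1.0) -> float:
    """Convert gravimetric (g/g) to volumetric (cm3/cm3) water content."""
    return theta_g * bulk_density_g_cm3 / water_density_g_cm3


# Hypothetical sample: 120 g when wet, 100 g after oven drying.
theta_g = gravimetric_water_content(wet_mass_g=120.0, dry_mass_g=100.0)
theta_v = volumetric_water_content(theta_g, bulk_density_g_cm3=1.3)
print(f"gravimetric = {theta_g:.2f} g/g, volumetric = {theta_v:.2f} cm3/cm3")
```

The conversion to volumetric units matters because most RS products report SM in m3/m3, so gravimetric samples are usually converted before being used as reference data.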

4.2 Empirical models

Empirical models are widely used in SM retrieval tasks. They are based on regression equations between SM and geophysical variables measured by microwave sensors, obtained through statistical analysis of empirical data. These models are simple, efficient, and do not require a large labeled input dataset. However, they are limited to the specific environmental and climatic conditions for which they were derived.

The statistical relationship between backscattering and SM data was expressed as a linear equation in Loumagne et al. (2001) to estimate SM over wheat fields. In addition, an exponential equation was proposed in Baghdadi et al. (2006) and a polynomial equation was introduced in De Roo et al. (2001) for SM estimation. According to previous studies (Chen et al. 1995; Gorrab et al. 2014; Zribi and Dechambre 2003; Zribi et al. 2005), empirical models are related to the analysis of soil characteristics in heterogeneous fields. Imantho et al. (2022) proposed an empirical model to calculate SM and soil workability based on Sentinel-1A data, which provide imagery at a spatial resolution suitable for precision agriculture.

In summary, empirical models need minimal input data, but they depend on particular environmental conditions and cannot account for physical processes.
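
The linear empirical approach can be sketched as follows: a straight line is fitted between in-situ SM and the backscattering coefficient, and the fitted line is then inverted to estimate SM from a new observation. This is only an illustrative sketch; the calibration pairs, the function names, and the resulting coefficients are hypothetical and do not reproduce the values of any cited study.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def invert_linear(sigma0_db, a, b):
    """Estimate SM from a backscatter value using sigma0 = a + b * SM."""
    return (sigma0_db - a) / b


# Hypothetical calibration data: in-situ SM (m3/m3) and backscatter (dB).
sm_obs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
sigma0_obs = [-18.6, -17.1, -15.4, -14.0, -12.6, -10.9]

a, b = fit_linear(sm_obs, sigma0_obs)
sm_est = invert_linear(-13.0, a, b)
print(f"sigma0 = {a:.2f} + {b:.2f} * SM; SM(-13 dB) = {sm_est:.3f} m3/m3")
```

Because the coefficients are fitted to one site and one sensor configuration, applying them elsewhere requires recalibration, which is the limitation noted above.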

One example is the Dubois model, in which the backscattering coefficient is taken to be proportional to the product of vegetation water content and soil content. It has been widely used in vegetation-covered areas. The equation is as follows:

$$\begin{aligned} \sigma ^{0}_\text {veg}= A + B\theta V \end{aligned}$$
(1)

where V is the vegetation water content, \(\theta\) is the incidence angle, and A and B are empirical coefficients determined from field data.

4.3 Physical models

Physical models for SM estimation have been found to be effective in retrieving SM by utilizing the interactions between vegetation, ground roughness, and radar signals. Despite their effectiveness, there are certain limitations associated with these models. Quantifying parameters such as emissivity, albedo, and ground roughness at a fine spatial resolution can prove to be challenging. These limitations should be considered when utilizing physical models for soil moisture estimation.

4.3.1 Semi-empirical models

In study areas covered with vegetation, the Water Cloud Model (WCM) developed by Attema and Ulaby (1978) has been widely used to invert the radar signal for SM retrieval. Several studies showed that using the NDVI as the main vegetation descriptor allows the vegetation effects on the total backscattering coefficient to be computed with the best accuracy (Baghdadi et al. 2017). The parametrization of the WCM has been given in many studies for several SAR configurations and crop types (Baghdadi et al. 2017; Gherboudj et al. 2011).

In fact, the total backscatter signal (\(\sigma ^{0}_\text {\tiny tot}\)) received by radar can be expressed as:

$$\begin{aligned} \sigma ^{0}_\text {tot}= & {} \sigma ^{0}_\text {veg} + \tau ^{2} \sigma ^{0}_\text {sol} \end{aligned}$$
(2)
$$\begin{aligned} \tau ^{2}= & {} \exp (-2B \times VWC \sec \theta ) \end{aligned}$$
(3)

The scattering due to the vegetation (\(\sigma ^{0}_\text {\tiny veg}\)) is given by:

$$\begin{aligned} \sigma ^{0}_\text {veg}= A\times VWC \cos \theta (1-\tau ^{2}) \end{aligned}$$
(4)

The direct soil backscatter \((\sigma ^{0}_\text {\tiny sol})\) is given by:

$$\begin{aligned} \sigma ^{0}_\text {sol}= C\times 10^{D\times M} \end{aligned}$$
(5)

where \(\tau ^{2}\) is the two-way attenuation through the vegetation, \(\theta\) is the incidence angle, VWC is the vegetation water content, M is the SM, and A, B, C, and D are the parameters of the model.
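
To make the use of Eqs. 2 to 5 concrete, the sketch below implements the WCM forward model and its analytical inversion for M. It is a minimal illustration under stated assumptions: backscatter is expressed in linear units, and the coefficient values, the vegetation water content, and the incidence angle are hypothetical placeholders rather than calibrated values from the cited studies.

```python
import math

# Hypothetical WCM coefficients (A, B, C, D in Eqs. 3-5), not calibrated values.
COEFFS = {"a": 0.0012, "b": 0.09, "c": 0.01, "d": 2.0}


def _vegetation_terms(vwc, theta_deg, a, b):
    """Return (tau2, sigma_veg) from Eqs. 3 and 4."""
    theta = math.radians(theta_deg)
    tau2 = math.exp(-2.0 * b * vwc / math.cos(theta))      # sec(theta) = 1/cos(theta)
    sigma_veg = a * vwc * math.cos(theta) * (1.0 - tau2)
    return tau2, sigma_veg


def wcm_forward(m, vwc, theta_deg, a, b, c, d):
    """Total backscatter (linear units) for soil moisture m, Eq. 2."""
    tau2, sigma_veg = _vegetation_terms(vwc, theta_deg, a, b)
    sigma_sol = c * 10.0 ** (d * m)                         # Eq. 5
    return sigma_veg + tau2 * sigma_sol


def wcm_invert(sigma_tot, vwc, theta_deg, a, b, c, d):
    """Solve Eq. 2 for m given the total backscatter (linear units)."""
    tau2, sigma_veg = _vegetation_terms(vwc, theta_deg, a, b)
    sigma_sol = (sigma_tot - sigma_veg) / tau2
    if sigma_sol <= 0.0:
        raise ValueError("vegetation term exceeds the observed backscatter")
    return math.log10(sigma_sol / c) / d


sigma = wcm_forward(0.25, vwc=1.5, theta_deg=35.0, **COEFFS)
m_back = wcm_invert(sigma, vwc=1.5, theta_deg=35.0, **COEFFS)
print(f"sigma_tot = {10 * math.log10(sigma):.2f} dB, retrieved M = {m_back:.3f}")
```

In practice, the vegetation descriptor (for example an NDVI-derived quantity) and the four coefficients must be fitted per crop type and SAR configuration before the inversion is meaningful.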

4.3.2 Inversion methods

Over bare soils (arid or semi-arid regions), SM is retrieved by inverting the SAR signal using physical models. The Integral Equation Model (IEM) (Fung 1994) is the most widely used physical model to invert the radar signal and estimate SM. However, the IEM shows a large difference between simulated and observed SAR data (Baghdadi et al. 2002; Zribi et al. 1997), which leads to inaccurate SM retrievals. To improve the accuracy of the backscattering values simulated by the IEM, Baghdadi et al. (2006, 2011, 2006, 2015) introduced a semi-empirical calibration of the IEM.

The IEM relates the backscattering coefficient to the characteristics of the sensor (incidence angle, polarization, and radar wavelength) and to the soil properties (dielectric constant, Hrms, correlation length, and correlation function) for bare agricultural fields. It can be expressed as follows:

$$\begin{aligned} \sigma ^{0}_\text {pp}= & {} \frac{k ^{2}}{2} |f_\text {pp} |^{2} \exp (-4 \times Hrms ^{2} k^{2} \cos ^{2}\theta ) \sum _{n=1}^{+\infty } \frac{(4 \times Hrms ^{2} k^{2} \cos ^{2}\theta ) ^{n}}{ n!} W ^{(n)} (2k \sin \theta , 0) \\ & + \frac{k ^{2}}{2} \mathrm {Re} (f_\text {pp}^{*} F_\text {pp}) \exp (-3 \times Hrms ^{2} k^{2} \cos ^{2}\theta ) \sum _{n=1}^{+\infty } \frac{(4 \times Hrms ^{2} k^{2} \cos ^{2}\theta ) ^{n}}{ n!} W ^{(n)} (2k \sin \theta , 0) \\ & + \frac{k ^{2}}{8} |F_\text {pp} |^{2} \exp (-2 \times Hrms ^{2} k^{2} \cos ^{2}\theta ) \sum _{n=1}^{+\infty } \frac{( Hrms ^{2} k^{2} \cos ^{2}\theta ) ^{n}}{ n!} W ^{(n)} (2k \sin \theta , 0) \end{aligned}$$
(6)

where k is the radar wave number, \(\theta\) is the incidence angle, Hrms is the root-mean-square surface height, \(f_\text {pp}\) and \(F_\text {pp}\) are the Kirchhoff and complementary field coefficients for pp polarization, Re denotes the real part, and \(W ^{(n)}\) is the Fourier transform of the nth power of the surface correlation function.

The limitation of these models is that they cannot be generalized to other sites, as they depend on the soil conditions (SM, soil roughness, etc.) and the instrumental parameters (incidence angle) of each scenario.

4.4 Data assimilation techniques

Data assimilation is a statistical method that combines mathematical models with observation data to improve the accuracy of estimation tasks. It has been widely used in many RS applications, such as SM retrieval.

4.4.1 Kalman filter

The Kalman filter (KF) is a widely applied data assimilation method for estimating a state from a series of noisy observations. It relies on a mathematical model that describes the evolution of the state over time and on an observation model. The KF algorithm is composed of two steps: prediction and update. First, in the prediction step, the KF uses the mathematical model to estimate the state at the next time step. Second, in the update step, the KF uses the new observation data to refine this estimate. The KF has been applied for SM retrieval from RS data (De Rosnay et al. 2013; Reichle et al. 2008; De Lannoy et al. 2007).
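
The two KF steps can be sketched for a scalar SM state as follows. This is a simplified illustration, not the scheme of the cited studies: the dry-down factor, the noise variances, the initial state, and the sequence of RS-derived observations are all hypothetical.

```python
def kf_predict(x, p, decay, q):
    """Prediction: propagate the state with a simple dry-down model x_k = decay * x_(k-1)."""
    return decay * x, decay * decay * p + q


def kf_update(x_pred, p_pred, z, r):
    """Update: blend the prediction with observation z using the Kalman gain."""
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


# Hypothetical settings: SM in m3/m3, variances in (m3/m3)^2.
DECAY, Q, R = 0.97, 1e-4, 4e-4
x, p = 0.30, 0.01                                # initial estimate and its variance
observations = [0.27, 0.26, 0.24, 0.25, 0.22]    # noisy RS-derived SM values

gains = []
for z in observations:
    x_pred, p_pred = kf_predict(x, p, DECAY, Q)
    x, p, gain = kf_update(x_pred, p_pred, z, R)
    gains.append(gain)
    print(f"obs = {z:.2f}  predicted = {x_pred:.3f}  updated = {x:.3f}  gain = {gain:.2f}")
```

The gain starts close to 1 because the initial state is very uncertain, and it decreases as the filter accumulates information, so later observations correct the model prediction less strongly.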

4.4.2 Bayesian approaches

Bayesian methods use Bayes’ theorem to update a probability distribution based on the observation data, and can therefore be used for estimation tasks. The most widely applied Bayesian approach is the Markov chain Monte Carlo method. It has been used for SM retrieval from RS data (Notarnicola et al. 2008; Notarnicola 2013; Paloscia et al. 2008) and has shown promising results, especially in the presence of uncertainties and errors.
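
As an illustration of the Markov chain Monte Carlo idea, the sketch below uses a random-walk Metropolis sampler to draw from the posterior distribution of SM given one backscatter observation. The linear observation model, its coefficients, the Gaussian noise level, and the uniform prior on [0, 0.5] m3/m3 are assumptions made for this example only.

```python
import math
import random

# Hypothetical observation model: sigma0_db = A_DB + B_DB * m + Gaussian noise.
A_DB, B_DB, NOISE_SD_DB = -20.0, 30.0, 1.0


def log_posterior(m, sigma0_db):
    """Unnormalized log posterior: uniform prior on [0, 0.5] times Gaussian likelihood."""
    if not 0.0 <= m <= 0.5:
        return -math.inf
    residual = sigma0_db - (A_DB + B_DB * m)
    return -0.5 * (residual / NOISE_SD_DB) ** 2


def metropolis(sigma0_db, n_samples=5000, burn_in=1000, step=0.05, seed=0):
    """Random-walk Metropolis sampler; returns the post-burn-in samples of m."""
    rng = random.Random(seed)
    m = 0.10
    log_p = log_posterior(m, sigma0_db)
    samples = []
    for i in range(n_samples):
        candidate = m + rng.gauss(0.0, step)
        log_p_new = log_posterior(candidate, sigma0_db)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            m, log_p = candidate, log_p_new
        if i >= burn_in:
            samples.append(m)
    return samples


samples = metropolis(sigma0_db=-12.5)
post_mean = sum(samples) / len(samples)
post_sd = (sum((s - post_mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"posterior SM = {post_mean:.3f} +/- {post_sd:.3f} m3/m3")
```

The spread of the samples gives an uncertainty estimate alongside the retrieved value, which is the main practical advantage of the Bayesian formulation.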

4.5 Learning models

ML and deep learning (DL) models are widely used in SM retrieval tasks.

4.5.1 ML models

ML techniques have emerged as an effective tool for SM retrieval at high spatial and temporal resolution (Jarray et al. 2021). For example, Santi et al. (2016) introduced an ANN-based approach to estimate daily SM at 10 km spatial resolution. They used backscatter, local incidence angle, azimuth angle, latitude, and longitude information from the Advanced Scatterometer (ASCAT), together with SM validation data from the International SM Network (ISMN), to train the ANN model. The results show that the ANN performs well on the testing data sets, with \(\hbox {R} = 0.82\) and \(\hbox {RMSE} = 0.04\,\hbox {m}^{3}/\hbox {m}^{3}\). Jarray et al. (2021) proposed three ML models to estimate SM in arid regions, using VV, VH, and NDVI as input data. The results show that RF is the best model, with \(\hbox {R} = 0.86\).

The most commonly used ML techniques for SM retrieval are ANN, XGBoost, and RF. For example, Jarray et al. (2021, 2022) used these ML techniques for SM retrieval in arid regions. Jarray et al. (2022) developed a Teacher-Student framework to transfer the knowledge of the teacher ML model to the student DL model by adding additional information in order to refine the estimation. The RF regressor (RFR) was parameterized with max_depth = 30, min_samples_split = 5, and n_estimators = 100. The ANN used three hidden layers with 32, 32, and 16 neurons. For XGBoost, the hyperparameters max_depth = 30, learning rate = 10 \(^{-1}\), and n_estimators = 100 were used. Such ML-based estimates are generated purely from observational information and are therefore independent of the other methods, although their quality outside the training conditions of the model is potentially uncertain.

4.5.2 DL models

DL techniques are widely used in SM retrieval (Jarray et al. 2021, 2022; Ben Abbes and Jarray 2023). For instance, Singh and Gaurav (2023) proposed a new architecture based on a fully connected feed-forward Artificial Neural Network model for SM retrieval from satellite images over a large alluvial fan of the Kosi River in the Himalayan Foreland. They used several features extracted from Sentinel-1A and Sentinel-2A images, including VV, VH, NDVI, and incidence angle (IA), and compared the model with other DL models. The results show that the ANN model accurately predicts SM and outperforms all the benchmark algorithms, with a correlation coefficient \(\hbox {R} = 0.80\), a Root Mean Square Error \(\hbox {RMSE} = 0.040\,\hbox {m}^{3}/\hbox {m}^{3}\), and a \(\hbox {bias} = 0.004\,\hbox {m}^{3}/\hbox {m}^{3}\). In addition, Habiboullah and Louly (2023) used LSTM and combined Sentinel and MODIS data for SM retrieval. CNN has also achieved high SM estimation accuracy (\(\hbox {R}^{2} = 0.8664\)) over agricultural areas from Sentinel-1 images (Hegazi et al. 2021). Hegazi et al. (2023) applied a CNN model using Sentinel-2 data (NDVI and NDWI indexes).

The CNN used by Jarray et al. (2022) was optimized with the Adam algorithm, with an initial learning rate of \(10^{-4}\) and a batch size of 64. It used ReLU activation and dropout with probability 0.5, and was trained for 100 epochs. The validation set was used to tune the hyperparameters. The network consists of five convolutional blocks with ReLU activation, with max pooling at the end of each convolutional block. The number of filters is 64, 128, 256, 512, and 512, respectively. Finally, a linear activation function was applied. These techniques give promising results, but they require a large set of training data to perform well and avoid overfitting.
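
To give a sense of the size of such a network, the sketch below traces the feature-map size and the number of convolutional weights through five blocks with 64, 128, 256, 512, and 512 filters. Only the filter counts come from the description above. The 3 x 3 kernels, the single convolution per block, the 2 x 2 max pooling, the three input channels, and the 64 x 64 input patch are assumptions made for illustration.

```python
FILTERS = [64, 128, 256, 512, 512]  # filter counts of the five convolutional blocks


def trace_cnn(input_size, in_channels, filters, kernel=3, pool=2):
    """Return (filters, output size, parameter count) for each convolutional block.

    Assumes one 'same'-padded convolution per block followed by max pooling
    that divides the spatial size by `pool`.
    """
    rows = []
    size, channels = input_size, in_channels
    for out_channels in filters:
        params = (kernel * kernel * channels + 1) * out_channels  # weights + biases
        size //= pool
        rows.append((out_channels, size, params))
        channels = out_channels
    return rows


layers = trace_cnn(input_size=64, in_channels=3, filters=FILTERS)
for n_filters, size, params in layers:
    print(f"{n_filters:4d} filters -> {size:2d} x {size:2d} map, {params:,} parameters")
print(f"total convolutional parameters: {sum(p for _, _, p in layers):,}")
```

Under these assumptions the convolutional part alone holds several million weights, which illustrates why a large training set and dropout are needed to avoid overfitting.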

4.6 SM retrieval scales

Scales of SM variation are important for understanding patterns of climate change, for developing and evaluating land surface models, and for agriculture and food security. In order to study the observed temporal and spatial scales of SM variation, we classify many works in terms of temporal and spatial scales. Figure 2 projects the three common scales of SM estimation onto time and space.

Fig. 2 Different scales of SM retrieval

Table 6 presents a classification of the works at three scales (local, regional and global).

Table 6 Classification of SM retrieval works at three scales (local, regional and global)

4.7 Evaluation datasets and metrics

4.7.1 Evaluation datasets

Several representative and publicly available SM datasets exist, with differing spatial and temporal coverage and resolution. Meanwhile, there remains a lack of long-term global SM data at combined high spatial and temporal resolutions. Although the SMAP/Sentinel-1 product has global coverage and a spatial resolution of up to 1 km, its temporal resolution is within 12 days because of the relatively long revisit cycle of the Sentinel-1 SAR satellites. Other downscaled high-resolution SM data maintain regional or continental coverage, limited by the lack of validation data.

4.7.2 The pre-processing steps

Pre-processing steps play a crucial role in preparing RS data for SM estimation. These steps, such as radiometric calibration, atmospheric correction, spatial resampling, and upscaling/downscaling, need to be selected and customized based on the study area, sensors, and the specific objective of the estimation. By carefully implementing these pre-processing steps, RS data can be refined to provide accurate and reliable SM estimates.
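
One of these steps, spatial upscaling of backscatter, can be sketched as block averaging. The example assumes calibrated backscatter stored in dB on a regular grid and averages in linear power units before converting back, a common convention for radar data. The function names and the small grid are hypothetical.

```python
import math


def db_to_linear(value_db):
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value):
    return 10.0 * math.log10(value)


def upscale_db(grid_db, factor):
    """Aggregate a 2-D grid of backscatter (dB) by averaging factor x factor blocks.

    The average is taken in linear power units, not directly on the dB values.
    """
    rows, cols = len(grid_db), len(grid_db[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [db_to_linear(grid_db[i][j])
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(linear_to_db(sum(block) / len(block)))
        coarse.append(coarse_row)
    return coarse


fine = [[-10.0, -20.0, -12.0, -12.0],
        [-10.0, -20.0, -12.0, -12.0]]
coarse = upscale_db(fine, factor=2)
print([round(v, 2) for v in coarse[0]])
```

Averaging the dB values directly would give -15 dB for the first block, whereas the linear average is noticeably higher, which shows why the choice of averaging domain has to be fixed and documented for each study.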

4.7.3 Metrics

Selecting a sensible metric of SM retrieval accuracy for a given application requires a solid understanding of the application domain, of the relative pros and cons of the available metrics, and of the connections between them. The most used metrics are the root-mean-square error (RMSE), the time series correlation (r), the unbiased RMSE, and the bias.
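
These four metrics and the link between them can be sketched as follows. The unbiased RMSE is computed here from the identity ubRMSE^2 = RMSE^2 - bias^2. The function name and the sample values are hypothetical.

```python
import math


def sm_metrics(estimated, reference):
    """Return bias, RMSE, unbiased RMSE, and Pearson correlation r."""
    n = len(estimated)
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Removing the mean error from the RMSE leaves the random error component.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    corr = cov / math.sqrt(var_e * var_r)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": corr}


# Hypothetical retrievals that are systematically 0.02 m3/m3 too wet.
reference = [0.10, 0.15, 0.20, 0.25, 0.30]
estimated = [value + 0.02 for value in reference]
metrics = sm_metrics(estimated, reference)
print({name: round(value, 4) for name, value in metrics.items()})
```

In this example the RMSE equals the bias while the unbiased RMSE is essentially zero and r is 1, which shows that a retrieval can track the temporal dynamics perfectly and still carry a systematic offset. Reporting several metrics together avoids confusing the two effects.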

5 Discussion

5.1 Heterogeneity of used parameters

The impact of biophysical parameters and vegetation on SAR-based SM retrieval is a challenging issue, especially in various climates such as arid, semi-arid, and humid regions. The vegetation cover and soil parameters can affect the backscatter signals, which in turn affect the accuracy of SM estimation. Different methods like the Integral Equation Model (IEM) and the Advanced Integral Equation Model (AIEM) have been proposed to account for these factors, but further research is needed to improve the accuracy of SAR-based SM retrieval in different climatic conditions.

5.2 Data availability

The challenge of SM retrieval lies in the availability and accessibility of data. RS data requires extensive processing and calibration, which necessitates expertise in the field. Moreover, the acquisition of RS data is restricted by factors such as weather conditions and satellite availability. Additionally, the accuracy and reliability of SM retrieval techniques heavily rely on the type of sensor used, the spatial and temporal resolution of the data, and the validation of the results through ground measurements. Spatial and temporal resolution significantly impact the accuracy of SM retrieval. Higher spatial resolution provides more detailed information about the SM distribution within a particular area, allowing for better characterization of heterogeneity. Similarly, higher temporal resolution enables better tracking of changes in SM over time, capturing short-term variations and understanding seasonal patterns. Improved accuracy in SM retrieval is essential for various applications, including agriculture, hydrology, and climate modeling, as it aids in better decision-making and resource management.

5.3 ML and DL complexity

Generally, the best ML/DL model is selected by comparison with other existing models. ML models (e.g., ANN, Decision Trees, Random Forests) are favored for their shorter training time and less complicated designs. In contrast, DL models such as CNN and LSTM are more difficult to implement and are commonly used for tasks that require the analysis of a large amount of acquired RS data. Another important challenge is the selection of the hyperparameters used in training. These hyperparameters control the behavior of the model during training and can significantly impact its performance, so selecting them well is crucial for accurate and reliable SM estimation. However, finding the best set of hyperparameters is a complex task that requires careful tuning and experimentation. Techniques such as grid search or random search can guide this process by exploring the hyperparameter space. Additionally, cross-validation can be used to assess the performance of different hyperparameter configurations and to prevent overfitting.
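
Grid search combined with cross-validation can be sketched with a deliberately simple model. The example below tunes the number of neighbors of a one-dimensional k-nearest-neighbor regressor using 5-fold cross-validated RMSE. The regressor, the synthetic backscatter-SM data, and the candidate grid are stand-ins chosen to keep the sketch self-contained; they are not the models or settings of the reviewed studies.

```python
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(xs, ys, k_neighbors, n_folds=5):
    """Cross-validated RMSE: each sample is predicted by a model that never saw it."""
    squared_error = 0.0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        for i in test_idx:
            prediction = knn_predict(train_x, train_y, xs[i], k_neighbors)
            squared_error += (prediction - ys[i]) ** 2
    return (squared_error / len(xs)) ** 0.5


def grid_search(xs, ys, grid):
    """Evaluate every candidate value and return the best one with all scores."""
    scores = {k: cv_rmse(xs, ys, k) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data: backscatter (dB) versus SM (m3/m3) with Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-20.0, -8.0) for _ in range(60)]
ys = [0.05 + 0.02 * (x + 20.0) + rng.gauss(0.0, 0.02) for x in xs]

best_k, scores = grid_search(xs, ys, grid=[1, 3, 5, 9, 15])
for k, score in scores.items():
    print(f"k = {k:2d}  CV RMSE = {score:.4f}")
print(f"selected k = {best_k}")
```

The same loop structure applies to the hyperparameters of RF, XGBoost, or neural networks. Only the model inside the cross-validation loop and the size of the grid change, which is where the computational cost grows quickly.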

5.4 Public dataset for benchmarking

Collecting public benchmarking datasets for different SM applications is crucial for advancing research in agriculture, food security, drought, and climate change monitoring. These datasets allow researchers and stakeholders to compare and evaluate different models, algorithms, and techniques, leading to improvements in SM monitoring and management practices. Moreover, public benchmarking datasets foster collaboration and knowledge sharing among the scientific community, enabling the development of innovative solutions for addressing pressing environmental challenges. Such datasets enable a fair comparison of different methods in terms of efficiency, robustness, and accuracy. We offer some ideas on the design and construction of a public dataset. First, based on the knowledge of SM experts, the dataset should cover all aspects of the data distribution and of the spatial and temporal resolution, including difficult cases, so that it is representative. Second, the images should be obtained from as wide a variety of sensors as possible, taking into account the distribution of equipment such as SM stations and end-user preferences. Third, the dataset should cover different regions of the world at different spatial and temporal resolutions, as image appearance differs between countries due to different equipment, soil quality, and climate.

5.5 Integration of ML and physical models

The integration of physical modeling and machine learning techniques in estimating soil moisture presents an exciting opportunity to leverage the strengths of both methods. By combining these approaches, the accuracy and efficiency of soil moisture estimation processes can be significantly enhanced, especially in areas where the data input is uncertain. This hybrid model offers benefits such as better generalization, handling complex relationships, integrating multiple data sources, and reducing model uncertainties, ultimately improving the accuracy and reliability of soil moisture estimation.

5.6 Analysis

RS has greatly contributed to the field of SM retrieval, providing valuable information for various applications. Techniques such as active and passive microwave sensing, thermal infrared RS, and optical RS are commonly used to estimate SM at different spatial and temporal scales. However, challenges still exist in accurately retrieving SM, including uncertainties in calibration/validation, integration of multiple sensors, and scaling issues. Despite these challenges, RS offers a promising solution for monitoring and managing SM, with diverse applications in agriculture, food security, hydrology, and climate modeling.

This paper identifies the recent research developments in SM retrieval using heterogeneous and environmental data. SM estimation can be analyzed using ML and DL techniques. From the present review, we may conclude:

  • There is a need for a robust framework to retrieve SM by combining different parameters under several climatic conditions (arid, semi-arid and humid).

  • ML techniques were efficiently used to retrieve SM with varying levels of computational capabilities.

  • DL models further improved the accuracy of results compared to ML models.

6 Conclusion and perspectives

This paper presents a thorough review of the advancements in RS technology for retrieving SM levels. It offers researchers in the field a comprehensive source of information for understanding the various applications of SM retrieval, the techniques used, the scales at which these retrievals are conducted, and the challenges faced. The paper summarizes a range of published applications and focuses on four families of SM retrieval methods: empirical models, physical models, data assimilation techniques, and learning models. By examining these methods, the paper aims to provide an understanding of the different approaches and their effectiveness in SM retrieval. The discussion of new research trends indicates that the field of SM retrieval is constantly evolving. The progress and innovations in ML techniques indicate that ML will continue to be a prominent and attractive approach within this field. Nevertheless, obstacles remain to be addressed. These include the computational analysis of large datasets, the interpretability and explainability of ML models, and the evaluation of their performance in accurately estimating SM levels. Additional research and development are necessary to overcome these obstacles and further improve the application of ML in SM retrieval. The integration of emerging technologies such as quantum computing in SM estimation could potentially revolutionize the field by offering more accurate and efficient methods of data analysis. Quantum computing’s ability to process vast amounts of data simultaneously and perform complex calculations could enhance the accuracy and speed of SM estimation models, leading to more effective and timely decision-making for agricultural purposes, water resource management, and environmental sustainability.