Abstract
An accurate cropland map is the cornerstone of effective agricultural monitoring. Despite the growing number of remotely sensed cropland maps, pervasive inconsistencies among them have impeded their further application. This issue is particularly evident in areas with limited valid observations, such as southwestern China, which is characterized by complex topography and fragmented parcels. In this study, we constructed a multi-sourced sample set independent of the data producers, combining open-source validation datasets with stratified random sampling, and used it to rectify the accuracy of ten contemporary cropland maps in southwestern China. We then decoded the inconsistencies among these maps and, using a self-adaptive threshold method, synthesized ten state-of-the-art remotely sensed cropland maps released from 2021 onwards into a refined cropland map (CroplandSyn). Validations at both prefecture and county scales underscored the superiority of the refined map, which aligned more closely with national land survey data. The refined cropland map and samples are publicly available. By providing a high-quality cropland map and validation dataset, our study offers valuable insights for improving agricultural practices and land management in under-monitored areas.
Background & Summary
Croplands feed human beings and sustain life on Earth. With a billion people still facing hunger, cropland plays an irreplaceable role in meeting the world’s growing future needs for food security and sustainability1. Cropland can also have significant impacts on ecosystems2. For example, agricultural intensification and expansion may encroach on protected areas or forests, destroying species’ habitats and driving extinctions3,4, or may alter the terrestrial carbon cycle5,6. Meanwhile, post-agricultural landscapes, exemplified by cropland abandonment, continue to affect soil organic carbon sequestration under climate change7. The United Nations’ 2030 Sustainable Development Goals (SDGs) therefore call for national cooperation and policies to improve food security (SDG 2), protect ecosystems (SDG 15), and combat climate change (SDG 13)8. Timely, accurate, and affordable spatiotemporal cropland datasets are the basis for achieving these goals9,10.
Satellite data with spatiotemporally consistent Earth observation (EO) have enabled agricultural monitoring from regional to global scales11,12. Over the past four decades, multiple research teams have developed hundreds of approaches to produce land cover datasets worldwide, most of which include a cropland category13,14,15. The most remarkable advance over time has been the improvement in spatial resolution16. In the 2010s, the spatial resolution of EO-based land use products shifted from coarse to medium, reaching 30 m with the freely accessible Landsat archives of the United States Geological Survey (USGS)13,17. Subsequently, the successive launches of higher-resolution satellite sensors (e.g., GaoFen18, Sentinel19, and PlanetScope20), advances in cloud-based data processing platforms such as Google Earth Engine (GEE)21, and the iteration of machine learning algorithms22 have made it possible to map cropland in landscape-scale detail at 10-m or finer resolution. Against this background, several research teams have released cropland-specific datasets and all-type land use and land cover (LULC) products at 30-m or finer resolution since 202123,24. Nonetheless, the amount and spatial extent of cropland vary considerably among these datasets: as spatial resolution sharpens from hundreds of meters to tens of meters, factors that are negligible at coarse resolution become dominant signals25. Inconsistent class definitions, data sources, and classification methods, together with the lack of independent quantitative assessments of these maps, pose further challenges.
These challenges are especially acute for agricultural monitoring in areas with limited valid observations26, high topographic relief, fragmented parcels, and fragile ecological environments, such as southwestern China, where they constrain in-depth applications of existing datasets27,28,29.
Previous studies have revealed substantial global inconsistencies among three LULC datasets newly published for 2020: ESA WorldCover, ESRI’s Land Cover, and Google Dynamic World30. At sub-global scales, consistency assessments of multi-class land cover products31,32,33 and thematic maps34,35 have reported the same issues. For example, the inconsistency among five widely used cropland datasets for Africa was found to exceed one-third36,37. An evaluation by Gao et al.38 in Europe found higher consistency and accuracy for the cropland and forest categories in three 30-m LULC maps, but lower consistency in mountainous areas. Collectively, these studies show that high global accuracy does not necessarily translate into good performance at the regional level39. Similar findings were reported in China for the forest evaluation of seven global land cover datasets for 201034 and for six cropland maps of 202040, as well as in the accuracy quantification of six 30-m cropland datasets for circa 201541. These studies assessed accuracy at national, provincial, and other administrative scales, using datasets updated in 2020 or earlier with spatial resolutions of 30 m or coarser. Since 2020, more than ten newly released and continuously updated datasets with cropland categories have provided unprecedented EO detail at 10-m or finer resolution24,42. However, the consistency and accuracy of these new datasets have not been independently assessed and compared, especially in areas with complex terrain and fragmented parcels, where cropland mapping has historically been most challenging.
Here, we first constructed a validation dataset by integrating samples from publicly available datasets with samples from stratified random sampling, and generated a consistency map to quantify the spatial pattern of agreement among ten existing cropland datasets in southwestern China. We then present a refined 30-m dataset (CroplandSyn) that provides an optimized distribution of cropland. This study details the production process, including accuracy rectification of the existing cropland maps, decoding of their inconsistencies, and generation of the refined map. Our study provides a clear perspective on the inconsistencies among different cropland maps and delivers a data-driven refined map that better retrieves the spatial extent and area of cropland in southwestern China.
Methods
Study area
The study area is located in southwestern China, spanning 22°29′N to 34°21′N and 97°21′E to 111°47′E (Fig. 1). It encompasses four provincial administrative units: Chongqing, Sichuan, Guizhou, and Yunnan. The region is rich in natural resources and ecosystem types, with fragmented parcels and varied topography, including plains, basins, hills, and mountains43. It is home to approximately 200 million people and is one of China’s most important agricultural production areas. Cropland here is primarily located in basins and plains with rainy and cloudy climates, such as the Sichuan Basin and the Yunnan-Guizhou Plateau, and is also cultivated in flat dams and river valleys; the Sichuan Basin is renowned as the “Breadbasket of Tianfu” (Fig. 1). Given the diverse topography and intensifying human-land conflicts, an accurate cropland map is of utmost importance for the sustainable management of cropland in southwestern China44.
Cropland maps
Ten contemporary land cover/use maps for croplands
In this study, we analyzed ten of the latest cropland maps, published from 2021 onwards45,46,47,48,49,50,51,52,53,54, to generate a refined cropland map. These are global- or local-scale cropland thematic or LULC maps released in the past three years (2021–2023). Table S1 summarizes their general metadata; they span a range of geographic extents and spatial resolutions. The accessibility of these cropland maps is described in Data Records.
Sino-LC1
Sino-LC1 is China’s first national-scale land cover map with a spatial resolution of nearly 1 m45. It was built with a low-cost, deep learning-based framework and open-access data, including global land-cover (GLC) products, OpenStreetMap (OSM), and Google Earth imagery. The dataset comprises 11 land cover types, with cropland coded as 5. Because Google Earth imagery in China is limited in update frequency and coverage, the producers of Sino-LC1 used interpolation methods to fill data gaps.
ESA WorldCover
The 10-m WorldCover product from the European Space Agency (ESA) provides free access to a 2020 global land cover map derived from Sentinel-1 and Sentinel-2 data46. It includes eleven land cover classes aligned with UN-FAO’s Land Cover Classification System and has an independently validated global overall accuracy of about 75%.
ESRI Land Cover
The ESRI Land Cover dataset is a global LULC map for 2020 derived from ESA Sentinel-2 imagery at 10-m resolution47. It is a composite of LULC predictions for ten classes, in which croplands are defined as human-planted/plotted cereals, grasses, and crops not at tree height (e.g., corn, wheat, soy, and fallow plots of structured land). The dataset was produced by a deep learning model using six bands of Sentinel-2 surface reflectance (SR) data (visible blue, green, red, near-infrared, and two shortwave infrared bands), trained on over 5 billion hand-labeled Sentinel-2 pixels sampled from over 20,000 sites worldwide. It achieves a global overall accuracy of 86%.
Dynamic World
Dynamic World is a near real-time, 10-m global LULC dataset produced by deep learning from Sentinel-2 Level-1C data from 2015 to the present; it is freely available through Google Earth Engine and openly licensed48. It results from a partnership between Google and the World Resources Institute to produce a dynamic dataset of the physical material on the Earth’s surface. Dynamic World has three key characteristics: near real-time updates, per-pixel probabilities across nine land cover classes, and 10-m resolution. It generates more than 5,000 images per day with a novel deep learning method applied to Sentinel-2 top-of-atmosphere data, so global land cover can be updated every 2–5 days (the revisit period depends on location). Because this study evaluated the annual cropland map for 2020, we pre-processed Dynamic World in Earth Engine by taking the per-pixel majority (most frequent) class to produce an annual composite cropland map.
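The majority compositing step can be illustrated with a minimal NumPy sketch. It assumes the per-image Dynamic World `label` band has already been exported as a (time, rows, cols) array, that class index 4 is `crops` in Dynamic World’s nine-class scheme, and that masked observations carry a hypothetical fill value of 255. In Earth Engine itself, the analogous reduction would be the collection’s `mode()` reducer applied to the `label` band.

```python
import numpy as np

CROPS_LABEL = 4    # 'crops' in Dynamic World's label band (classes 0-8)
NUM_CLASSES = 9
NODATA = 255       # hypothetical fill value for cloud-masked or missing dates


def majority_composite(labels: np.ndarray) -> np.ndarray:
    """Per-pixel most frequent class of a (time, rows, cols) label stack.

    Pixels with no valid observation are set to NODATA. Ties go to the
    smaller class index because np.argmax returns the first maximum.
    """
    # counts[k, r, c] = number of dates on which pixel (r, c) had class k;
    # NODATA (255) never equals a class index, so masked dates are ignored.
    counts = np.stack([(labels == k).sum(axis=0) for k in range(NUM_CLASSES)])
    mode = counts.argmax(axis=0).astype(np.uint8)
    mode[counts.sum(axis=0) == 0] = NODATA
    return mode


def annual_cropland(labels: np.ndarray) -> np.ndarray:
    """Binary annual cropland map: 1 where the majority class is crops."""
    return (majority_composite(labels) == CROPS_LABEL).astype(np.uint8)


# Three dates for a 1 x 3 strip of pixels.
stack = np.array([[[4, 1, 255]],
                  [[4, 255, 255]],
                  [[2, 255, 255]]], dtype=np.uint8)
print(majority_composite(stack).tolist())  # [[4, 1, 255]]
print(annual_cropland(stack).tolist())     # [[1, 0, 0]]
```

Pixels with no valid observation in the year end up as non-cropland in the binary map; whether they should instead be flagged as missing depends on how downstream steps treat nodata.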
CRLC
CRLC denotes a cross-resolution, national-scale land-cover mapping framework49; in this study, the name refers to the 10-m land-cover map produced with it. The map was derived with the CRLC framework from Sentinel-2 imagery and a 30-m historical product (GlobeLand30-2010), and the framework allows products to be updated quickly and efficiently at the global scale. The dataset covers eight land-cover types; for cropland, the estimated user accuracy is 81.72%, the estimated producer accuracy is 81.64%, and the estimated area is 1805.1 ± 56.6 × 10³ km².
GlobeLand 30
GlobeLand 30 was developed by the National Geomatics Centre of China (NGCC) at 30-m spatial resolution and provides multi-temporal land cover maps at 10-year intervals (2000/2010/2020)50. Its sources are multispectral images, including Landsat TM and ETM+ imagery. GlobeLand 30 contains ten land cover types; its overall accuracy is 80.30% globally and 82.39% within China. It was first released for open, non-commercial use in 2014, and the 2020 version was updated in 2021.
CLCD
The China land cover dataset (CLCD) is a Landsat-derived annual dataset processed on the Google Earth Engine platform. It contains annual land cover in China from 1990 to 2022 at 30-m spatial resolution51. For the processing, several temporal indicators were constructed using 335,709 Landsat images on Google Earth Engine and fed into a random forest classifier to obtain classification results. The overall accuracy of CLCD reached 79.31% based on 5,463 visually interpreted samples by the data producer.
GLC_FCS30
The 2020 version of the global land-cover product with a fine classification system (GLC_FCS30) provides fine-classified global land cover at 30-m spatial resolution derived from Landsat time-series imagery52. GLC_FCS30 offers a time series from 1985 to 2020 at 5-year intervals, built from continuous Landsat imagery on the Google Earth Engine platform. In particular, GLC_FCS30 2020 builds on the 2015 product and was optimized by combining multi-source auxiliary datasets (e.g., 2019–2020 Landsat SR data, Sentinel-1 SAR data, DEM terrain data, and global thematic auxiliary datasets) with a priori expert knowledge; as a result, it improves cropland classification performance compared with its predecessor, GLC_FCS30 2015.
GLAD
GLAD in this study refers to the Global cropland expansion in the 21st century dataset produced by the Global Land Analysis & Discovery team, a globally consistent 30-m cropland extent time series from 2000 to 201953. Cropland was mapped from consistently processed Landsat data in four-year intervals (2000–2003, 2004–2007, 2008–2011, 2012–2015, and 2016–2019)55, and each epoch’s cropland layer is named after its last year (five maps in total: 2003, 2007, 2011, 2015, and 2019). Using a four-year interval rather than a single year increases the amount of satellite imagery available within each time series. In addition, the fallow length for the cropland class is limited to four years: within each interval, an area is mapped as cropland if a growing crop was detected in any of the four years, which improves the representativeness of land surface phenology and the accuracy of cropland detection. Because no more recent version was available, the 2019 map was used in this study.
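The four-year compositing rule described above can be sketched as follows. The per-year crop-detection flags are a hypothetical input used only for illustration (GLAD distributes the epoch maps, not such flags); the point is that a single detection in any year of an epoch is enough for a pixel to be mapped as cropland, and that each epoch is named after its last year.

```python
import numpy as np

# Four-year epochs 2000-2003, ..., 2016-2019, keyed by their last year.
EPOCHS = {end: (end - 3, end) for end in range(2003, 2020, 4)}


def epoch_cropland(crop_detected: np.ndarray) -> np.ndarray:
    """Collapse a (4, rows, cols) boolean stack of yearly crop detections
    into one epoch map: cropland (1) if a crop was detected in any year,
    i.e. a pixel is dropped only if it was fallow for all four years."""
    if crop_detected.shape[0] != 4:
        raise ValueError("expected four yearly layers per epoch")
    return crop_detected.any(axis=0).astype(np.uint8)


yearly = np.array([[[True, False]],
                   [[False, False]],
                   [[False, False]],
                   [[False, False]]])
print(sorted(EPOCHS))                   # [2003, 2007, 2011, 2015, 2019]
print(epoch_cropland(yearly).tolist())  # [[1, 0]]
```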
CACD
China’s annual cropland dataset (CACD) is a 30-m annual cropland dataset of China from 1986 to 202154, built from all available Landsat TM/ETM+/OLI Tier 1 SR imagery at 30-m spatial resolution over that period. Annual cropland in this dataset is defined as a piece of land of at least 0.09 ha (minimum width of 30 m) that is sown or planted and harvestable at least once within the 12 months after the sowing or planting date. The dataset was produced using automated training sample generation, random forest supervised classification, and the LandTrendr temporal segmentation algorithm on the Google Earth Engine platform, enabling cost-effective, fine-resolution monitoring of cropland dynamics.
Generation of binary cropland maps
Following the cropland definition of each dataset (Table S1), binary cropland maps were first extracted from the corresponding LULC maps in GEE and then clipped to the boundary of southwestern China. All cropland datasets were projected to the Albers Equal Area Conic projected coordinate system (PCS) to facilitate area calculation and comparison, and all maps were resampled to 30-m resolution using the nearest-neighbor method. These steps were batch-processed with the ArcPy module in a Python environment. The resulting binary maps have a pixel value of 1 for cropland and 0 for non-cropland (Fig. 2).
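A minimal NumPy sketch of the two raster operations is given below, assuming the maps are already projected and pixel-aligned (the study itself performed these steps with ArcPy). The binary reclassification uses each product’s own cropland code, for example 40 in ESA WorldCover or 5 in Sino-LC1. The resampling helper is a simplified nearest-neighbor reduction by an integer factor (3 for 10 m to 30 m) that keeps the input cell at the center of each block; it is an illustration, not a reproduction of ArcPy’s exact cell alignment.

```python
import numpy as np


def to_binary(lulc: np.ndarray, cropland_codes) -> np.ndarray:
    """1 for pixels whose class code is a cropland code, 0 otherwise."""
    return np.isin(lulc, list(cropland_codes)).astype(np.uint8)


def nearest_downsample(arr: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor resampling of an aligned grid by an integer factor.

    Each output cell takes the input cell at offset factor // 2 inside its
    factor x factor block (the exact center when factor is odd). Incomplete
    blocks at the right and bottom edges are dropped.
    """
    rows = arr.shape[0] // factor * factor
    cols = arr.shape[1] // factor * factor
    off = factor // 2
    return arr[off:rows:factor, off:cols:factor]


# WorldCover-style codes: 40 = cropland, 10 = tree cover, 50 = built-up.
print(to_binary(np.array([[40, 10], [50, 40]]), {40}).tolist())  # [[1, 0], [0, 1]]

grid_10m = np.arange(36).reshape(6, 6)        # a 6 x 6 block of 10-m cells
print(nearest_downsample(grid_10m, 3).tolist())  # [[7, 10], [25, 28]]
```

Nearest-neighbor is the natural choice for categorical rasters because it never invents intermediate class values, at the cost of discarding the sub-pixel detail of the 1-m and 10-m products.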
Accuracy assessments based on independent sample set
Generation of ground-truth sample independent from map producers
The quality of the reference sample is crucial for an accurate assessment, especially where global accuracy metrics are known to characterize local accuracy poorly56. We constructed a ground-truth reference sample from a multi-sourced sample pool to rectify the regional accuracy of the ten cropland maps in southwestern China. The sample set contains 15,865 ground-truth samples (2,022 cropland and 13,843 non-cropland), each with three attributes: latitude, longitude, and a label of cropland (code: 1) or non-cropland (code: 0) (Fig. 3).
The reference sample set integrates existing samples from publicly accessible libraries with additional samples from stratified sampling-aided field surveys41. The first part comes from the southwestern China subsets of the sample pools of the Global Food Security-support Analysis Data (GFSAD)57, Annual Global Land Cover (AGLC)58, and Global Land Cover Estimation (GLanCE)59, with 1,691, 1,296, and 2,387 samples, respectively; these samples were randomly distributed across southwestern China. For the second part, we conducted stratified random sampling with strata defined according to the cropland consistency map, generating 10,491 sample points within the maximum cropland extent to increase sample density. In total, 15,865 ground-truth units were compiled, distributed across southwestern China both within and outside the potential cropland extent (Table 1). Each sample was then checked independently by two trained senior specialists against Google very high resolution (VHR) imagery from around 2020 to ensure that the reference set is stable and representative.
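Stratified random sampling over the consistency map can be sketched as below. The equal allocation per stratum, the random seed, and the function name are illustrative assumptions; the text does not state how the 10,491 points were allocated among strata. Strata are the vote-map values 1 to 10, so every drawn pixel falls within the maximum cropland extent.

```python
import numpy as np


def stratified_random_sample(vote_map: np.ndarray, n_per_stratum: int, seed: int = 0):
    """Draw up to n_per_stratum pixels without replacement from each vote
    value 1..10; returns {stratum: array of (row, col) indices}."""
    rng = np.random.default_rng(seed)
    samples = {}
    for stratum in range(1, 11):
        rows, cols = np.nonzero(vote_map == stratum)
        if rows.size == 0:
            continue  # stratum absent from this map
        k = min(n_per_stratum, rows.size)
        pick = rng.choice(rows.size, size=k, replace=False)
        samples[stratum] = np.column_stack([rows[pick], cols[pick]])
    return samples


votes = np.array([[0, 1, 1, 5],
                  [10, 5, 5, 0],
                  [1, 10, 3, 3]])
drawn = stratified_random_sample(votes, n_per_stratum=2)
for stratum, rc in drawn.items():
    print(stratum, rc.tolist())
```

Proportional or variance-optimal allocation could replace the fixed per-stratum count; equal allocation simply guarantees that rare, high-disagreement strata are not under-sampled.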
Accuracy metrics
We computed the metrics at multiple scales to report the accuracy of each provincial unit and of southwestern China as a whole. Assessing the accuracy of cropland maps in southwestern China is a binary classification problem. We used five metrics commonly applied to evaluate machine learning and binary remote sensing classifications60: user accuracy (\({UA}\), also known as precision), producer accuracy (\({PA}\), also known as recall or sensitivity), F1-score, overall accuracy (\({OA}\)), and the Matthews correlation coefficient (\({MCC}\)). The MCC accounts for true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) and is generally regarded as a balanced indicator that remains applicable even when the two classes differ greatly in sample size. It yields a high score only when predictions perform well in all four categories of the confusion matrix. Chicco and Jurman61 showed that, for binary classification, the MCC is a more reliable statistical rate than accuracy or the F1 score. We also adjusted the accuracy using the methods proposed by Olofsson et al.60.
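The formulas are shown below:
\[{UA}=\frac{{TP}}{{TP}+{FP}}\]
\[{PA}=\frac{{TP}}{{TP}+{FN}}\]
\[F1=\frac{2\times {UA}\times {PA}}{{UA}+{PA}}\]
\[{OA}=\frac{{TP}+{TN}}{{TP}+{TN}+{FP}+{FN}}\]
\[{MCC}=\frac{{TP}\times {TN}-{FP}\times {FN}}{\sqrt{({TP}+{FP})({TP}+{FN})({TN}+{FP})({TN}+{FN})}}\]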
where TP (true positive) and TN (true negative) are cropland and non-cropland samples, respectively, that were correctly mapped; FP (false positive) are non-cropland samples incorrectly mapped as cropland; and FN (false negative) are cropland samples incorrectly mapped as non-cropland.
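The metrics can be computed from paired reference and map labels with the short sketch below. It uses the standard confusion-matrix definitions of UA, PA, F1, OA, and MCC. The `area_adjusted_accuracy` helper follows the area-weighted estimator of Olofsson et al., under the assumption that the samples can be treated as stratified by map class with known mapped-area weights. Function names and the zero-division convention (returning 0.0) are choices made for this illustration.

```python
import math
import numpy as np


def binary_metrics(reference, mapped):
    """UA, PA, F1, OA and MCC for 0/1 reference and map labels."""
    ref = np.asarray(reference).astype(bool)
    prd = np.asarray(mapped).astype(bool)
    tp = int(np.sum(ref & prd))    # cropland mapped as cropland
    tn = int(np.sum(~ref & ~prd))  # non-cropland mapped as non-cropland
    fp = int(np.sum(~ref & prd))   # non-cropland mapped as cropland
    fn = int(np.sum(ref & ~prd))   # cropland mapped as non-cropland
    ua = tp / (tp + fp) if tp + fp else 0.0
    pa = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ua * pa / (ua + pa) if ua + pa else 0.0
    oa = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"UA": ua, "PA": pa, "F1": f1, "OA": oa, "MCC": mcc}


def area_adjusted_accuracy(counts, mapped_area):
    """Area-weighted accuracy in the style of Olofsson et al.

    counts[i][j]: number of samples with map class i and reference class j.
    mapped_area[i]: mapped area of class i (any unit).
    Returns OA, per-class UA and PA, and error-adjusted class areas.
    """
    n = np.asarray(counts, dtype=float)
    area = np.asarray(mapped_area, dtype=float)
    w = area / area.sum()                              # map-class weights W_i
    p = w[:, None] * n / n.sum(axis=1, keepdims=True)  # estimated proportions p_ij
    oa = float(np.trace(p))
    ua = np.diag(p) / p.sum(axis=1)
    pa = np.diag(p) / p.sum(axis=0)
    adjusted_area = p.sum(axis=0) * area.sum()
    return oa, ua, pa, adjusted_area


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
# {'UA': 0.5, 'PA': 0.5, 'F1': 0.5, 'OA': 0.5, 'MCC': 0.0}
```

In the area-weighted version, a misclassified sample drawn from a class that covers much of the map counts for more than one drawn from a small class, which is why the adjusted accuracies can differ from the raw sample-count metrics.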
Generation of refined cropland map
We harmonized the ten existing cropland maps and generated a refined cropland map with a self-adaptive threshold method (Fig. 4). Specifically, we generated a vote map in which each pixel value is the number of the ten cropland maps that label it as cropland (Fig. 5). Except for pixels that no dataset identifies as cropland (grey areas on the vote map), the values range from 1 to 10. We used each integer from 1 to 10 as a threshold and extracted the cropland extent under each threshold. For example, a threshold of 1 means that a pixel is considered cropland if at least one of the ten datasets labels it as cropland; the resulting map is denoted \({{Map}}_{{threshold\_}01}\). Similarly, a threshold of 10 means that all ten datasets label the pixel as cropland, and the resulting map is denoted \({{Map}}_{{threshold\_}10}\). In this way, we generated ten cropland maps (i.e., \({{Map}}_{{threshold\_}01},{{Map}}_{{threshold\_}02},\ldots ,{{Map}}_{{threshold\_}10}\)) that synthesize the consistency of the ten datasets. We then calculated the cropland area, overall accuracy, and F1 score of each threshold map and plotted them as histograms.
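The vote map and the threshold maps can be reproduced from co-registered binary maps with a few lines of NumPy. This sketch assumes all inputs share the same 30-m grid. A threshold t keeps pixels labeled cropland by at least t maps, so t = 1 gives the union of all maps and t = 10 their intersection.

```python
import numpy as np


def vote_map(binary_maps) -> np.ndarray:
    """Number of input maps labeling each pixel as cropland (0..len(maps))."""
    stack = np.stack([np.asarray(m, dtype=np.uint8) for m in binary_maps])
    return stack.sum(axis=0)


def threshold_maps(votes: np.ndarray, n_maps: int = 10) -> dict:
    """Map_threshold_t for t = 1..n_maps: cropland where votes >= t."""
    return {t: (votes >= t).astype(np.uint8) for t in range(1, n_maps + 1)}


# Three toy maps instead of ten, over a 1 x 3 strip of pixels.
maps = [np.array([[1, 0, 1]]), np.array([[1, 0, 0]]), np.array([[1, 1, 0]])]
votes = vote_map(maps)
print(votes.tolist())                      # [[3, 1, 1]]
by_t = threshold_maps(votes, n_maps=len(maps))
print(by_t[1].tolist(), by_t[3].tolist())  # [[1, 1, 1]] [[1, 0, 0]]
```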
Following the principle of the highest F1 score and the lowest bias between the mapped area and the statistics from the Third National Land Survey (TNLS), the map generated with the optimal frequency threshold (\({{Map}}_{{threshold\_refined}}\)) was selected as the refined cropland map. The metric performance is shown in Fig. 6.
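One way to operationalize this selection rule is sketched below. Combining the two criteria lexicographically (highest F1 first, then the smallest relative area bias against TNLS as a tie-breaker) is an assumption made for illustration, and the values in the usage example are made up rather than taken from the study’s results.

```python
def select_threshold(f1_by_t: dict, area_by_t: dict, survey_area: float) -> int:
    """Return the threshold with the highest F1 score; ties are broken by
    the smallest relative bias between mapped and survey (TNLS) area."""
    def rank_key(t):
        rel_bias = abs(area_by_t[t] - survey_area) / survey_area
        return (-f1_by_t[t], rel_bias)
    return min(f1_by_t, key=rank_key)


# Hypothetical F1 scores and mapped areas (same unit as survey_area).
f1 = {4: 0.70, 5: 0.80, 6: 0.80}
area = {4: 120.0, 5: 101.0, 6: 95.0}
print(select_threshold(f1, area, survey_area=100.0))  # 5
```

A weighted score or a rank sum over both criteria would be an equally plausible reading of the rule; the lexicographic form simply makes F1 the primary criterion.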
Cropland area comparison at multiple scales
Area-based comparison
In this study, the cropland area of each of the ten cropland datasets was calculated at three administrative levels: the whole of southwestern China, provinces, and districts. The administrative boundaries come from the Resource and Environment Science and Data Center, Chinese Academy of Sciences (https://www.resdc.cn/). These areas were then compared with census data from the Third National Land Survey (TNLS) (https://gtdc.mnr.gov.cn). Note that because Chongqing is a municipality directly under the central government (equivalent to a provincial administrative unit), its district-level entities (e.g., Wanzhou District) were treated as prefecture-level units and compared with prefecture-level cities in other provinces. Two metrics, the coefficient of determination (R²) and the root mean square error (RMSE), were used to measure the agreement between the mapped areas and the statistics; the formulas are listed below.
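\[{R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{x}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}\]
\[{RMSE}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{({x}_{i}-{y}_{i})}^{2}}\]
where \({y}_{i}\) and \(\bar{y}\) are the TNLS area of administrative unit \(i\) and the mean TNLS area, \({x}_{i}\) is the mapped area, and \(n\) is the number of administrative units.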
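For reference, the two agreement metrics can be computed as below, with `mapped` holding the x_i values and `tnls` the y_i values for the n administrative units. The sketch assumes R² takes the 1 − SS_res/SS_tot form, which measures agreement with the 1:1 line; unlike the squared Pearson correlation, it penalizes a systematic over- or underestimate, as the second usage case shows.

```python
import numpy as np


def r2_rmse(mapped, tnls):
    """R^2 (1 - SS_res/SS_tot, with TNLS as the reference) and RMSE."""
    x = np.asarray(mapped, dtype=float)
    y = np.asarray(tnls, dtype=float)
    ss_res = np.sum((y - x) ** 2)         # residuals against the 1:1 line
    ss_tot = np.sum((y - y.mean()) ** 2)  # spread of the survey areas
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean((x - y) ** 2)))
    return float(r2), rmse


print(r2_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # (1.0, 0.0)
print(r2_rmse([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # (-0.5, 1.0)
```

In the second case the mapped areas are perfectly correlated with the survey areas but uniformly too high, so the squared Pearson correlation would still be 1 while this R² drops below zero.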
Spatial extent comparison
The spatial distribution of cropland in southwestern China is heterogeneous, so agreement in total area does not necessarily imply spatial agreement. To compare the spatial detail of the different datasets, we performed a pixel-by-pixel comparison and generated a vote map from the ten cropland products. Each pixel’s value is the number of products labeling it as cropland (1 to 10), with higher values indicating greater consistency. Pixels not labeled as cropland by any dataset were assigned zero and excluded from the analysis.
In addition, eight sites across southwestern China (zoomed-in views in Fig. 7) were selected to illustrate the spatial details of the cropland maps. These sites cover plains, hills, mountains, river valleys, and other typical landscapes of southwestern China and show that the refined map better depicts the cropland extent.
Data Records
The refined cropland map generated with the methods described here is named CroplandSyn and corresponds to the optimal threshold presented in Table 2. Specifically, threshold_05 has the highest F1 score and the closest agreement with the statistical data, although its OA is slightly lower than that of threshold_06. CroplandSyn, based on threshold_05, substantially improves both the area estimate and the spatial extent of cropland mapping.
The vote map of inconsistency and the refined cropland map are available from the figshare repository62 in GeoTIFF format, in the Albers conic equal-area projected coordinate system at 30-m resolution, together with their pyramid files in .ovr format. All raster data can be loaded and edited with scripting tools (e.g., rasterio, GDAL, and cartopy) and with software that supports .tif files, such as ESRI ArcGIS (https://www.esri.com/) and QGIS (https://qgis.org/).
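As a usage example, the cropland area of a region can be derived by counting CroplandSyn pixels, since the Albers equal-area projection keeps every 30-m cell at 900 m². The sketch below works on an array already read from the GeoTIFF (for instance with rasterio’s `src.read(1)`). Treating only the value 1 as cropland, so that any nodata value is ignored, is an assumption about the file’s encoding.

```python
import numpy as np

PIXEL_SIZE_M = 30.0  # CroplandSyn cell size in the Albers equal-area grid


def cropland_area_km2(raster: np.ndarray, pixel_size_m: float = PIXEL_SIZE_M) -> float:
    """Cropland area in km² from a 0/1 raster on an equal-area grid.

    Only cells equal to 1 are counted, so 0 and nodata cells contribute nothing.
    """
    n_cropland = int(np.count_nonzero(raster == 1))
    return n_cropland * pixel_size_m ** 2 / 1e6


tile = np.zeros((100, 100), dtype=np.uint8)
tile[:10, :] = 1                  # 1,000 cropland cells
print(cropland_area_km2(tile))    # 0.9
```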
The cropland/non-cropland samples for southwestern China are also shared in the repository in ESRI shapefile format. There are five fields in the attribute table: in addition to the latitude and longitude coordinates, the land field takes values of 0 and 1, representing non-cropland and cropland samples, respectively, and the Source field records the source of each sample point. The file defining the Albers conic equal-area projected coordinate system for southwestern China (with the .prj extension) is also uploaded, so that ESRI ArcGIS users can reuse it without defining the projection themselves.
Technical Validation
Two methods were used to validate the resulting maps: sample-aided accuracy assessment and cross-comparison with the ten existing cropland maps.
Accuracy assessments of cropland maps
To quantitatively characterize the accuracy of the ten existing cropland maps and the refined map at multiple administrative scales, we validated them against the previously constructed sample dataset for southwestern China as a whole and for each province.
The results showed that the 30-m refined cropland map derived from the vote-map thresholds (threshold_05, named CroplandSyn in Data Records) ranked among the most accurate maps in terms of overall accuracy, F1 score, and MCC, both for southwestern China as a whole and at the provincial scale, generally with an overall accuracy higher than 0.80 (Fig. 8 and Table S2). This is also supported by the sample-based error distribution in Fig. 9. The refined map outperformed most of the ten existing products, notably all five datasets with the same 30-m spatial resolution, whose accuracies were even lower than the average, and was surpassed only by WorldCover, which has a finer spatial resolution of 10 m.
Comparisons with existing cropland maps and land survey data
Figure 2 shows the spatial distribution of the 12 cropland maps in southwestern China. To avoid visual differences caused by spatial resolution, cropland maps with 10-m or 1-m resolution were resampled to 30 m using the nearest-neighbor method. Overall, the refined cropland map based on threshold_05 reduced both the overestimation of cropland area in the existing 30-m datasets (mainly due to commission errors) and the underestimation in the 10-m and 1-m datasets (mainly due to omission errors). This is evident in the spatial extent comparisons at the eight sites illustrated in Fig. 7. In plain areas (c and i in Fig. 7), the refined map depicted cropland more consistently with its actual distribution. In hilly areas (b, f, and h in Fig. 7), the misclassifications present in the other 30-m cropland maps were substantially reduced.
The mapped area of each cropland map was further compared at different administrative levels with statistics from the TNLS, which is recognized as the most precise source of land area data in China63. In general, cropland maps with higher spatial resolution (1 m and 10 m) tended to underestimate the cropland area relative to the TNLS (red dashed line in Fig. 10), whereas the 30-m cropland maps tended to overestimate it. There are exceptions, however: the 10-m CRLC and the 30-m GLAD show trends opposite to those of the other maps with the same spatial resolution. The refined maps for thresholds 5 and 6 lay much closer to the red line than the other nine pre-existing cropland maps; the only exception is WorldCover, from which they differed only marginally. In addition, the areas of the 12 cropland maps were compared at the prefecture level (Fig. 11) and the county level (Fig. 12). The scatter points of the refined cropland maps clustered more tightly around the 1:1 line than those of the existing maps, which showed systematic overestimation or underestimation. This suggests that the refined map reduces both the omission error of finer-resolution maps and the commission error of coarser-resolution maps, bringing its cropland area estimates close to the national land survey data (TNLS).
Uncertainty analysis
Several uncertainties remain in the dataset. First, in the vote map, pixels with a value of zero (non-cropland in every dataset) may result from omission errors in the individual cropland products. For example, maps refined with the threshold method may lose some accuracy owing to omission errors in mountainous areas (Fig. 7(j)). Some pixels that are truly cropland may therefore not be identified as cropland by any dataset, which further contributes to omission error in the refined map. In future work, adding training samples in areas of high inconsistency according to the vote map could improve accuracy in regions where cropland is difficult to map (e.g., the hilly and mountainous regions in Fig. 5)64.
Second, the study resampled the ten cropland products to a common spatial resolution, which to some extent weakened the ability of the 1-m and 10-m datasets to depict cropland details (see Generation of binary cropland maps). Although cropland mapping is inherently difficult in an area with such diverse land cover and fragmented parcels, the refined map produced in this study still substantially improved both accuracy and cropland area estimates (Fig. 7 and Fig. 8). This study also demonstrates the feasibility of generating optimized new data from existing data in an era when open-source cropland maps are increasingly abundant65,66,67. To further improve data fusion, higher-precision cropland maps could be generated by optimizing and integrating datasets based on geographic subdivisions and data-driven algorithms68,69.
Third, although the study analyzed ten cropland maps released since 2021, only the year 2020 was considered in the comparison. Most of these maps (including WorldCover, ESRI Land Cover, Dynamic World, CLCD, GLC_FCS30, CACD, and GLAD) can reflect long-term cropland dynamics23. Future work could therefore focus on optimizing time-series cropland maps with a time-series validation sample set to better meet the demands of dynamic cropland monitoring.
Usage Notes
Cropland maps play an indispensable role in guiding agricultural land management. However, in an era of continuous enrichment of open-source data, inconsistencies across datasets hinder our understanding of agricultural land systems’ processes, patterns, and responses to anthropogenic disturbances. Compared to the existing ten cropland datasets, our data-driven refined map (CroplandSyn) shows higher consistency with official land survey data and greater accuracy. Additionally, the shared sample set of this work facilitates quality assessment and continuous refinement of cropland maps in Southwest China.
Code availability
The Google Earth Engine JavaScript code used to generate the cropland maps and to compute the accuracy evaluation metrics is available from the figshare repository. The Python code used to build raster pyramids is also provided there.
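Raster pyramids are successively coarser copies of a raster, each at half the resolution of the previous level, which let large maps be displayed quickly. The released Python code is not reproduced here. The following is an independent, minimal sketch that builds pyramid levels for a binary cropland layer by averaging 2 × 2 blocks, so that each coarser cell holds a cropland fraction. Mean aggregation, plain nested lists and the function names are assumptions chosen for illustration. Tools such as GDAL's gdaladdo also offer nearest-neighbour and mode resampling, which preserve categorical values instead.

```python
from typing import List

Grid = List[List[float]]


def downsample_mean(grid: Grid) -> Grid:
    """Halve the resolution by averaging non-overlapping 2x2 blocks.

    Blocks cut off at the right or bottom edge are averaged over the
    cells they actually contain.
    """
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = [
                grid[i][j]
                for i in range(r, min(r + 2, rows))
                for j in range(c, min(c + 2, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def build_pyramid(grid: Grid, min_size: int = 1) -> List[Grid]:
    """Return [full resolution, 1/2, 1/4, ...] until both sides are <= min_size."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    levels = [grid]
    while len(levels[-1]) > min_size or len(levels[-1][0]) > min_size:
        levels.append(downsample_mean(levels[-1]))
    return levels


if __name__ == "__main__":
    mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    levels = build_pyramid(mask)
    print([(len(level), len(level[0])) for level in levels])  # prints: [(4, 4), (2, 2), (1, 1)]
    print(levels[-1])  # prints: [[0.375]]
```

When both raster sides are powers of two, every coarser cell equals the exact cropland fraction of the full-resolution cells beneath it. At ragged edges, the fraction is only approximate.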
The software and modules used in this study include Origin 2023b, ArcGIS Pro 3.1, Python 3.7, gma 1.1.5, and ArcPy 3.1.
The very high spatial resolution (VHR) Google Earth images are accessible through the ArcGIS Web Map Tile Service (WMTS).
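Interpreting a sample point against VHR imagery requires locating the map tile that covers the point. The sketch below assumes that the tile service follows the common GoogleMapsCompatible (Web Mercator) tile matrix set. In that scheme, the zoom level is the TileMatrix, the origin is at the top-left, and TileCol and TileRow count eastward and southward. The function converts longitude/latitude to a tile column and row using the standard slippy-map formula. Whether the ArcGIS-hosted service uses exactly this layout is an assumption that should be checked against the service's GetCapabilities document. The function name is illustrative.

```python
import math
from typing import Tuple

MAX_LAT = 85.0511287798  # latitude limit of the square Web Mercator world


def tile_index(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return (TileCol, TileRow) of the Web Mercator tile containing a point."""
    if zoom < 0:
        raise ValueError("zoom must be non-negative")
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clip to the projection's range
    lat_rad = math.radians(lat)
    col = math.floor((lon + 180.0) / 360.0 * n)
    row = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Points on the far east or south edge fall exactly on n; clamp into range.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


if __name__ == "__main__":
    # Hypothetical sample point in the Sichuan Basin at zoom level 17.
    print(tile_index(104.07, 30.67, 17))
```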
References
Foley, J. A. et al. Solutions for a cultivated planet. Nature 478, 337–342 (2011).
Foley, J. A. et al. Global Consequences of Land Use. Science 309, 570–574 (2005).
Zabel, F. et al. Global impacts of future cropland expansion and intensification on agricultural markets and biodiversity. Nat. Commun. 10, 2844 (2019).
Meng, Z. et al. Post-2020 biodiversity framework challenged by cropland expansion in protected areas. Nat. Sustainability, 1–11 (2023).
Ge, Q., Dai, J., He, F., Pan, Y. & Wang, M. Land use changes and their relations with carbon cycles over the past 300 a in China. Science in China Series D: Earth Sciences 51, 871–884 (2008).
Liang, X. et al. Exploring cultivated land evolution in mountainous areas of Southwest China, an empirical study of developments since the 1980s. Land Degrad. Dev. 32, 546–558 (2021).
Bell, S. M. et al. Quantifying the recarbonization of post-agricultural landscapes. Nat. Commun. 14, 2139 (2023).
Clark, H. & Wu, H. in Furthering the Work of the United Nations (United Nations, 2016).
Xu, Z. et al. Assessing progress towards sustainable development over space and time. Nature 577, 74–78 (2020).
Pradhan, P. A threefold approach to rescue the 2030 Agenda from failing. Natl. Sci. Rev. 10, nwad015 (2023).
Fritz, S. et al. Mapping global cropland and field size. Global Change Biol. 21, 1980–1992 (2015).
Wu, B. et al. Challenges and opportunities in remote sensing-based crop monitoring: a review. Natl. Sci. Rev. 10, nwac290 (2023).
Hansen, M. C. & Loveland, T. R. A review of large area monitoring of land cover change using Landsat data. Remote Sens. Environ. 122, 66–74 (2012).
Grekousis, G., Mountrakis, G. & Kavouras, M. An overview of 21 global and 43 regional land-cover mapping products. Int. J. Remote Sens. 36, 5309–5335 (2015).
Naboureh, A., Bian, J., Lei, G. & Li, A. A review of land use/land cover change mapping in the China-Central Asia-West Asia economic corridor countries. Big Earth Data 0, 1–21 (2020).
Nabuurs, G.-J. et al. Glasgow forest declaration needs new modes of data ownership. Nat. Clim. Change 12, 415–417 (2022).
Wulder, M. A. et al. Current status of Landsat program, science, and applications. Remote Sens. Environ. 225, 127–147 (2019).
Zhang, D. et al. A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution. Remote Sens. Environ. 247, 111912 (2020).
Drusch, M. et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 120, 25–36 (2012).
Roy, D. P., Huang, H., Houborg, R. & Martins, V. S. A global analysis of the temporal availability of PlanetScope high spatial resolution multi-spectral imagery. Remote Sens. Environ. 264, 112586 (2021).
Dong, J. et al. State of the art and perspective of agricultural land use remote sensing information extraction. Journal of Geo-Information Science 22, 772–783 (2020).
Yuan, Q. et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 241, 111716 (2020).
Song, X.-P. The future of global land change monitoring. Int. J. Digital Earth 16, 2279–2300 (2023).
Wang, Y. et al. A review of regional and Global scale Land Use/Land Cover (LULC) mapping products generated from satellite remote sensing. ISPRS J. Photogramm. Remote Sens. 206, 311–334 (2023).
Chen, J., Liu, Y., Liu, R. & Wei, X. Estimation of High-Resolution Fractional Tree Cover Using Landsat Time-Series Observations. IEEE Trans. Geosci. Remote Sens. 61, 1–11 (2023).
Zhou, Y. et al. Are There Sufficient Landsat Observations for Retrospective and Continuous Monitoring of Land Cover Changes in China? Remote Sens. 11, 1808 (2019).
Lei, G. et al. Land Cover Mapping in Southwestern China Using the HC-MMK Approach. Remote Sens. 8, 305 (2016).
Tubiello, F. N. et al. Measuring the world’s cropland area. Nat. Food 4, 30–32 (2023).
Di, Y. et al. Mapping croplands in the granary of the Tibetan Plateau using all available Landsat imagery, a phenology-based approach, and Google Earth Engine. Remote Sens. 13, 2289 (2021).
Venter, Z. S., Barton, D. N., Chakraborty, T., Simensen, T. & Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 14 (2022).
Wickham, J., Stehman, S. V., Sorenson, D. G., Gass, L. & Dewitz, J. A. Thematic accuracy assessment of the NLCD 2016 land cover for the conterminous United States. Remote Sens. Environ. 257, 112357 (2021).
Wang, Z. & Mountrakis, G. Accuracy Assessment of Eleven Medium Resolution Global and Regional Land Cover Land Use Products: A Case Study over the Conterminous United States. Remote Sens. 15, 3186 (2023).
Yang, Y., Xiao, P., Feng, X. & Li, H. Accuracy assessment of seven global land cover datasets over China. ISPRS J. Photogramm. Remote Sens. 125, 156–173 (2017).
Yang, Z. et al. Accuracy Assessment and Inter-Comparison of Eight Medium Resolution Forest Products on the Loess Plateau, China. ISPRS Int. J. Geo-Inf. 6, 152 (2017).
Hou, M. et al. The urgent need to develop a new grassland map in China: based on the consistency and accuracy of ten land cover products. Sci. China Life Sci. 66, 385–405 (2023).
Wei, Y., Lu, M., Wu, W. & Ru, Y. Multiple factors influence the consistency of cropland datasets in Africa. Int. J. Appl. Earth Obs. Geoinf. 89, 102087 (2020).
Nabil, M. et al. Assessing factors impacting the spatial discrepancy of remote sensing based cropland products: A case study in Africa. Int. J. Appl. Earth Obs. Geoinf. 85, 102010 (2020).
Gao, Y. et al. Consistency analysis and accuracy assessment of three global 30-m land-cover products over the European Union using the LUCAS dataset. Remote Sens. 12, 3479 (2020).
Liu, L. et al. Finer-Resolution Mapping of Global Land Cover: Recent Developments, Consistency Analysis, and Prospects. J. Remote Sens. 2021 (2021).
Cui, Y. et al. Decoding the inconsistency of six cropland maps in China. The Crop Journal 12, 281–294 (2024).
Zhang, C., Dong, J. & Ge, Q. Quantifying the accuracies of six 30-m cropland datasets over China: A comparison and evaluation analysis. Comput. Electron. Agric. 197, 106946 (2022).
Dong, J. et al. Opportunities and challenges in monitoring cultivated land red line in big data era. Bulletin of Chinese Academy of Sciences 38, 1781–1792 (2023).
Qin, X. et al. Identification of Parcel-Scale Crop Types in Southwestern Mountainous Area based on Time Series Remote Sensing Images. Journal of Geo-Information Science 25, 654–668 (2023).
Li, A. et al. The driving factors and buffering mechanism regulating cropland soil acidification across the Sichuan Basin of China. Catena 220 (2023).
Li, Z. et al. SinoLC-1: the first 1-meter resolution national-scale land-cover map of China created with the deep learning framework and open-access data. Earth Syst. Sci. Data Discuss., 1–38 (2023).
Zanaga, D. et al. ESA WorldCover 10 m 2020 v100, Zenodo, https://doi.org/10.5281/zenodo.5571936 (2021).
Karra, K. et al. Global land use/land cover with Sentinel 2 and deep learning. In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS 4704–4707 (IEEE, 2021).
Brown, C. F. et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 9, 251 (2022).
Liu, Y., Zhong, Y., Ma, A., Zhao, J. & Zhang, L. Cross-resolution national-scale land-cover mapping based on noisy label learning: A case study of China. Int. J. Appl. Earth Obs. Geoinf. 118, 103265 (2023).
Chen, J. et al. Global land cover mapping at 30m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 103, 7–27 (2015).
Yang, J. & Huang, X. The 30m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 13, 3907–3925 (2021).
Zhang, X. et al. GLC_FCS30: global land-cover product with fine classification system at 30m using time-series Landsat imagery. Earth Syst. Sci. Data 13, 2753–2776 (2021).
Hansen, M. C. et al. Global land use extent and dispersion within natural land cover using Landsat data. Environ. Res. Lett. 17, 034050 (2022).
Tu, Y. et al. A 30 m annual cropland dataset of China from 1986 to 2021. Earth Syst. Sci. Data Discuss., 1–34 (2023).
Potapov, P. et al. Global maps of cropland extent and change show accelerated cropland expansion in the twenty-first century. Nat. Food 3, 19–28 (2022).
Stehman, S. V. & Foody, G. M. Key issues in rigorous accuracy assessment of land cover products. Remote Sens. Environ. 231, 111199 (2019).
Teluguntla, P. et al. A 30-m Landsat-derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine cloud computing platform. ISPRS J. Photogramm. Remote Sens. 144, 325–340 (2018).
Xu, X., Li, B., Liu, X., Li, X. & Shi, Q. Mapping annual global land cover changes at a 30 m resolution from 2000 to 2015. J. Remote Sens. 25, 1896–1916 (2021).
Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10, 879 (2023).
Olofsson, P. et al. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 148, 42–57 (2014).
Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020).
Cui, Y. et al. Validation and refinement of cropland maps in southwestern China. figshare https://doi.org/10.6084/m9.figshare.25969603.v1 (2024).
Chen, X. et al. Toward sustainable land use in China: A perspective on China’s national land surveys. Land Use Policy 123, 106428 (2022).
You, N. et al. The 10-m crop type maps in Northeast China during 2017–2019. Sci. Data 8, 41 (2021).
Lei, G., Li, A., Bian, J. & Zhang, Z. The roles of criteria, data and classification methods in designing land cover classification systems: evidence from existing land cover data sets. Int. J. Remote Sens. 41, 5062–5082 (2020).
Chaves, M. E. D., Picoli, M. C. A. & Sanches, D. Recent Applications of Landsat 8/OLI and Sentinel-2/MSI for Land Use and Land Cover Mapping: A Systematic Review. Remote Sens. 12, 3062 (2020).
Zhang, C., Dong, J. & Ge, Q. Mapping 20 years of irrigated croplands in China using MODIS and statistics and existing irrigation products. Sci. Data 9, 407 (2022).
Nabil, M., Zhang, M., Wu, B., Bofana, J. & Elnashar, A. Constructing a 30m African cropland layer for 2016 by integrating multiple remote sensing, crowdsourced, and auxiliary datasets. Big Earth Data 6, 54–76 (2022).
Lin, L. et al. Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm. Sci. Data 9, 63 (2022).
Acknowledgements
This work was supported by the Youth Interdisciplinary Team Project of the Chinese Academy of Sciences (JCTD-2021-04), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA28060100), the Informatization Plan of the Chinese Academy of Sciences (Grant No. CAS-WX2021PY-0109), and the National Natural Science Foundation of China (Grant Nos. 72221002 and 42271375). The creation and curation of this dataset relied on various open-source land use/land cover datasets and sample sets. We would like to express our gratitude to the producers of the datasets used in this study for their valuable contributions and to those who assisted us during the fieldwork.
Author information
Contributions
Yifeng Cui: Conceptualization, methodology, software, programming, validation, visualization, writing – original draft. Jinwei Dong: Conceptualization, methodology, writing – review & editing, supervision, funding acquisition. Chao Zhang: Review & editing, supervision, validation. Jilin Yang: Review & editing, supervision. Na Chen: Review & editing. Peng Guo: Methodology. Yuanyuan Di: Validation, review & editing. Mengxi Chen: Review & editing. Aiwen Li: Validation. Ronggao Liu: Conceptualization, supervision, funding acquisition.
Ethics declarations
Competing interests
Following the new policies of Springer Nature, we acknowledge that Professor Jinwei Dong is an Editorial Board Member of Scientific Data.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cui, Y., Dong, J., Zhang, C. et al. Validation and refinement of cropland map in southwestern China by harnessing ten contemporary datasets. Sci Data 11, 671 (2024). https://doi.org/10.1038/s41597-024-03508-5
DOI: https://doi.org/10.1038/s41597-024-03508-5