Validation and refinement of cropland map in southwestern China by harnessing ten contemporary datasets

Cui, Yifeng; Dong, Jinwei; Zhang, Chao; Yang, Jilin; Chen, Na; Guo, Peng; Di, Yuanyuan; Chen, Mengxi; Li, Aiwen; Liu, Ronggao

doi:10.1038/s41597-024-03508-5

Data Descriptor
Open access
Published: 22 June 2024

Volume 11, article number 671, (2024)

Scientific Data


Yifeng Cui^1,2,3,
Jinwei Dong^1,2 (ORCID: orcid.org/0000-0001-5687-803X),
Chao Zhang^4,
Jilin Yang^5,
Na Chen^3,
Peng Guo^6,
Yuanyuan Di^1,7,
Mengxi Chen^6,
Aiwen Li^8 &
Ronggao Liu^1


Abstract

An accurate cropland map is the cornerstone of effective agricultural monitoring. Although remotely sensed cropland maps are increasingly abundant, pervasive inconsistencies among them have impeded their further application. This issue is particularly evident in areas with limited valid observations, such as southwestern China, which is characterized by complex topography and fragmented parcels. In this study, we constructed a multi-sourced sample set independent of the data producers, combining open-source validation datasets with additional sampling, and used it to rectify the accuracy of ten contemporary cropland maps in southwestern China. We then decoded the inconsistencies among these maps and, using a self-adaptive threshold method, synthesized these ten state-of-the-art remotely sensed cropland maps released from 2021 onwards into a refined cropland map (Cropland_Syn). Validations at both prefecture and county scales underscored the superiority of the refined cropland map, which aligns more closely with national land survey data. The refined cropland map and samples are publicly available. By providing high-quality cropland maps and validation datasets, our study offers valuable insights for improving agricultural practices and land management in under-monitored areas.


Background & Summary

Croplands feed human beings and sustain life on Earth. Since a billion people still face hunger, cropland plays an irreplaceable role in meeting the world's growing food security and sustainability needs¹. Cropland can also have significant impacts on ecosystems². For example, agricultural intensification and expansion may encroach on protected areas or forests, destroying species' habitats and driving extinctions^3,4, or alter the terrestrial carbon cycle^5,6. Meanwhile, post-agricultural landscapes, such as abandoned cropland, continue to affect soil organic carbon sequestration under climate change⁷. The United Nations' 2030 Sustainable Development Goals (SDGs) therefore call for national cooperation and policies to improve food security (SDG 2), protect ecosystems (SDG 15), and combat climate change (SDG 13)⁸. Timely, accurate, and affordable spatiotemporal cropland datasets are thus fundamental to achieving these goals^9,10.

Satellite data providing spatiotemporally consistent Earth observation (EO) have enabled agricultural monitoring from regional to global scales^11,12. Over the past four decades, multiple research teams have developed hundreds of approaches to produce land cover datasets worldwide, most of which include a cropland category^13,14,15. The most remarkable advance over time has been the improvement in spatial resolution¹⁶. In the 2010s, the spatial resolution of EO-based land use products shifted from coarse to medium, reaching 30 m with the freely accessible Landsat archive of the United States Geological Survey (USGS)^13,17. Subsequently, the successive launches of higher-resolution satellite sensors (e.g., GaoFen¹⁸, Sentinel¹⁹, and PlanetScope²⁰), advances in cloud computing platforms such as Google Earth Engine (GEE)²¹, and the evolution of machine learning algorithms²² have made detailed, landscape-scale cropland mapping at 10-m or finer resolution possible. Against this background, several research teams have released cropland-specific datasets and all-type land use and land cover (LULC) products at 30-m or finer resolution since 2021^23,24. Nonetheless, the amount and spatial extent of cropland vary considerably among these datasets: as spatial resolution sharpens from hundreds of meters to tens of meters, factors that can be ignored at coarse resolution become dominant signals²⁵. Widespread inconsistencies in cropland criteria, data sources, and classification methods, together with the lack of independent quantitative assessment of these maps, pose further challenges.
This is especially true for agricultural monitoring in areas with few good-quality observations²⁶, high topographic relief, fragmented parcels, and fragile ecological environments, such as southwestern China, where these issues constrain in-depth applications of existing datasets^27,28,29.

Previous studies have revealed substantial inconsistencies worldwide among three newly published LULC datasets for 2020: ESA WorldCover, ESRI Land Cover, and Google Dynamic World³⁰. At sub-global scales, consistency assessments of multi-class land cover products^31,32,33 and thematic maps^34,35 have also pointed to these issues. For example, inconsistency among five widely used cropland datasets for Africa was found to exceed one-third^36,37. An evaluation by Gao et al.³⁸ in Europe demonstrated high consistency and accuracy for the cropland and forest categories in three 30-m LULC maps, but lower consistency in mountainous areas. Together, these studies show that high global accuracy does not necessarily indicate good local performance at the regional level³⁹. Similar findings were reported in the forest evaluation of seven global land cover datasets for 2010³⁴ and of six cropland maps for 2020⁴⁰ in China, as well as in the accuracy quantification of six 30-m cropland datasets for circa 2015⁴¹. These studies conducted assessments at administrative scales such as the national and provincial levels, based on datasets updated in 2020 or earlier with spatial resolutions of 30 m or coarser. Since 2020, more than ten newly released and continuously updated datasets with cropland categories have provided unprecedented EO detail at 10-m or finer spatial resolution^24,42. However, the consistency and accuracy of these newly published datasets have not been independently assessed and compared, especially in complex terrain with fragmented parcels, where cropland mapping has historically been more challenging.

Here, we first constructed a validation dataset by integrating samples from publicly available datasets with samples from stratified random sampling, and generated a consistency map to quantify the spatial pattern of agreement among ten existing cropland datasets in southwestern China. We then produced a refined dataset (Cropland_Syn) that provides an optimized cropland distribution at 30-m spatial resolution. This study details the production process of these datasets, including accuracy rectification of the existing cropland maps, decoding of their inconsistencies, and generation of the refined map. Our study provides a clear perspective on the inconsistencies among cropland maps and offers a data-driven refined map that better retrieves the spatial extent and area of cropland in southwestern China.

Methods

Study area

The study area is located in southwestern China, spanning 22°29′N to 34°21′N and 97°21′E to 111°47′E (Fig. 1). It encompasses four provincial-level administrative units: Chongqing, Sichuan, Guizhou, and Yunnan. The region is characterized by rich natural resources and diverse ecosystem types, with fragmented parcels and varied topography including plains, basins, hills, and mountains⁴³. It is home to approximately 200 million people and is one of China's most important agricultural production areas. Cropland is located mainly in basins and plains with rainy and cloudy climates, such as the Sichuan Basin and the Yunnan-Guizhou Plateau, and is also cultivated on flat dams and in river valleys; the Sichuan Basin is renowned as the "Breadbasket of Tianfu" (Fig. 1). Given the diverse topography and intensifying human-land conflicts, an accurate cropland map is of utmost significance for the sustainable management of cropland in southwestern China⁴⁴.

Cropland maps

Ten contemporary land cover/use maps for croplands

In this study, we analyzed ten of the most recent cropland maps, published from 2021 onwards^45,46,47,48,49,50,51,52,53,54, to generate a refined cropland map. These are global- or local-scale cropland thematic maps or LULC maps released in the past three years (2021–2023). Table S1 summarizes their general metadata; the maps span a range of geographic extents and spatial resolutions. Access to these cropland maps is described in Data Records.

Sino-LC1

Sino-LC1 is China's first national-scale land cover map with a spatial resolution of nearly 1 m⁴⁵. It was produced using a low-cost, deep learning-based framework and open-access data (including global land-cover (GLC) products, OpenStreetMap (OSM), and Google Earth imagery). The dataset comprises 11 land cover types, with cropland coded as 5. Because Google Earth imagery in China is unevenly updated and incomplete in coverage, the producers of Sino-LC1 used interpolation to fill data gaps.

ESA WorldCover

The 10-m WorldCover product produced by the European Space Agency (ESA) provides free access to the 2020 global land cover map derived from Sentinel-1 and Sentinel-2 satellite data⁴⁶. The WorldCover product comes with eleven land cover classes, aligned with UN-FAO’s Land Cover Classification System, and independently validated with a global overall accuracy of about 75%.

ESRI Land Cover

The ESRI Land Cover dataset is a global LULC map for 2020 derived from ESA Sentinel-2 imagery at 10-m resolution⁴⁷. It is a composite of LULC predictions for ten classes, in which croplands are defined as human-planted/plotted cereals, grasses, and crops not at tree height (examples: corn, wheat, soy, and fallow plots of structured land). The dataset was produced by a deep learning model using six bands of Sentinel-2 surface reflectance (SR) data (visible blue, green, red, near-infrared, and two shortwave infrared bands), trained on over 5 billion hand-labeled Sentinel-2 pixels sampled from over 20,000 sites worldwide. It achieves an overall accuracy of 86% in global validation.

Dynamic World

Dynamic World is a near-real-time, 10-m global LULC dataset produced by deep learning from Sentinel-2 Level-1C data from 2015 to the present; it is freely available through Google Earth Engine and openly licensed⁴⁸. It results from a partnership between Google and the World Resources Institute to produce a dynamic dataset of the physical material on the Earth's surface. Dynamic World has three characteristics: near-real-time updates, per-pixel probabilities across nine land cover classes, and 10-m resolution. It generates more than 5,000 images per day using a novel deep learning methodology applied to Sentinel-2 top-of-atmosphere data, and can therefore update global land cover every 2–5 days (depending on the revisit period at a given location on Earth). Because this study evaluated annual cropland maps for 2020, a majority (mode) composite was computed in Earth Engine during pre-processing to generate an annual cropland map from Dynamic World.
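The majority compositing step can be sketched in NumPy as follows. This is an illustrative stand-in for the Earth Engine processing, not the authors' code. It assumes that the per-date Dynamic World `label` band has been exported as a (dates, rows, cols) integer stack. The `NODATA` value and the class index `CROPS = 4` are assumptions of this sketch, and its tie-breaking may not match Earth Engine's mode reducer.

```python
import numpy as np

NODATA = -1  # hypothetical marker for cloud-masked or missing dates
CROPS = 4    # assumed index of "crops" in the Dynamic World `label` band


def majority_composite(labels: np.ndarray, n_classes: int = 9) -> np.ndarray:
    """Per-pixel majority (mode) of a (dates, rows, cols) stack of class labels.

    NODATA entries are ignored; pixels with no valid date stay NODATA.
    Ties go to the lowest class index (np.argmax keeps the first maximum),
    which may differ from how Earth Engine's mode reducer breaks ties.
    """
    # counts[c, r, k]: number of dates on which pixel (r, k) was labelled class c
    counts = np.stack([(labels == c).sum(axis=0) for c in range(n_classes)])
    mode = counts.argmax(axis=0).astype(np.int16)
    mode[counts.sum(axis=0) == 0] = NODATA
    return mode


def annual_cropland(labels: np.ndarray) -> np.ndarray:
    """Binary annual cropland map (1 = cropland, 0 = other class or no data)."""
    return (majority_composite(labels) == CROPS).astype(np.uint8)


# Three hypothetical dates over a 2 x 2 tile
labels = np.array([[[4, 1], [4, NODATA]],
                   [[4, 4], [2, NODATA]],
                   [[0, 4], [2, NODATA]]])
print(annual_cropland(labels))  # [[1 1]
                                #  [0 0]]
```

A per-pixel mode is robust to occasional misclassified dates, but a pixel cropped for only part of the year can lose the vote to the class it shows for the rest of the year.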

CRLC

CRLC is the name of a cross-resolution national-scale land-cover mapping framework⁴⁹; in this study, CRLC denotes the 10-m resolution land-cover map produced with it. The map was generated with the CRLC framework from Sentinel-2 imagery and a 30-m historical product (GlobeLand30-2010), and the framework makes it possible to update products quickly and efficiently worldwide. The dataset covers eight land-cover types; for cropland, the estimated user's accuracy is 81.72% and the estimated producer's accuracy is 81.64%, with an estimated area of 1,805.1 ± 56.6 × 10³ km².

GlobeLand 30

GlobeLand 30 was developed by the National Geomatics Centre of China (NGCC) at 30-m spatial resolution and provides multi-temporal land cover maps at 10-year intervals (2000/2010/2020)⁵⁰. It is derived from multispectral imagery, including Landsat TM and ETM+ images. GlobeLand 30 contains ten land cover types; its overall accuracy is 80.30% globally and 82.39% within China. It was first released for open access and non-commercial use in 2014, and the 2020 version was released in 2021.

CLCD

The China land cover dataset (CLCD) is a Landsat-derived annual dataset processed on the Google Earth Engine platform. It contains annual land cover in China from 1990 to 2022 at 30-m spatial resolution⁵¹. For the processing, several temporal indicators were constructed using 335,709 Landsat images on Google Earth Engine and fed into a random forest classifier to obtain classification results. The overall accuracy of CLCD reached 79.31% based on 5,463 visually interpreted samples by the data producer.

GLC_FCS30

The global land-cover product with a fine classification system (GLC_FCS30), version 2020, provides fine-classified global land cover at 30-m spatial resolution derived from Landsat time-series imagery⁵². GLC_FCS30 offers a time series from 1985 to 2020 at 5-year intervals, produced from continuous Landsat imagery on the Google Earth Engine platform. In particular, GLC_FCS30 2020 builds on the 2015 product and was optimized by combining multi-source auxiliary datasets (e.g., 2019–2020 Landsat SR data, Sentinel-1 SAR data, DEM terrain elevation data, and global thematic auxiliary datasets) with a priori expert knowledge. As a result, GLC_FCS30 2020 further improved cropland classification performance compared with its predecessor, GLC_FCS30 2015.

GLAD

GLAD, as used in this study, refers to the Global cropland expansion in the 21st century dataset produced by the Global Land Analysis & Discovery team. It is a globally consistent cropland extent time series at 30-m spatial resolution from 2000 to 2019⁵³. Cropland was mapped from consistently processed Landsat data in four-year intervals (2000–2003, 2004–2007, 2008–2011, 2012–2015, and 2016–2019)⁵⁵. The cropland layer for each epoch is named after the last year of the period (five maps in total: 2003, 2007, 2011, 2015, and 2019). Using a four-year interval rather than a single year increases the satellite imagery available within each period. In addition, the fallow length for the cropland class is limited to four years (within each four-year interval, an area is mapped as cropland if a growing crop was detected in any of those years), which improves the representativeness of land surface phenology and the accuracy of cropland detection. Because no 2019–2023 version was available, the 2019 map was used in this study.
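The four-year "any detection" rule can be illustrated with a small sketch. The per-year detection masks it takes as input are hypothetical. GLAD's actual crop-detection model is not reproduced here; the sketch shows only how annual detections would collapse into epoch maps named after their last year.

```python
import numpy as np

# GLAD's four-year epochs, keyed by the last year, after which each map is named
EPOCHS = {2003: range(2000, 2004), 2007: range(2004, 2008),
          2011: range(2008, 2012), 2015: range(2012, 2016),
          2019: range(2016, 2020)}


def epoch_maps(annual_detection: dict) -> dict:
    """Collapse per-year crop-detection masks into epoch cropland maps.

    `annual_detection` maps a year to a boolean array (True = a growing crop
    was detected that year); these inputs are hypothetical. A pixel is cropland
    for an epoch if a crop was detected in ANY of its years, so fallow spells
    shorter than four years do not remove it from the cropland class.
    """
    maps = {}
    for last_year, years in EPOCHS.items():
        masks = [annual_detection[y] for y in years if y in annual_detection]
        if masks:  # skip epochs with no input years
            maps[last_year] = np.logical_or.reduce(masks).astype(np.uint8)
    return maps


detections = {2016: np.array([True, False, False]),
              2017: np.array([False, False, False]),
              2018: np.array([False, True, False]),
              2019: np.array([False, False, False])}
print(epoch_maps(detections))  # {2019: array([1, 1, 0], dtype=uint8)}
```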

CACD

China's annual cropland dataset (CACD) is a 30-m annual cropland dataset of China from 1986 to 2021⁵⁴, derived from all available Landsat TM/ETM+/OLI Tier 1 SR imagery at 30-m spatial resolution over that period. Annual cropland in this dataset is defined as land with a minimum area of 0.09 ha (minimum width of 30 m, i.e., one 30 × 30 m pixel) that is sown or planted and harvestable at least once within the 12 months after the sowing or planting date. The dataset was produced using automated training sample generation, random forest supervised classification, and the LandTrendr temporal segmentation algorithm on the Google Earth Engine platform, enabling cost-effective, fine-resolution monitoring of cropland dynamics.

Generation of binary cropland maps

Following the cropland definition of each dataset (Table S1), binary cropland maps were first extracted from the corresponding LULC maps in GEE and then clipped to the boundaries of southwestern China. All cropland datasets were converted to the Albers Equal Area Conic projected coordinate system (PCS) to facilitate area calculation and comparison, and all maps were resampled to 30-m resolution using the nearest-neighbor method. These steps were batch-processed with the ArcPy module in a Python environment. The resulting binary maps have a pixel value of 1 for cropland and 0 for non-cropland (Fig. 2).
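The sketch below shows the two core pre-processing operations in NumPy: reclassification to a binary map and nearest-neighbour resampling to 30 m. It is not the authors' ArcPy workflow. It assumes that the maps are already projected onto grids whose origins coincide, and the helper names are hypothetical. The class codes used in the example (40 for cropland in ESA WorldCover) come from the product legend.

```python
import numpy as np


def to_binary(lulc: np.ndarray, cropland_codes) -> np.ndarray:
    """Reclassify one dataset's LULC codes: 1 = cropland, 0 = non-cropland."""
    return np.isin(lulc, list(cropland_codes)).astype(np.uint8)


def nearest_downsample(binary: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour resampling onto a grid `factor` times coarser.

    Assumes the fine and coarse grids share an origin, so each coarse cell
    centre falls at the middle of a factor x factor block of fine pixels.
    For odd factors (10 m -> 30 m, factor 3) that is exactly one pixel; for
    even factors (1 m -> 30 m, factor 30) it lies on a pixel corner and this
    sketch takes the pixel just below and to the right of that centre.
    Trailing rows/columns that do not fill a whole block are dropped.
    """
    off = factor // 2
    rows = (binary.shape[0] // factor) * factor
    cols = (binary.shape[1] // factor) * factor
    return binary[off:rows:factor, off:cols:factor]


worldcover = np.array([[40, 10], [40, 50]])  # 40 = cropland in ESA WorldCover
print(to_binary(worldcover, {40}))           # [[1 0]
                                             #  [1 0]]

fine = np.zeros((6, 6), dtype=np.uint8)      # a hypothetical 10-m tile
fine[1, 1] = fine[4, 4] = 1
print(nearest_downsample(fine, 3))           # [[1 0]
                                             #  [0 1]]
```

For Sino-LC1, `to_binary(lulc, {5})` would apply the cropland code given above. Taking the centre pixel keeps the output binary, but small parcels that miss the centre pixel are dropped. This matches the resampling caveat discussed under Uncertainty analysis.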

Accuracy assessments based on independent sample set

Generation of ground-truth sample independent from map producers

The quality of the reference sample is crucial for accurate assessment, especially where global accuracy is known to characterize local accuracy poorly⁵⁶. We constructed a ground-truth reference sample from a multi-sourced sample pool to rectify the regional accuracy of the ten cropland maps in southwestern China. The sample set contains 15,865 ground-truth samples (2,022 cropland and 13,843 non-cropland), each with three attributes: latitude and longitude for its geographic position, and a label of cropland (code: 1) or non-cropland (code: 0) (Fig. 3).

The reference sample set integrates existing samples from publicly accessible libraries with additional samples from stratified sampling-aided field surveys⁴¹. The first part comes from the southwestern China subsets of the sample pools of the Global Food Security-support Analysis Data (GFSAD)⁵⁷, Annual Global Land Cover (AGLC)⁵⁸, and Global Land Cover Estimation (GLanCE)⁵⁹, with 1,691, 1,296, and 2,387 samples, respectively; these samples are randomly distributed across southwestern China. For the second part, we conducted stratified random sampling with strata defined by the vote levels (the proportion of agreeing maps) of the cropland consistency map, generating 10,491 sample points within the maximum cropland extent to increase sample density. In total, 15,865 ground-truth units were compiled across southwestern China, both within and outside the potential cropland extent (Table 1). All samples were then cross-checked against Google very high resolution (VHR) imagery from around 2020 and independently reviewed by two trained senior specialists to ensure that the reference set is stable and representative.
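The stratified random sampling step can be sketched as below. The sketch assumes that the vote map is held as a NumPy array and that points are allocated to vote levels in proportion to stratum size. The text does not state the allocation actually used for the 10,491 points, so proportional allocation is an assumption. The function name and the toy vote map are hypothetical. Converting the returned pixel indices to latitude/longitude and labelling them by VHR interpretation are not shown.

```python
import numpy as np


def stratified_sample(vote: np.ndarray, n_total: int, seed: int = 0) -> np.ndarray:
    """Draw pixel locations stratified by vote level inside the cropland extent.

    Eligible pixels are those labelled cropland by at least one map (vote >= 1).
    Points are allocated proportionally to stratum size with largest-remainder
    rounding (an assumption). Returns rows of (row, col, vote_level).
    """
    rng = np.random.default_rng(seed)
    levels, sizes = np.unique(vote[vote >= 1], return_counts=True)
    raw = n_total * sizes / sizes.sum()
    alloc = np.floor(raw).astype(int)
    leftover = n_total - alloc.sum()
    alloc[np.argsort(raw - alloc)[::-1][:leftover]] += 1  # hand out remaining points
    picks = []
    for level, k in zip(levels, alloc):
        rows, cols = np.nonzero(vote == level)
        idx = rng.choice(rows.size, size=min(k, rows.size), replace=False)  # no repeats
        picks.append(np.column_stack([rows[idx], cols[idx], np.full(idx.size, level)]))
    return np.vstack(picks)


vote = np.zeros((20, 20), dtype=int)  # hypothetical 20 x 20 vote map
vote[:10, :] = 1                      # 200 pixels with 1 vote
vote[10:15, :] = 5                    # 100 pixels with 5 votes
vote[15:, :10] = 10                   # 50 pixels with 10 votes (rest stay 0)
samples = stratified_sample(vote, 35)
print(samples.shape)                  # (35, 3)
```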

Table 1 Composition of sample used for accuracy rectification in Southwestern China.


Accuracy metrics

We computed the metrics at different scales to demonstrate the accuracy for each provincial unit and for the whole of southwestern China. Assessing the accuracy of cropland maps in southwestern China is a binary classification problem. We therefore used five metrics commonly applied to evaluate machine learning models and binary remote sensing classifications⁶⁰: user's accuracy (${UA}$, also known as precision), producer's accuracy (${PA}$, also known as recall or sensitivity), F1-score, overall accuracy (${OA}$), and the Matthews correlation coefficient (${MCC}$). The MCC uses all four cells of the confusion matrix, namely true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), and is generally regarded as a balanced indicator. It remains applicable even when the two categories differ greatly in sample size, and it produces a high score only when predictions achieve satisfactory outcomes in all four categories of the confusion matrix. According to Chicco and Jurman⁶¹, the MCC is a more reliable statistical rate than accuracy and F1-score for evaluating binary classifications. We also adjusted the accuracy using the methods proposed by Olofsson et al.⁶⁰.

The formulas are shown below:

$${Overall\; Accuracy}\,({OA})=\frac{{TP}+{TN}}{{TP}+{FP}+{TN}+{FN}}$$

(1)

$${Producers\; Accuracy}\,({PA})=\frac{{TP}}{{TP}+{FN}}$$

(2)

$${User\; Accuracy}\,({UA})=\frac{{TP}}{{TP}+{FP}}$$

(3)

$${F1}_{{Score}}=\frac{2\times \left({PA}\times {UA}\right)}{\left({PA}+{UA}\right)}$$

(4)

$${MCC}=\frac{{TP}\times {TN}-{FP}\times {FN}}{\sqrt{({TP}+{FP})({TP}+{FN})({TN}+{FP})({TN}+{FN})}}$$

(5)

where TP (true positive) and TN (true negative) denote cropland and non-cropland samples, respectively, that were correctly mapped; FP (false positive) denotes non-cropland samples incorrectly mapped as cropland, and FN (false negative) denotes cropland samples incorrectly mapped as non-cropland.
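Equations (1)–(5) and an area-weighted variant can be computed with the sketch below. `binary_metrics` follows the equations directly. `area_weighted_accuracy` uses the standard stratified estimator p_ij = W_i × n_ij / n_i+ associated with Olofsson et al. It assumes that samples were stratified by map class, which only approximates the vote-level stratification used here. Both function names are hypothetical, and zero denominators (an empty class) are not guarded.

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Eqs. (1)-(5) for reference labels y_true and map labels y_pred (1 = cropland)."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))   # non-cropland mapped as cropland
    fn = int(np.sum(t & ~p))   # cropland mapped as non-cropland
    pa, ua = tp / (tp + fn), tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / np.sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"OA": (tp + tn) / (tp + tn + fp + fn), "PA": pa, "UA": ua,
            "F1": 2 * pa * ua / (pa + ua), "MCC": float(mcc)}


def area_weighted_accuracy(counts, map_weights) -> dict:
    """Stratified (area-weighted) estimators in the spirit of Olofsson et al.

    counts[i][j]: samples mapped as class i whose reference class is j
    (index 0 = cropland, 1 = non-cropland); map_weights[i]: share of the
    mapped area in class i. Assumes samples were stratified by map class.
    """
    n = np.asarray(counts, dtype=float)
    w = np.asarray(map_weights, dtype=float)
    p = w[:, None] * n / n.sum(axis=1, keepdims=True)  # estimated area proportions p_ij
    return {"OA": float(np.trace(p)),
            "UA": float(p[0, 0] / p[0].sum()),
            "PA": float(p[0, 0] / p[:, 0].sum()),
            "cropland_share": float(p[:, 0].sum())}


# Hypothetical example: TP = 40, FN = 10, FP = 20, TN = 130
y_true = [1] * 50 + [0] * 150
y_pred = [1] * 40 + [0] * 10 + [1] * 20 + [0] * 130
print(binary_metrics(y_true, y_pred))
print(area_weighted_accuracy([[40, 20], [10, 130]], [0.2, 0.8]))
```

With these counts, OA = 0.85 and F1 = 8/11. When the cropland stratum covers only 20% of the mapped area, the area-weighted PA rises to 0.7. The UA is unchanged because it is computed within a single map stratum.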

Generation of refined cropland map

We harmonized the ten existing cropland maps and generated a refined cropland map with the self-adaptive threshold method (Fig. 4). Specifically, we generated a vote map in which each pixel records how many of the ten cropland maps label it as cropland (Fig. 5). Except for pixels not identified as cropland by any dataset (grey areas in the vote maps), the vote values range from 1 to 10. We then used each integer from 1 to 10 as a threshold and extracted the cropland extent whose votes reach that threshold. For example, a threshold of 1 means that at least one of the ten datasets labels the pixel as cropland, and the resulting map is denoted ${Map}_{threshold\_01}$. Similarly, a threshold of 10 means that all ten datasets label the pixel as cropland, and the resulting map is denoted ${Map}_{threshold\_10}$. Applying all ten thresholds yielded ten cropland maps (${Map}_{threshold\_01}, {Map}_{threshold\_02}, \ldots, {Map}_{threshold\_10}$) that synthesize the consistency of the ten datasets. We then calculated the cropland area, overall accuracy, and F1 score of each threshold map and plotted them as histograms.
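The vote map and the threshold maps reduce to a few array operations. The sketch below assumes that the binary maps are co-registered NumPy arrays on the common 30-m grid. The function names are hypothetical, and the area conversion uses 0.0009 km² per 30 × 30 m pixel.

```python
import numpy as np


def vote_map(binary_maps) -> np.ndarray:
    """Count how many maps label each pixel cropland (0 = none, up to len(maps))."""
    return np.stack(binary_maps).astype(np.uint8).sum(axis=0, dtype=np.uint8)


def threshold_maps(vote: np.ndarray, n_maps: int = 10) -> dict:
    """Map_threshold_t marks a pixel as cropland when at least t maps agree."""
    return {f"threshold_{t:02d}": (vote >= t).astype(np.uint8)
            for t in range(1, n_maps + 1)}


maps = [np.array([[1, 0], [1, 0]]),
        np.array([[1, 1], [0, 0]]),
        np.array([[1, 0], [0, 0]])]          # three hypothetical binary maps
vote = vote_map(maps)                         # [[3, 1], [1, 0]]
by_threshold = threshold_maps(vote, n_maps=3)
areas_km2 = {name: float(m.sum()) * 0.0009 for name, m in by_threshold.items()}
print(vote.tolist(), by_threshold["threshold_03"].tolist())
```

Raising the threshold shrinks the cropland extent monotonically. A low threshold inherits every map's commission errors, and a high threshold inherits every map's omission errors. This is why an intermediate threshold is expected to perform best.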

The threshold map with the highest F1 score and the smallest bias between the mapped area and the statistics of the Third National Land Survey (TNLS) was selected as the optimal map (${Map}_{threshold\_refined}$) and designated the refined cropland map. The metric performance is shown in Fig. 6.
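One way to encode this selection rule is sketched below. The text states both criteria but not how they are combined. Ranking by F1 first and using the absolute area bias only as a tie-breaker is therefore an assumption. The function name is hypothetical, and the scores in the example are invented for illustration; they are not the values reported in Table 2.

```python
def select_threshold(candidates: dict, survey_area: float) -> str:
    """Return the threshold with the highest F1, breaking ties by the
    smallest absolute bias between mapped area and the survey (TNLS) area.

    candidates: {name: (f1_score, mapped_area)} in any consistent area unit.
    """
    def rank(name):
        f1, area = candidates[name]
        return (f1, -abs(area - survey_area))  # higher F1 first, then smaller bias
    return max(candidates, key=rank)


# Hypothetical scores for illustration only
scores = {"threshold_04": (0.70, 130.0),
          "threshold_05": (0.74, 105.0),
          "threshold_06": (0.74, 92.0)}
print(select_threshold(scores, survey_area=100.0))  # threshold_05
```

Exact ties in floating-point F1 are rare, so in practice F1 would need to be rounded (or a tolerance applied) before the area bias can act as a tie-breaker. A weighted score combining both criteria is an equally plausible alternative.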

Cropland area comparison at multiple scales

Area-based comparison

We calculated the cropland area of each of the ten cropland datasets at three administrative levels: the whole of southwestern China, the provincial level, and the district level. The administrative boundaries were obtained from the Resource and Environment Science and Data Center, Chinese Academy of Sciences (https://www.resdc.cn/). The mapped areas were then compared with census data from the Third National Land Survey (TNLS) (https://gtdc.mnr.gov.cn). Because Chongqing is a municipality directly under the central government (equivalent to a provincial administrative unit), its district-level units (e.g., Wanzhou District of Chongqing) were treated as prefecture-level units and compared with prefecture-level cities in other provinces. Two metrics, R² and the root mean square error (RMSE), were used to measure agreement between the mapped areas and the statistics; their formulas are listed below.

$${R}^{2}=1-\frac{\mathop{\sum }\limits_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}{\mathop{\sum }\limits_{i=1}^{n}{\left({y}_{i}-\bar{y}\right)}^{2}}$$

(6)

$${RMSE}=\sqrt{\frac{\mathop{\sum }\limits_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}{n}}$$

(7)

where $y_i$ is the TNLS area of administrative unit $i$, $\bar{y}$ is the mean TNLS area, $x_i$ is the mapped area of unit $i$, and $n$ is the number of administrative units.
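Equations (6) and (7), together with the per-unit area tabulation they require, can be sketched as follows. The zone raster of administrative-unit IDs is a hypothetical input (the comparison described above used vector boundaries). The 0.0009 km² pixel area assumes 30-m pixels on the equal-area grid, and the function names are hypothetical.

```python
import numpy as np

PIXEL_AREA_KM2 = 0.03 * 0.03  # one 30 m x 30 m pixel = 0.0009 km^2 on an equal-area grid


def zonal_cropland_area(binary: np.ndarray, zones: np.ndarray, n_zones: int) -> np.ndarray:
    """Cropland area (km^2) per administrative unit.

    `zones` is a hypothetical raster of unit IDs 0..n_zones-1 on the same grid
    as `binary` (for example, rasterised prefecture or county boundaries).
    """
    pixels = np.bincount(zones.ravel(), weights=binary.ravel().astype(float),
                         minlength=n_zones)
    return pixels * PIXEL_AREA_KM2


def r_squared(mapped, survey) -> float:
    """Eq. (6): agreement of mapped areas x with survey areas y about the 1:1 line."""
    x, y = np.asarray(mapped, float), np.asarray(survey, float)
    return float(1.0 - np.sum((x - y) ** 2) / np.sum((y - y.mean()) ** 2))


def rmse(mapped, survey) -> float:
    """Eq. (7): root mean square difference between mapped and survey areas."""
    x, y = np.asarray(mapped, float), np.asarray(survey, float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


binary = np.array([[1, 1, 0], [0, 1, 1]])
zones = np.array([[0, 0, 1], [1, 1, 1]])      # two hypothetical units
print(zonal_cropland_area(binary, zones, 2))  # two cropland pixels per unit
print(r_squared([1, 2, 3], [1, 2, 4]), rmse([1, 2, 3], [1, 2, 4]))
```

Because Eq. (6) measures deviation from the 1:1 line rather than the squared Pearson correlation, it becomes negative when mapped areas are biased strongly enough, even if they are perfectly correlated with the survey.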

Spatial extent comparison

The spatial distribution of cropland in southwestern China is heterogeneous, so agreement in total area does not necessarily imply spatial agreement. To compare the spatial detail of the datasets, we performed a pixel-by-pixel comparison and generated a vote map of the ten cropland products. For each pixel, the number of datasets labeling it as cropland was counted, giving votes from 1 (lowest consistency) to 10 (highest consistency). Pixels not labeled as cropland by any dataset were assigned zero and excluded from the comparison.

Furthermore, we selected eight sites distributed across southwestern China (zoomed-in views in Fig. 7) to show the spatial details of the cropland maps. These sites cover plains, hills, mountains, river valleys, and other typical landscapes of southwestern China, and demonstrate that the refined map better depicts the cropland extent.

Data Records

The refined cropland map generated with the methods described in this article is named Cropland_Syn and corresponds to the optimal threshold in Table 2. Specifically, threshold_05 yields the highest F1 score and agrees more closely with the statistical data, although its OA is slightly lower than that of threshold_06. Cropland_Syn, based on threshold_05, substantially improves both the area estimate and the spatial extent of cropland mapping.

Table 2 Performance of refined map by thresholds.


The vote map of inconsistency and the refined cropland map, both in GeoTIFF format with the Albers conic equal-area projected coordinate system at 30-m resolution, together with their pyramid files in .ovr format, are available from the figshare repository⁶². All raster data can be loaded and edited with scripting tools (such as rasterio, GDAL, and cartopy) and with software that supports .tif files, such as ESRI ArcGIS (https://www.esri.com/) and QGIS (https://qgis.org/).

The cropland/non-cropland samples for southwestern China are also shared in the repository in ESRI shapefile format. The attribute table has five fields. In addition to the latitude and longitude coordinates, the land field takes values of 0 and 10, representing non-cropland and cropland samples, respectively, and the Source field records the source of each sample point. The projection file (.prj) for the Albers conic equal-area coordinate system applied to southwestern China is also uploaded, so that ESRI ArcGIS users can reuse it without defining the projection themselves.

Technical Validation

Two methods were used to validate the resulting maps: sample-based accuracy assessment and cross-comparison with the ten existing cropland maps.

Accuracy assessments of cropland maps

To quantitatively characterize the accuracy of the ten existing cropland maps and the refined map at multiple administrative scales, we validated them with the previously constructed sample dataset, both for the whole of southwestern China and for each province.

The results showed that the 30-m refined cropland map derived from the vote-map thresholds (threshold_05, renamed Cropland_Syn in Data Records) ranked among the most accurate maps in terms of overall accuracy, F1 score, and MCC, both for each province and for southwestern China as a whole, generally with an overall accuracy above 0.80 (Fig. 8 and Table S2). The sample-based error distribution in Fig. 9 supports this finding. The refined map outperformed most of the ten existing products, including all five datasets with the same 30-m spatial resolution, whose accuracies were even below average. It was surpassed only by WorldCover, which has a finer spatial resolution of 10 m.

Comparisons with existing cropland maps and land survey data

Figure 2 shows the spatial distribution of the 12 cropland maps in southwestern China. To avoid differences in visual effect caused by spatial resolution, cropland maps with 10-m or 1-m resolution were resampled to 30 m using the nearest-neighbor method. Overall, the refined cropland map based on threshold_05 reduced both the overestimation of cropland area in the existing 30-m data (mainly due to commission errors) and the underestimation in the 10-m and 1-m data (mainly due to omission errors). This is evident in the spatial extent comparisons at the eight sites shown in Fig. 7. In plain areas (c and i in Fig. 7), the refined maps agree more closely with the actual cropland distribution. In hilly areas (b, f, and h in Fig. 7), the refined map substantially reduces the commission errors of the other 30-m cropland maps.

We further compared the mapped area of each cropland map with statistics at different administrative levels from the TNLS, which is recognized as the most precise source of land area data in China⁶³. In general, cropland maps with finer spatial resolution (1 m and 10 m) tended to underestimate cropland area relative to the TNLS (red dashed line in Fig. 10), whereas the 30-m cropland maps tended to overestimate it. There are exceptions, however: the 10-m CRLC and the 30-m GLAD show trends opposite to those of other maps with the same spatial resolution. The refined maps for thresholds 5 and 6 lie much closer to the red line, clearly outperforming the nine existing cropland maps other than WorldCover, from which they differ only slightly. In addition, the areas of the 12 cropland maps were compared at the prefecture level (Fig. 11) and county level (Fig. 12). The points for the refined cropland maps clustered more tightly along the 1:1 line than those of the existing maps, which tended to over- or underestimate. This suggests that the refined map reduces both the omission errors of high-resolution maps and the commission errors of low-resolution maps, bringing its cropland area close to the TNLS statistics.

Uncertainty analysis

The dataset is subject to several uncertainties. First, in the vote map, pixels with a value of zero (labeled non-cropland by every dataset) may result from omission errors in the corresponding cropland products. For example, cropland maps refined with the thresholding method may lose some accuracy because of omission errors in some mountainous areas (Fig. 7(j)). Consequently, some pixels that are truly cropland may not be identified as cropland by any dataset, further contributing to omission errors in the refined cropland map. In future work, adding training samples in areas of high inconsistency identified from the vote map could improve accuracy in regions where cropland mapping is difficult (e.g., the hilly and mountainous regions in Fig. 5)⁶⁴.

Second, the study resampled the ten cropland data products to a common spatial resolution, which to some extent weakened the ability of the 1-m and 10-m datasets to depict cropland details (see data pre-processing). Although cropland mapping is inherently difficult in an area with such diverse land cover types and fragmented parcels, the refined map produced in this study still offered a notable improvement in both accuracy and cropland area estimation (Fig. 7 and Fig. 8). This study also demonstrates the feasibility of generating optimized new data from existing data in an era when open-source cropland maps are becoming increasingly abundant^65,66,67. To further improve data fusion, higher-precision cropland maps could be generated by optimizing and integrating datasets based on geographic subdivisions and data-driven algorithms^68,69.
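The loss of detail caused by harmonizing resolutions can be seen in a toy example. The sketch below aggregates a fine binary map to a grid `factor` times coarser by computing the cropland fraction in each block and applying a 50% cutoff. This fraction-based rule is an assumption for illustration (the resampling actually used is described in the data pre-processing step), and nearest-neighbour resampling would behave differently. In the example, a 10-m map is aggregated to 30 m: a two-pixel parcel disappears (omission), while a block that is 5/9 cropland becomes a full 30-m cropland pixel (commission).

```python
import numpy as np


def aggregate_binary(fine, factor, min_fraction=0.5):
    """Aggregate a binary cropland map to a grid `factor` times coarser.

    Returns (coarse_binary, coarse_fraction). Each coarse cell covers
    factor x factor fine pixels and is labeled cropland when its cropland
    fraction is at least `min_fraction` (an illustrative rule).
    """
    fine = np.asarray(fine, dtype=float)
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("array dimensions must be divisible by factor")
    # Split rows and columns into (block index, offset within block)
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    fraction = blocks.mean(axis=(1, 3))
    return (fraction >= min_fraction).astype(np.uint8), fraction


if __name__ == "__main__":
    # 10-m map (3 x 6 pixels) aggregated to 30 m (1 x 2 pixels)
    fine = np.array([
        [1, 1, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ])
    coarse, frac = aggregate_binary(fine, factor=3)
    print(coarse, frac)
    # Cropland area in m^2: a 10-m pixel is 100 m^2, a 30-m pixel is 900 m^2
    print(fine.sum() * 100, coarse.sum() * 900)
```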

Third, although the study analysed ten cropland maps released since 2021, only the year 2020 was considered in the comparison. Most of these maps (including World Cover, ESRI Land Cover, Dynamic World, CLCD, GLC_FCS30, CACD, and GLAD) can reflect long-term cropland dynamics²³. Future work could therefore focus on optimizing time-series cropland maps with a time-series validation sample set to better meet the demands of dynamic cropland monitoring.

Usage Notes

Cropland maps play an indispensable role in guiding agricultural land management. However, as open-source data continue to accumulate, inconsistencies across datasets hinder our understanding of the processes and patterns of agricultural land systems and their responses to anthropogenic disturbances. Compared with the ten existing cropland datasets, our data-driven refined map (Cropland_Syn) shows higher consistency with official land survey data and greater accuracy. Additionally, the shared sample set from this work facilitates quality assessment and continuous refinement of cropland maps in southwestern China.
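Users assessing a cropland map against the shared sample set would typically start from counts of agreement and disagreement at the sample points. The sketch below computes several common binary metrics (overall accuracy, producer's and user's accuracy, F1 score, and the Matthews correlation coefficient discussed by Chicco and Jurman⁶¹) from paired labels. It weights every sample equally; for stratified sampling designs, the area-weighted estimators recommended by Olofsson et al.⁶⁰ are more appropriate. The function name and input format are hypothetical, and the metrics the article itself reports are defined in its Accuracy metrics section.

```python
import math


def binary_accuracy(reference, predicted):
    """Accuracy metrics for a binary cropland map (1 = cropland, 0 = other).

    reference: labels of the validation samples; predicted: map labels at the
    same samples. Every sample is weighted equally (no area weighting).
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # commission
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # omission
    if tp + tn + fp + fn != len(pairs):
        raise ValueError("labels must be 0 or 1")
    nan = float("nan")
    pa = tp / (tp + fn) if tp + fn else nan  # producer's accuracy
    ua = tp / (tp + fp) if tp + fp else nan  # user's accuracy
    # F1 = harmonic mean of PA and UA, written in count form
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {"OA": (tp + tn) / len(pairs), "PA": pa, "UA": ua, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    reference = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical sample labels
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]  # map labels at the same samples
    print(binary_accuracy(reference, predicted))
```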

Code availability

The JavaScript code from the Earth Engine repository used to generate the cropland maps and compute the accuracy metrics is shared in the figshare repository. Python code for building raster pyramids is also provided.
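Raster pyramids (overviews) store progressively coarser copies of a raster so that large maps display quickly at small scales. The released Python code is not reproduced here; the sketch below only illustrates the idea for a categorical cropland map, halving the resolution at each level by nearest-neighbour decimation so that the original class codes are preserved (averaging would create fractional values that are not valid classes). The function name and the `min_size` stopping rule are assumptions. In practice, GIS libraries build overviews directly, for example GDAL's `BuildOverviews` with `NEAREST` or `MODE` resampling.

```python
import numpy as np


def build_pyramid(raster, min_size=256):
    """Return [level0, level1, ...], each level half the size of the previous.

    Nearest-neighbour decimation (every second row and column) keeps the
    original class codes. Stops before a level would be smaller than
    `min_size` pixels along either axis.
    """
    levels = [np.asarray(raster)]
    # (n + 1) // 2 is the length of an axis after keeping every second pixel
    while (min(levels[-1].shape) + 1) // 2 >= min_size:
        levels.append(levels[-1][::2, ::2])
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cropland = rng.integers(0, 2, size=(1024, 1024), dtype=np.uint8)
    for i, level in enumerate(build_pyramid(cropland)):
        print(i, level.shape)
```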

The software and modules used in this study include Origin 2023b, ArcGIS Pro 3.1, Python 3.7, gma 1.1.5, and ArcPy 3.1.

The very high spatial resolution (VHR) Google Earth images are accessible through the ArcGIS Web Map Tile Service (WMTS).

References

1. Foley, J. A. et al. Solutions for a cultivated planet. Nature 478, 337–342 (2011).
2. Foley, J. A. et al. Global Consequences of Land Use. Science 309, 570–574 (2005).
3. Zabel, F. et al. Global impacts of future cropland expansion and intensification on agricultural markets and biodiversity. Nat. Commun. 10, 2844 (2019).
4. Meng, Z. et al. Post-2020 biodiversity framework challenged by cropland expansion in protected areas. Nat. Sustainability, 1–11 (2023).
5. Ge, Q., Dai, J., He, F., Pan, Y. & Wang, M. Land use changes and their relations with carbon cycles over the past 300 a in China. Science in China Series D: Earth Sciences 51, 871–884 (2008).
6. Liang, X. et al. Exploring cultivated land evolution in mountainous areas of Southwest China, an empirical study of developments since the 1980s. Land Degrad. Dev. 32, 546–558 (2021).
7. Bell, S. M. et al. Quantifying the recarbonization of post-agricultural landscapes. Nat. Commun. 14, 2139 (2023).
8. Clark, H. & Wu, H. in Furthering the Work of the United Nations (United Nations, 2016).
9. Xu, Z. et al. Assessing progress towards sustainable development over space and time. Nature 577, 74–78 (2020).
10. Pradhan, P. A threefold approach to rescue the 2030 Agenda from failing. Natl. Sci. Rev. 10, nwad015 (2023).
11. Fritz, S. et al. Mapping global cropland and field size. Global Change Biol. 21, 1980–1992 (2015).
12. Wu, B. et al. Challenges and opportunities in remote sensing-based crop monitoring: a review. Natl. Sci. Rev. 10, nwac290 (2023).
13. Hansen, M. C. & Loveland, T. R. A review of large area monitoring of land cover change using Landsat data. Remote Sens. Environ. 122, 66–74 (2012).
14. Grekousis, G., Mountrakis, G. & Kavouras, M. An overview of 21 global and 43 regional land-cover mapping products. Int. J. Remote Sens. 36, 5309–5335 (2015).
15. Naboureh, A., Bian, J., Lei, G. & Li, A. A review of land use/land cover change mapping in the China-Central Asia-West Asia economic corridor countries. Big Earth Data 0, 1–21 (2020).
16. Nabuurs, G.-J. et al. Glasgow forest declaration needs new modes of data ownership. Nat. Clim. Change 12, 415–417 (2022).
17. Wulder, M. A. et al. Current status of Landsat program, science, and applications. Remote Sens. Environ. 225, 127–147 (2019).
18. Zhang, D. et al. A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution. Remote Sens. Environ. 247, 111912 (2020).
19. Drusch, M. et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 120, 25–36 (2012).
20. Roy, D. P., Huang, H., Houborg, R. & Martins, V. S. A global analysis of the temporal availability of PlanetScope high spatial resolution multi-spectral imagery. Remote Sens. Environ. 264, 112586 (2021).
21. Dong, J. et al. State of the art and perspective of agricultural land use remote sensing information extraction. Journal of Geo-Information Science 22, 772–783 (2020).
22. Yuan, Q. et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 241, 111716 (2020).
23. Song, X.-P. The future of global land change monitoring. Int. J. Digital Earth 16, 2279–2300 (2023).
24. Wang, Y. et al. A review of regional and Global scale Land Use/Land Cover (LULC) mapping products generated from satellite remote sensing. ISPRS J. Photogramm. Remote Sens. 206, 311–334 (2023).
25. Chen, J., Liu, Y., Liu, R. & Wei, X. Estimation of High-Resolution Fractional Tree Cover Using Landsat Time-Series Observations. IEEE Trans. Geosci. Remote Sens. 61, 1–11 (2023).
26. Zhou, Y. et al. Are There Sufficient Landsat Observations for Retrospective and Continuous Monitoring of Land Cover Changes in China? Remote Sens. 11, 1808 (2019).
27. Lei, G. et al. Land Cover Mapping in Southwestern China Using the HC-MMK Approach. Remote Sens. 8, 305 (2016).
28. Tubiello, F. N. et al. Measuring the world’s cropland area. Nat. Food 4, 30–32 (2023).
29. Di, Y. et al. Mapping croplands in the granary of the Tibetan Plateau using all available Landsat imagery, a phenology-based approach, and Google Earth Engine. Remote Sens. 13, 2289 (2021).
30. Venter, Z. S., Barton, D. N., Chakraborty, T., Simensen, T. & Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 14 (2022).
31. Wickham, J., Stehman, S. V., Sorenson, D. G., Gass, L. & Dewitz, J. A. Thematic accuracy assessment of the NLCD 2016 land cover for the conterminous United States. Remote Sens. Environ. 257, 112357 (2021).
32. Wang, Z. & Mountrakis, G. Accuracy Assessment of Eleven Medium Resolution Global and Regional Land Cover Land Use Products: A Case Study over the Conterminous United States. Remote Sens. 15, 3186 (2023).
33. Yang, Y., Xiao, P., Feng, X. & Li, H. Accuracy assessment of seven global land cover datasets over China. ISPRS J. Photogramm. Remote Sens. 125, 156–173 (2017).
34. Yang, Z. et al. Accuracy Assessment and Inter-Comparison of Eight Medium Resolution Forest Products on the Loess Plateau, China. ISPRS Int. J. Geo-Inf. 6, 152 (2017).
35. Hou, M. et al. The urgent need to develop a new grassland map in China: based on the consistency and accuracy of ten land cover products. Sci. China Life Sci. 66, 385–405 (2023).
36. Wei, Y., Lu, M., Wu, W. & Ru, Y. Multiple factors influence the consistency of cropland datasets in Africa. Int. J. Appl. Earth Obs. Geoinf. 89, 102087 (2020).
37. Nabil, M. et al. Assessing factors impacting the spatial discrepancy of remote sensing based cropland products: A case study in Africa. Int. J. Appl. Earth Obs. Geoinf. 85, 102010 (2020).
38. Gao, Y. et al. Consistency analysis and accuracy assessment of three global 30-m land-cover products over the European Union using the LUCAS dataset. Remote Sens. 12, 3479 (2020).
39. Liu, L. et al. Finer-Resolution Mapping of Global Land Cover: Recent Developments, Consistency Analysis, and Prospects. J. Remote Sens. 2021 (2021).
40. Cui, Y. et al. Decoding the inconsistency of six cropland maps in China. The Crop Journal 12, 281–294 (2024).
41. Zhang, C., Dong, J. & Ge, Q. Quantifying the accuracies of six 30-m cropland datasets over China: A comparison and evaluation analysis. Comput. Electron. Agric. 197, 106946 (2022).
42. Dong, J. et al. Opportunities and challenges in monitoring cultivated land red line in big data era. Bulletin of Chinese Academy of Sciences 38, 1781–1792 (2023).
43. Qin, X. et al. Identification of Parcel-Scale Crop Types in Southwestern Mountainous Area based on Time Series Remote Sensing Images. Journal of Geo-Information Science 25, 654–668 (2023).
44. Li, A. et al. The driving factors and buffering mechanism regulating cropland soil acidification across the Sichuan Basin of China. Catena 220 (2023).
45. Li, Z. et al. SinoLC-1: the first 1-meter resolution national-scale land-cover map of China created with the deep learning framework and open-access data. Earth Syst. Sci. Data Discuss., 1–38 (2023).
46. Zanaga, D. et al. ESA WorldCover 10 m 2020 v100, Zenodo, https://doi.org/10.5281/zenodo.5571936 (2021).
47. Karra, K. et al. Global land use/land cover with Sentinel 2 and deep learning. in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS 4704–4707 (IEEE, 2021).
48. Brown, C. F. et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 9, 251 (2022).
49. Liu, Y., Zhong, Y., Ma, A., Zhao, J. & Zhang, L. Cross-resolution national-scale land-cover mapping based on noisy label learning: A case study of China. Int. J. Appl. Earth Obs. Geoinf. 118, 103265 (2023).
50. Chen, J. et al. Global land cover mapping at 30m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 103, 7–27 (2015).
51. Yang, J. & Huang, X. The 30m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 13, 3907–3925 (2021).
52. Zhang, X. et al. GLC_FCS30: global land-cover product with fine classification system at 30m using time-series Landsat imagery. Earth Syst. Sci. Data 13, 2753–2776 (2021).
53. Hansen, M. C. et al. Global land use extent and dispersion within natural land cover using Landsat data. Environ. Res. Lett. 17, 034050 (2022).
54. Tu, Y. et al. A 30 m annual cropland dataset of China from 1986 to 2021. Earth Syst. Sci. Data Discuss., 1–34 (2023).
55. Potapov, P. et al. Global maps of cropland extent and change show accelerated cropland expansion in the twenty-first century. Nat. Food 3, 19–28 (2022).
56. Stehman, S. V. & Foody, G. M. Key issues in rigorous accuracy assessment of land cover products. Remote Sens. Environ. 231, 111199 (2019).
57. Teluguntla, P. et al. A 30-m Landsat-derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine cloud computing platform. ISPRS J. Photogramm. Remote Sens. 144, 325–340 (2018).
58. Xu, X., Li, B., Liu, X., Li, X. & Shi, Q. Mapping annual global land cover changes at a 30 m resolution from 2000 to 2015. J. Remote Sens. 25, 1896–1916 (2021).
59. Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10, 879 (2023).
60. Olofsson, P. et al. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 148, 42–57 (2014).
61. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020).
62. Cui, Y. et al. Validation and refinement of cropland maps in southwestern China. figshare https://doi.org/10.6084/m9.figshare.25969603.v1 (2024).
63. Chen, X. et al. Toward sustainable land use in China: A perspective on China’s national land surveys. Land Use Policy 123, 106428 (2022).
64. You, N. et al. The 10-m crop type maps in Northeast China during 2017–2019. Sci. Data 8, 41 (2021).
65. Lei, G., Li, A., Bian, J. & Zhang, Z. The roles of criteria, data and classification methods in designing land cover classification systems: evidence from existing land cover data sets. Int. J. Remote Sens. 41, 5062–5082 (2020).
66. Chaves, M. E. D., Picoli, M. C. A. & Sanches, D. Recent Applications of Landsat 8/OLI and Sentinel-2/MSI for Land Use and Land Cover Mapping: A Systematic Review. Remote Sens. 12, 3062 (2020).
67. Zhang, C., Dong, J. & Ge, Q. Mapping 20 years of irrigated croplands in China using MODIS and statistics and existing irrigation products. Sci. Data 9, 407 (2022).
68. Nabil, M., Zhang, M., Wu, B., Bofana, J. & Elnashar, A. Constructing a 30m African cropland layer for 2016 by integrating multiple remote sensing, crowdsourced, and auxiliary datasets. Big Earth Data 6, 54–76 (2022).
69. Lin, L. et al. Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm. Sci. Data 9, 63 (2022).

Acknowledgements

This work was supported by the Youth Interdisciplinary Team Project of the Chinese Academy of Sciences (JCTD-2021–04), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA28060100), the Informatization Plan of the Chinese Academy of Sciences (Grant No. CAS-WX2021PY-0109), and the National Natural Science Foundation of China (Grant No. 72221002, 42271375). The creation and curation of the dataset relied on various open-source land use/land cover datasets and sample sets. We would like to express our gratitude to the producers of the datasets used in this study for their valuable contributions and to those who assisted us during the fieldwork.

Author information

Authors and Affiliations

Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China
Yifeng Cui, Jinwei Dong, Yuanyuan Di & Ronggao Liu
University of Chinese Academy of Sciences, Beijing, 100049, China
Yifeng Cui & Jinwei Dong
Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Yifeng Cui & Na Chen
Department of Civil and Environmental Engineering, National University of Singapore, Singapore, 117576, Singapore
Chao Zhang
College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
Jilin Yang
Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing, 100871, China
Peng Guo & Mengxi Chen
Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, 315100, China
Yuanyuan Di
College of Resources, Sichuan Agricultural University, Chengdu, 611130, China
Aiwen Li

Contributions

Yifeng Cui: Conceptualization, methodology, software, programming, validation, visualization, writing – original draft. Jinwei Dong: Conceptualization, methodology, writing – review & editing, supervision, funding acquisition. Chao Zhang: Review & editing, supervision, validation. Jilin Yang: Review & editing, supervision. Na Chen: Review & editing. Peng Guo: Methodology. Yuanyuan Di: Validation, review & editing. Mengxi Chen: Review & editing. Aiwen Li: Validation. Ronggao Liu: Conceptualization, supervision, funding acquisition.

Corresponding authors

Correspondence to Jinwei Dong or Ronggao Liu.

Ethics declarations

Competing interests

Following the new policies of Springer Nature, we acknowledge that Professor Jinwei Dong is an Editorial Board Member of Scientific Data.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cui, Y., Dong, J., Zhang, C. et al. Validation and refinement of cropland map in southwestern China by harnessing ten contemporary datasets. Sci Data 11, 671 (2024). https://doi.org/10.1038/s41597-024-03508-5

Received: 13 September 2023
Accepted: 12 June 2024
Published: 22 June 2024
DOI: https://doi.org/10.1038/s41597-024-03508-5
Springer Nature Limited

Associated content

Remote sensing data for changes in land use

Collection 06 July 2023
