Long-term series of daily snow depth dataset over the Northern Hemisphere based on machine learning (1980-2019)

Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19070100). Tao Che, the director of this program, is from the Key Laboratory of Remote Sensing of Gansu Province, Northwest Institute of Eco-Environment and Resources, CAS. The team used machine learning methods combined with multi-source gridded snow depth products to derive a long-term snow depth series over the Northern Hemisphere. First, they compared how well artificial neural networks (ANN), support vector machines (SVM) and random forests (RF) suit snow depth fusion, and found that the random forest method has clear advantages. Second, they built the random forest model with these inputs as independent variables:
* remote sensing snow depth products (AMSR-E, AMSR-2, NHSD and GlobSnow);
* reanalysis data (ERA-Interim and MERRA-2);
* environmental factor variables.

In situ observations served as reference truth to train and validate the model. They came from Chinese meteorological stations (945), Russian meteorological stations (620), Russian snow survey sites (514) and the Global Historical Climatology Network (41261). The daily gridded snow depth dataset covers the snow hydrological years 1980 to 2019, each running from September 1 of the previous year to May 31 of the current year. It was produced on the cloud platform provided by CASEarth. From 1980 to 1987 the passive microwave brightness temperature data are available only every other day, so the data for this period contain a small number of gaps. Validation against ESM-SnowMIP and independent ground observations shows that the fused dataset is of higher quality than the input products.
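The snow-year convention and the every-other-day gaps before 1988 can be made concrete with a small date helper. This is a minimal sketch, assuming a snow year is labelled by the year in which it ends (consistent with "September 1 of the previous year to May 31 of the current year"). The function names are hypothetical and not part of the dataset's tooling.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def snow_year(d: date) -> Optional[int]:
    """Return the snow hydrological year a date belongs to.

    Snow year Y runs from 1 September of Y-1 to 31 May of Y (hypothetical helper).
    Dates in June-August fall outside the dataset period and return None.
    """
    if d.month >= 9:
        return d.year + 1
    if d.month <= 5:
        return d.year
    return None


def snow_year_dates(year: int) -> list[date]:
    """Every calendar day from 1 Sep of year-1 to 31 May of year, inclusive."""
    start, end = date(year - 1, 9, 1), date(year, 5, 31)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def missing_days(available: Iterable[date], year: int) -> list[date]:
    """Days of snow year `year` with no grid in `available` (e.g. every-other-day gaps)."""
    have = set(available)
    return [d for d in snow_year_dates(year) if d not in have]
```

For example, `snow_year(date(2018, 9, 1))` gives 2019. A snow year has 273 days, or 274 when its February is a leap month. If only every other day is available, about half of those days show up in `missing_days`.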
Evaluated against ground observations, the coefficient of determination (R²) rises from 0.23 for the best pre-fusion product (GlobSnow) to 0.81 for the fused data. The root mean square error (RMSE) and mean absolute error (MAE) of the fused data fall to 7.7 cm and 2.7 cm, respectively.
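The three accuracy measures quoted above can be computed from paired station and gridded values as follows. This is a minimal sketch, assuming matched pairs in centimetres and R² defined as 1 - SS_res/SS_tot. The text does not state which R² definition was used; some snow studies use the squared Pearson correlation instead, which can differ.

```python
import math
from typing import Sequence


def evaluate(obs: Sequence[float], est: Sequence[float]) -> dict[str, float]:
    """Compare estimated snow depth with station observations (both in cm)."""
    if len(obs) != len(est) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    n = len(obs)
    errors = [e - o for o, e in zip(obs, est)]
    mean_obs = sum(obs) / n
    ss_res = sum(err * err for err in errors)           # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)       # total sum of squares
    if ss_tot == 0:
        raise ValueError("observations are constant; R2 is undefined")
    return {
        "R2": 1.0 - ss_res / ss_tot,                     # coefficient of determination
        "RMSE": math.sqrt(ss_res / n),                   # root mean square error
        "MAE": sum(abs(err) for err in errors) / n,      # mean absolute error
    }
```

RMSE weights large errors more heavily than MAE. An RMSE well above the MAE, as in the figures above, suggests that a few large misses dominate the squared errors.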

ASTER GDEM data in the Heihe River Basin (2009)

The data set includes the ASTER GDEM tiles and their mosaic. ASTER Global DEM (ASTER GDEM) is a global digital elevation product released jointly by NASA and Japan's Ministry of Economy, Trade and Industry (METI) on June 29, 2009. It is derived from observations by the ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) sensor on NASA's Terra Earth observation satellite. The product draws on about 1.3 million stereo scenes covering more than 99% of the Earth's land surface. The data have a horizontal accuracy of 30 m and an elevation accuracy of 7-14 m (both at 95% confidence). This is the third global elevation dataset, and its resolution is significantly higher than that of the earlier SRTM3 DEM and GTOPO30 data.

We downloaded the Heihe River Basin data from NASA's website (http://wist.echo.nasa.gov/api) and distribute them through the data center. The distributed data retain their original form without any modification. For details of the ASTER GDEM production process, please refer to the documents linked in the metadata or visit http://www.ersdac.or.jp/GDEM/E/3.html. You can also read the ASTER Global DEM documentation at https://lpdaac.usgs.gov/.

ASTER GDEM is distributed in 1×1 degree tiles, each packaged as a zip file containing three files named as follows:
* ASTGTM_NxxEyyy_dem.tif
* ASTGTM_NxxEyyy_num.tif
* readme.pdf
Here xx is the starting latitude and yyy is the starting longitude. The _dem.tif file contains the DEM data, the _num.tif file is the data quality file, and readme.pdf is the data description file. For convenience, we mosaicked the individual ASTER GDEM tiles into an ASTER GDEM mosaic of the Heihe River Basin. The mosaic retains all the original features of ASTER GDEM without any resampling.
The mosaic consists of two files:
* heihe_aster_gdem_mosaic_dem.img
* Heihe_Aster_GDEM_Mosaic_num.img
Both are stored in ERDAS IMAGINE (.img) format. The _dem.img file contains the DEM data and the _num.img file is the data quality file.
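The ASTGTM_NxxEyyy tile naming described above lets you work out which 1×1 degree tiles cover an area. This is a minimal sketch, assuming that xx/yyy give the south-west corner of each tile and that southern and western tiles use S and W prefixes. The helper names are hypothetical, and the example bounding box is illustrative only, not the official basin extent.

```python
import math
import re

# Hypothetical pattern for the tile file names listed above.
TILE_RE = re.compile(r"ASTGTM_([NS])(\d{2})([EW])(\d{3})_(dem|num)\.tif$")


def parse_tile(name: str) -> tuple[int, int, str]:
    """Return (start latitude, start longitude, file kind) from a tile file name."""
    m = TILE_RE.search(name)
    if m is None:
        raise ValueError(f"not an ASTER GDEM tile name: {name}")
    ns, lat, ew, lon, kind = m.groups()
    return (int(lat) if ns == "N" else -int(lat),
            int(lon) if ew == "E" else -int(lon),
            kind)


def tile_name(lat: int, lon: int, kind: str = "dem") -> str:
    """Build the file name of the tile whose south-west corner is (lat, lon)."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"ASTGTM_{ns}{abs(lat):02d}{ew}{abs(lon):03d}_{kind}.tif"


def tiles_for_bbox(west: float, south: float, east: float, north: float,
                   kind: str = "dem") -> list[str]:
    """All 1x1 degree tiles intersecting a lon/lat bounding box."""
    return [tile_name(lat, lon, kind)
            for lat in range(math.floor(south), math.ceil(north))
            for lon in range(math.floor(west), math.ceil(east))]
```

For an illustrative box from 97°E to 99°E and 37.5°N to 39.5°N, `tiles_for_bbox(97.0, 37.5, 99.0, 39.5)` returns six tile names, starting with ASTGTM_N37E097_dem.tif.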

SRTM DEM dataset in China (2000)

The SRTM sensor has two bands, C-band and X-band; the SRTM data used here are derived from the C-band. The publicly released SRTM digital elevation products include DEM data at three resolutions:
* SRTM1 covers only the continental United States, with a spatial resolution of 1 arc-second.
* SRTM3 covers the world with a spatial resolution of 3 arc-seconds and is the most widely used of the three. Its elevation reference is the EGM96 geoid and its horizontal reference is WGS84. The nominal absolute elevation accuracy is ±16 m and the absolute planimetric accuracy is ±20 m.
* SRTM30 also covers the world, with a resolution of 30 arc-seconds.

There are multiple versions of SRTM data. The early SRTM data were produced by the ground data processing system (GDPS) of NASA's Jet Propulsion Laboratory (JPL) and are called SRTM3-1. The National Geospatial-Intelligence Agency further processed these data, substantially reducing the data voids; this version is called SRTM3-2. This dataset is the fourth version of the SRTM terrain data, produced by CIAT (International Center for Tropical Agriculture). CIAT used a new interpolation algorithm (Reuter et al., 2007) that better fills the voids in the SRTM 90 m data.

The data are organized into 5° × 5° tiles arranged in 24 rows (-60° to 60° latitude) and 72 columns (-180° to 180° longitude). Files are named srtm_XX_YY.zip, where XX is the column number (01-72) and YY is the row number (01-24). The resolution of the data is 90 m.

Data usage: elevations are stored as 16-bit values (range ±32767 m). The maximum positive elevation is 9,000 m and the maximum negative elevation is 12,000 m below sea level. The value -32767 denotes no data.
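The srtm_XX_YY tile naming and the -32767 no-data value can be applied as in the sketch below. It assumes that column 01 starts at 180°W and that row 01 starts at 60°N, counting southward. This matches the CIAT tiling as we understand it, but the text above does not state the row direction. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence

NODATA = -32767  # no-data value stated above


def srtm_tile(lat: float, lon: float) -> str:
    """Name of the 5x5 degree tile containing (lat, lon).

    Assumes column 01 begins at 180W and row 01 begins at 60N (rows counted southward).
    """
    if not (-60 < lat <= 60) or not (-180 <= lon < 180):
        raise ValueError("point outside the SRTM tile grid")
    col = math.floor((lon + 180) / 5) + 1   # 01..72
    row = math.floor((60 - lat) / 5) + 1    # 01..24
    return f"srtm_{col:02d}_{row:02d}.zip"


def to_elevation(values: Sequence[int]) -> list[Optional[int]]:
    """Replace no-data cells with None, keeping valid elevations in metres."""
    return [None if v == NODATA else v for v in values]
```

Under these assumptions, a point at 39.9°N, 116.4°E falls in srtm_60_05.zip. Masking no-data cells before computing statistics keeps the -32767 flag from being averaged as an elevation.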

The atmospheric forcing data in the Heihe River Basin (2000-2021)

Near-surface atmospheric forcing data were produced with the Weather Research and Forecasting (WRF) model over the Heihe River Basin at hourly, 0.05° × 0.05° resolution. The variables are 2 m temperature, surface pressure, water vapor mixing ratio, downward shortwave and upward longwave radiation, 10 m wind and accumulated precipitation. The forcing data were validated at different time scales against two sets of observations. The first is daily data from 15 conventional automatic weather stations of the China Meteorological Administration (CMA). The second is hourly data from several sites of the Heihe River eco-hydrological remote sensing experiments (WATER and HiWATER). The main conclusions are:
* 2 m temperature, surface pressure and 2 m relative humidity are reliable. This holds especially for 2 m temperature and surface pressure, whose mean errors are very small and whose correlation coefficients exceed 0.96.
* The correlation between downward shortwave radiation and WATER site observations exceeds 0.9.
* Precipitation was validated separately for the rain and snow phases at yearly, monthly and daily scales and agrees well with observations. The correlation coefficients for rainfall reach 0.94 and 0.84 at the monthly and yearly scales, respectively. The monthly correlation for snowfall reaches 0.78, and the spatial distribution of snowfall agrees well with MODIS fractional snow cover.

The validation of liquid and solid precipitation shows that the WRF model can be used for downscaling over the complex, arid terrain of the Heihe River Basin. The simulated data meet the requirements of watershed-scale hydrological modeling and water balance analysis. The data for 2000-2012 were released in 2013, and updates were released in 2016 (2013-2015), 2019 (2016-2018) and 2021 (2019-2021).
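Precipitation is distributed as an accumulated total, while the validation compares amounts at daily, monthly and yearly scales. The sketch below converts a running total into per-step amounts, sums them into longer periods and computes a Pearson correlation. It assumes the accumulation only grows between steps, so a drop is treated as a reset. It also assumes complete hourly series aligned with the observations. The function names are hypothetical, not part of the dataset's tools.

```python
import math
from typing import Sequence


def deaccumulate(accum: Sequence[float], start: float = 0.0) -> list[float]:
    """Turn a running precipitation total (mm) into per-step amounts (mm).

    Assumption: the total never decreases except when the accumulation is
    reset, in which case the new total itself is taken as that step's amount.
    """
    steps = []
    previous = start
    for value in accum:
        diff = value - previous
        steps.append(value if diff < 0 else diff)
        previous = value
    return steps


def block_sums(values: Sequence[float], size: int) -> list[float]:
    """Sum consecutive complete blocks, e.g. size=24 turns hourly amounts into daily totals."""
    return [sum(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A typical chain is `pearson(block_sums(deaccumulate(hourly_total), 24), station_daily)`. Here `hourly_total` and `station_daily` stand for your own aligned series.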

Antarctic ice sheet mass balance data set (1985-2015)

The Antarctic ice sheet is one of the largest potential sources of global sea level rise. Accurately determining its mass budget is key to understanding the dynamic changes and evolution of the ice sheet and to accurately predicting future global sea level rise.

We started from the MEaSUREs Antarctic grounding line and basin boundaries and discretized the grounding line into flux gates. We then combined the MEaSUREs and RAMP annual ice velocity data from 1985 to 2015 with BedMachine ice thickness data. From these we calculated the ice discharge through each flux gate using the ice velocity vectors. We used surface mass balance data from the RACMO2.3p2 model to calculate the surface mass balance of each basin spatially. Combining this with the ice discharge results gives the Antarctic ice sheet mass balance data set (1985-2015).

The data set includes the mass balance of each basin of the Antarctic ice sheet for 1985, 2000 and 2015. It also includes the annual ice velocity, ice thickness and annual ice discharge at each flux gate. The data set enables a detailed assessment of ice flux at the grounding line. It shows how the mass balance of each basin has changed over the past 30 years and how it is distributed spatially. It provides basic data for later detailed assessment and prediction of Antarctic mass balance changes and for exploring the mechanisms of ice sheet mass loss.
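The flux-gate method described above is a line integral. For each gate, discharge is the ice density times thickness times the velocity component normal to the gate, multiplied by the gate length. Basin mass balance is surface mass balance minus total discharge. The sketch below illustrates that arithmetic under several assumptions:
* straight gate segments in projected metres;
* velocity and thickness given per gate;
* an ice density of 917 kg m^-3;
* gates ordered so that the right-hand normal points seaward.

The `FluxGate` structure and function names are hypothetical and do not describe how the data set itself is stored.

```python
from dataclasses import dataclass
from typing import Sequence

RHO_ICE = 917.0    # kg m^-3, assumed ice density
KG_PER_GT = 1e12   # kilograms per gigatonne


@dataclass
class FluxGate:
    """One straight grounding-line segment (hypothetical structure)."""
    x0: float         # start point, m (projected coordinates)
    y0: float
    x1: float         # end point, m
    y1: float
    vx: float         # ice velocity at the gate, m/yr
    vy: float
    thickness: float  # ice thickness at the gate, m


def gate_discharge(g: FluxGate) -> float:
    """Ice discharge through one gate in Gt/yr.

    With segment vector (dx, dy), the right-hand normal times the length is
    (dy, -dx), so v.n * L = vx*dy - vy*dx; gates are assumed to be ordered so
    that this normal points seaward.
    """
    dx, dy = g.x1 - g.x0, g.y1 - g.y0
    volume_flux = g.thickness * (g.vx * dy - g.vy * dx)  # m^3/yr
    return RHO_ICE * volume_flux / KG_PER_GT


def basin_mass_balance(smb_gt: float, gates: Sequence[FluxGate]) -> float:
    """Basin mass balance (Gt/yr) = surface mass balance - total gate discharge."""
    return smb_gt - sum(gate_discharge(g) for g in gates)
```

For example, take a 1 km gate running north, with 1000 m of ice flowing east at 100 m/yr. It carries 917 × 1000 × 100 × 1000 kg/yr, or about 0.0917 Gt/yr. Only flow across the gate counts; flow along it contributes nothing.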