H.138 Dataset
Description
Target Soil Properties: SOC, pH, Clay
Groups of Features: MIR
Sample size: 138
Number of Features: 2,489
Coordinates: With coordinates (EPSG: 32649)
Location: Hubei, China
Sampling Design: Two sampling designs: (1) adapted Latin hypercube sampling that accounts for legacy samples, correlation, and accessibility, and (2) uncertainty-guided sampling based on uncertainty predictions from a random forest model (Stumpf et al. 2017)
Study Area Size: 420 ha
Geological Setting: Sedimentary rocks, mainly dolomite with silt and limestone, formed in the Middle and Lower Jurassic
Previous Data Publication: Full dataset published in Wadoux et al. (2024)
- Contact Information:
Alexandre M.J.-C. Wadoux (Alexandre.Wadoux@inrae.fr), French National Institute for Agriculture, Food, and Environment (INRAE)
License: CC BY-SA 4.0
Publication/Modification Date (d/m/y): 28.02.25, version 1.0
- Changelog:
Version 1.0 (28.02.25): Initial release
Details
Dataset
The dataset contains the following target soil properties and features:
Target Soil Properties:
- SOC - Soil Organic Carbon
Code: SOC_target
Unit: %
Protocol: Determined as the difference between total carbon and inorganic carbon. Total carbon was obtained by elemental analysis, measuring the CO₂ released during dry combustion (DIN ISO 10694) without acid pretreatment; inorganic carbon was taken as 0.12 × the calcium carbonate content, determined by the gas-volumetric Scheibler method (ISO 10693)
Sampling Date: June 2013, May 2014, and November 2014
Sampling Depth: 0 - 20 cm
- pH
Code: pH_target
Unit: Unitless
Protocol: Measured in water suspension with a glass electrode at an unspecified liquid:soil ratio
Sampling Date: June 2013, May 2014, and November 2014
Sampling Depth: 0 - 20 cm
- Clay
Code: Clay_target
Unit: %
Protocol: Measured by separating the sand fractions by sieving and the silt and clay fractions by X-ray sedimentation
Sampling Date: June 2013, May 2014, and November 2014
Sampling Depth: 0 - 20 cm
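The SOC protocol above is a simple difference: total carbon minus inorganic carbon, with inorganic carbon estimated as 0.12 × the calcium carbonate content (0.12 is approximately the mass fraction of carbon in CaCO₃). A minimal sketch of that arithmetic follows; the function name and the sample values are hypothetical and are not taken from the dataset.

```python
def soc_from_carbon(total_carbon_pct, caco3_pct):
    """Return SOC (%) as total carbon minus inorganic carbon.

    Inorganic carbon is approximated as 0.12 x the calcium carbonate
    content, as described in the SOC protocol.
    """
    inorganic_carbon_pct = 0.12 * caco3_pct
    return total_carbon_pct - inorganic_carbon_pct


# Hypothetical sample (not from the dataset):
# 2.50 % total carbon and 5.0 % calcium carbonate
soc = soc_from_carbon(2.50, 5.0)
print(f"SOC: {soc:.2f} %")  # SOC: 1.90 %
```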
Groups of Features:
- MIR – Mid Infrared Spectroscopy
Number of Features: 2,489
Code(s): wn_5397.9, wn_5396, wn_5394 … wn_599.8
Unit: % (Reflectance)
Sensing: VERTEX 70v FT-IR Spectrometer (Bruker Optik, Ettlingen, Germany), on dried and sieved samples (<2 mm) in the laboratory; the spectral range was 7,500 - 370 cm^-1 at 0.4 cm^-1 intervals
Processing: Discarded the irrelevant part of the spectrum (7,500 - 5,397.9 cm^-1) and the noisy edge of the spectrum (599.8 - 370 cm^-1)
Sampling Date: June 2013, May 2014, and November 2014
- Spectral Information (After Data Processing):
Data Representation: Wavenumber (in cm^-1)
Spectral Resolution: ~2 cm^-1
Spectral Range: 5,397.9 – 599.8 cm^-1
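The feature codes encode the wavenumber after a `wn_` prefix, so the spectral axis can be recovered from the column names. The sketch below is a rough illustration, using only the four codes listed in this document (the elided middle columns are not reproduced) and a hypothetical helper `wavenumber`. It also checks that the stated range and feature count are consistent with a spacing of roughly 2 cm^-1.

```python
# Only the codes listed in this document; the elided columns are omitted.
columns = ["wn_5397.9", "wn_5396", "wn_5394", "wn_599.8"]


def wavenumber(column):
    """Return the wavenumber (cm^-1) encoded in a 'wn_<value>' column name."""
    return float(column.split("_", 1)[1])


wavenumbers = [wavenumber(c) for c in columns]

# Mean spacing implied by the documented range and number of features
n_features = 2489
mean_step = (wavenumbers[0] - wavenumbers[-1]) / (n_features - 1)
print(f"Mean spacing: {mean_step:.2f} cm^-1")  # Mean spacing: 1.93 cm^-1
```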
Examples
from LimeSoDa import load_dataset, split_dataset
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np
# Load and explore the dataset
data = load_dataset("H.138")
dataset = data["Dataset"]
folds = data["Folds"]
coords = data["Coordinates"]
# Split into train/test using fold 1
X_train, X_test, y_train, y_test = split_dataset(
    data=data,
    fold=1,
    targets=["pH_target", "SOC_target", "Clay_target"]
)
# Fit model and get predictions
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Calculate performance metrics
r2 = r2_score(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"R-squared: {r2:.7f}")
print(f"RMSE: {rmse:.7f}")
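Because three targets with different units are passed at once, the single R-squared and RMSE printed above are, under scikit-learn's defaults, averaged across targets. Reporting each target separately is usually easier to interpret. The sketch below shows one way to do that in plain Python so it runs without extra dependencies; the helper `per_target_metrics` and the small arrays are made up for illustration and stand in for `y_test` and `predictions`. With scikit-learn, passing `multioutput="raw_values"` to `r2_score` and `mean_squared_error` should give an equivalent per-target breakdown.

```python
import math


def per_target_metrics(y_true, y_pred, names):
    """Return {name: (r2, rmse)} computed separately for each target column."""
    metrics = {}
    for j, name in enumerate(names):
        obs = [row[j] for row in y_true]
        pred = [row[j] for row in y_pred]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        metrics[name] = (1 - ss_res / ss_tot, math.sqrt(ss_res / len(obs)))
    return metrics


# Made-up observations and predictions (rows = samples, columns = targets)
names = ["pH_target", "SOC_target", "Clay_target"]
y_true = [[6.0, 1.0, 20.0], [7.0, 2.0, 30.0], [8.0, 3.0, 40.0]]
y_pred = [[6.0, 1.5, 22.0], [7.0, 2.0, 30.0], [8.0, 2.5, 38.0]]

metrics = per_target_metrics(y_true, y_pred, names)
for name, (r2, rmse) in metrics.items():
    print(f"{name}: R-squared = {r2:.3f}, RMSE = {rmse:.3f}")
```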
References
Wadoux, A. M. J.-C., Stumpf, F., & Scholten, T. (2024). A catchment-scale dataset of soil properties and their mid-infrared spectra. Zenodo repository. https://doi.org/10.5281/zenodo.14557348
Stumpf, F., Schmidt, K., Goebes, P., Behrens, T., Schönbrodt-Stitt, S., Wadoux, A., Xiang, W., & Scholten, T. (2017). Uncertainty-guided sampling to improve digital soil maps. Catena, 153, 30-38.