RP.62 Dataset¶
Description¶
Target Soil Properties: SOC, pH, Clay
Groups of Features: ERa, Gamma, NIR, pH-ISE, VI
Sample size: 62
Number of Features: 1,410
Coordinates: Not provided because of privacy concerns
Location: Rhineland-Palatinate, Germany
Sampling Design: Regular grid sampling
Study Area Size: 3.3 ha
Geological Setting: Pleistocene periglacial slope deposits consisting of Weichselian loess with variable amounts of weathered Devonian sand-, silt-, and claystones and scattered Tertiary basalt bombs
Previous Data Publication: None
- Contact Information:
Stefan Paetzold (s.paetzold@uni-bonn.de), University of Bonn
Hamed Tavakoli (HTavakoli@atb-potsdam.de), Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB)
License: CC BY-SA 4.0
Publication/Modification Date (d/m/y): 28.02.25, version 1.0
- Changelog:
Version 1.0 (28.02.25): Initial release
Details¶
Dataset¶
The dataset contains the following target soil properties and features:
Target Soil Properties:¶
- SOC - Soil Organic Carbon
Code: SOC_target
Unit: %
Protocol: Measured by titration after oxidation of the organic carbon (Walkley & Black 1934)
Sampling Date: October 2017
Sampling Depth: 0 - 30 cm
- pH
Code: pH_target
Unit: Unitless
Protocol: Measured in a water suspension with a glass electrode with a 5:1 liquid:soil volumetric ratio (DIN ISO 10390)
Sampling Date: May 2020
Sampling Depth: 0 - 30 cm
- Clay
Code: Clay_target
Unit: %
Protocol: Sieve-pipette method, German adaptation (DIN ISO 11277): the sand fractions are separated by sieving, the silt and clay fractions by sedimentation in water
Sampling Date: October 2017
Sampling Depth: 0 - 30 cm
Groups of Features:¶
- ERa – Apparent Electrical Resistivity
Number of Features: 1
Code(s): ERa
Unit: Ω m
Sensing: Array of multiple rolling electrodes (Geophilus company, Caputh, Germany) on RapidMapper platform with exploration depth of 0 - 50 cm, in-situ
Processing: Ordinary Kriging to align sensing locations with soil sampling locations
Sampling Date: September 2020
- Gamma
Number of Features: 5
Code(s): G_Total_Counts, G_K, G_U, G_Th, G_Cs
Unit: Unitless
Sensing: Passive gamma sensor (MS-2000-CsI-MTS, Medusa Radiometrics BV, Groningen, Netherlands) on RapidMapper platform, in-situ
Processing: Ordinary Kriging to align sensing locations with soil sampling locations
Sampling Date: September 2020
- NIR – Near Infrared Spectroscopy
Number of Features: 1,401
Code(s): wl_1000, wl_1001, wl_1002 … wl_2400
Unit: % (Reflectance)
Sensing: NIR spectrometer (C11118GA, Hamamatsu Photonics K.K., Shizuoka Prefecture, Japan) on RapidMapper platform, in-situ; spectral range 900 - 2,550 nm at 15 nm intervals
Processing: Kriging to align sensing locations with soil sampling locations, discarding noisy edges of the spectrum (900 - 1,000 nm and 2,400 - 2,550 nm), resampling to 1 nm intervals
Sampling Date: September 2020
- Spectral Information (After Data Processing):
Data Representation: Wavelength (in nm)
Spectral Resolution: 1 nm
Spectral Range: 1,000 - 2,400 nm
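The trimming and resampling steps listed under Processing can be sketched as follows. This is a minimal illustration, not the authors' pipeline: linear interpolation is an assumption (the source does not name the resampling method), the reflectance values are made up, and `resample_spectrum` is a hypothetical helper.

```python
from bisect import bisect_right

def resample_spectrum(raw_wl, raw_refl, start=1000, stop=2400, step=1):
    """Linearly interpolate a spectrum onto start..stop (inclusive), in nm.

    Bands outside start..stop are dropped implicitly, which corresponds to
    discarding the noisy edges of the spectrum.
    """
    target_wl = list(range(start, stop + 1, step))
    resampled = []
    for wl in target_wl:
        # Index of the raw band at or immediately left of wl
        i = min(bisect_right(raw_wl, wl) - 1, len(raw_wl) - 2)
        x0, x1 = raw_wl[i], raw_wl[i + 1]
        y0, y1 = raw_refl[i], raw_refl[i + 1]
        resampled.append(y0 + (y1 - y0) * (wl - x0) / (x1 - x0))
    return target_wl, resampled

# Raw grid as described above: 900 - 2,550 nm at 15 nm intervals
raw_wl = list(range(900, 2551, 15))
# Synthetic reflectance in % (a simple ramp, for illustration only)
raw_refl = [20.0 + 0.01 * (wl - 900) for wl in raw_wl]

target_wl, refl = resample_spectrum(raw_wl, raw_refl)
columns = [f"wl_{wl}" for wl in target_wl]

print(len(columns), columns[0], columns[-1])  # 1401 wl_1000 wl_2400
```

The resulting 1,401 column names match the feature codes listed for this group.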
- pH-ISE – Ion Selective Electrodes for pH Determination
Number of Features: 1
Code(s): pH-ISE
Unit: Unitless
Sensing: Soil pH Manager (VERIS Technologies, Salinas, USA) on RapidMapper platform, in-situ
Processing: Ordinary Kriging to align sensing locations with soil sampling locations
Sampling Date: September 2020
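Several of the feature groups above were aligned to the soil sampling locations by ordinary kriging. The sketch below shows the idea for a single prediction point. It is not the authors' implementation: the exponential variogram, its sill and range, and the toy coordinates and readings are all assumptions chosen for illustration, and `ordinary_kriging` is a hypothetical helper. In practice the variogram would first be fitted to the sensor data.

```python
import numpy as np

def exp_variogram(h, sill=1.0, rng=30.0):
    """Exponential semivariogram without nugget (assumed model and parameters)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def ordinary_kriging(xy, values, target, variogram=exp_variogram):
    """Predict the value at `target` from observations `values` at `xy`."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)

    # Left-hand side: semivariances between all pairs of observations,
    # bordered by ones for the unbiasedness constraint
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(dists)
    A[n, n] = 0.0

    # Right-hand side: semivariances between observations and the target
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1))

    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # the constraint forces these to sum to 1
    return float(weights @ values), weights

# Toy sensor readings: made-up coordinates (m) and made-up values
sensor_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 15.0)]
sensor_values = [42.0, 45.0, 40.0, 47.0, 44.0]

# Estimate the sensor value at a hypothetical soil sampling location
estimate, weights = ordinary_kriging(sensor_xy, sensor_values, target=(4.0, 6.0))
print(f"estimate = {estimate:.2f}, sum of weights = {weights.sum():.6f}")
```

Without a nugget effect, the predictor reproduces the observed value exactly when the target coincides with a sensing location.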
- VI – Vegetation Indices
Number of Features: 2
Code(s): NDVI, GNDVI
Unit: Unitless
Sensing: Sentinel-2 image (Level-2A) acquired during the vegetative period, from the “Copernicus Open Access Hub”
Processing: Calculating NDVI as (B08 - B04) / (B08 + B04) and GNDVI as (B08 - B03) / (B08 + B03), extracting VI values from the raster at the soil sampling locations
Sampling Date: March 2017
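The two index formulas translate directly into code. A minimal sketch, assuming reflectance values for the Sentinel-2 bands B03 (green), B04 (red), and B08 (near infrared); the reflectance numbers are made up, and `normalized_difference` is a hypothetical helper.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b)."""
    return (band_a - band_b) / (band_a + band_b)

def ndvi(b08, b04):
    """NDVI = (B08 - B04) / (B08 + B04)."""
    return normalized_difference(b08, b04)

def gndvi(b08, b03):
    """GNDVI = (B08 - B03) / (B08 + B03)."""
    return normalized_difference(b08, b03)

# Made-up reflectances for a single pixel
b03, b04, b08 = 0.08, 0.05, 0.45

print(round(ndvi(b08, b04), 3))   # 0.8
print(round(gndvi(b08, b03), 3))  # 0.698
```

Because the functions use only arithmetic operators, they also work element-wise on NumPy arrays holding whole raster bands.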
Examples¶
from LimeSoDa import load_dataset, split_dataset
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np
# Load and explore the dataset
data = load_dataset("RP.62")
dataset = data["Dataset"]
folds = data["Folds"]
coords = data["Coordinates"] # Will be NA for RP.62
# Split into train/test using fold 1
X_train, X_test, y_train, y_test = split_dataset(
    data=data,
    fold=1,
    targets=["pH_target", "SOC_target", "Clay_target"]
)
# Fit model and get predictions
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Calculate performance metrics
r2 = r2_score(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"R-squared: {r2:.7f}")
print(f"RMSE: {rmse:.7f}")
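If `y_test` holds all three targets as columns, `r2_score` and `mean_squared_error` average over pH, SOC, and clay by default, even though these are on different scales. A per-target breakdown is often easier to interpret. The sketch below computes one with NumPy on made-up numbers standing in for `y_test` and `predictions`; `per_target_metrics` is a hypothetical helper, and the column order is assumed to follow the `targets` list.

```python
import numpy as np

def per_target_metrics(y_true, y_pred, names):
    """Return {name: (r2, rmse)}, computed column by column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    metrics = {}
    for j, name in enumerate(names):
        residuals = y_true[:, j] - y_pred[:, j]
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((y_true[:, j] - y_true[:, j].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        rmse = np.sqrt(np.mean(residuals ** 2))
        metrics[name] = (float(r2), float(rmse))
    return metrics

# Made-up stand-ins for y_test and predictions (columns: pH, SOC, clay)
y_true = np.array([[6.1, 1.2, 18.0], [6.5, 1.5, 22.0],
                   [5.9, 1.0, 16.0], [6.8, 1.8, 25.0]])
y_pred = np.array([[6.0, 1.3, 19.0], [6.4, 1.4, 21.0],
                   [6.0, 1.1, 17.0], [6.7, 1.7, 24.0]])

target_names = ["pH_target", "SOC_target", "Clay_target"]
for name, (r2, rmse) in per_target_metrics(y_true, y_pred, target_names).items():
    print(f"{name}: R-squared = {r2:.3f}, RMSE = {rmse:.3f}")
```

With scikit-learn, passing `multioutput="raw_values"` to `r2_score` or `mean_squared_error` returns the per-target R-squared and MSE values as arrays.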
References¶
Walkley, A., & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil Science, 37(1), 29-38.