RP.62 Dataset ============= Description ----------- * Target Soil Properties: SOC, pH, Clay * Groups of Features: ERa, Gamma, NIR, pH-ISE, VI * Sample size: 62 * Number of Features: 1,410 * Coordinates: Without coordinates because of privacy concerns * Location: Rhineland-Palatinate, Germany * Sampling Design: Regular grid sampling * Study Area Size: 3.3 ha * Geological Setting: Pleistocene periglacial slope deposits consisting of Weichselian loess with variable amounts of weathered Devonian sand-, silt-, and claystones and scattered Tertiary basalt bombs * Previous Data Publication: None * Contact Information: * Stefan Paetzold (s.paetzold@uni-bonn.de), University of Bonn * Hamed Tavakoli (HTavakoli@atb-potsdam.de), Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB) * License: CC BY-SA 4.0 * Publication/Modification Date (d/m/y): 28.02.25, version 1.0 * Changelog: * Version 1.0 (28.02.25): Initial release Details ------- Dataset ^^^^^^^ The dataset contains the following target soil properties and features: Target Soil Properties: """""""""""""""""""""" SOC - Soil Organic Carbon * Code: ``SOC_target`` * Unit: % * Protocol: Measured through titration after oxidization of the organic carbon (Walkley & Black 1934) * Sampling Date: October 2017 * Sampling Depth: 0 - 30 cm pH * Code: ``pH_target`` * Unit: Unitless * Protocol: Measured in a water suspension with a glass electrode with a 5:1 liquid:soil volumetric ratio (DIN ISO 10390) * Sampling Date: May 2020 * Sampling Depth: 0 - 30 cm Clay * Code: ``Clay_target`` * Unit: % * Protocol: Sieve-Pipette method, measured through fractioning the soil into the sand fractions by sieving, and the silt and clay fractions by sedimentation in water, German adaptation (DIN ISO 11277) * Sampling Date: October 2017 * Sampling Depth: 0 - 30 cm Groups of Features: """"""""""""""""" ERa – Apparent Electrical Resistivity * Number of Features: 1 * Code(s): ``ERa`` * Unit: Ω m * Sensing: Array of multiple rolling electrodes (Geophilus company, Caputh, Germany) on RapidMapper platform with exploration depth of 0 - 50 cm, in-situ * Processing: Ordinary Kriging to align sensing- with soil sampling locations * Sampling Date: September 2020 Gamma * Number of Features: 5 * Code(s): ``G_Total_Counts``, ``G_K``, ``G_U``, ``G_Th``, ``G_Cs`` * Unit: Unitless * Sensing: Passive gamma sensor (MS-2000-CsI-MTS, Medusa Radiometrics BV, Groningen, Netherlands) on RapidMapper platform, in-situ * Processing: Ordinary Kriging to align sensing- with soil sampling locations * Sampling Date: September 2020 NIR – Near Infrared Spectroscopy * Number of Features: 1,401 * Code(s): ``wl_1000``, ``wl_1001``, ``wl_1002`` ... ``wl_2400`` * Unit: % (Reflectance) * Sensing: NIR spectrometer (C11118GA, Hamamatsu Photonics K.K., Shizuoka Prefecture, Japan) on RapidMapper platform, in-situ, spectral range was 900 - 2550 nm at 15 nm intervals * Processing: Kriging to align sensing- with soil sampling locations, discarding noisy edges of the spectrum (900 - 1,000 nm & 2,400 - 2,550 nm), resampling to 1 nm intervals * Sampling Date: September 2020 * Spectral Information (After Data Processing): * Data Representation: Wavelength (in nm) * Spectral Resolution: 1 nm * Spectral Range: 1,000 - 2,400 nm pH-ISE – Ion Selective Electrodes for pH Determination * Number of Features: 1 * Code(s): ``pH-ISE`` * Unit: Unitless * Sensing: Soil pH Manager (VERIS Technologies, Salinas, USA) on RapidMapper platform, in-situ * Processing: Ordinary Kriging to align sensing- with soil sampling locations * Sampling Date: September 2020 VI - Vegetation Indices * Number of Features: 2 * Code(s): ``NDVI``, ``GNDVI`` * Unit: Unitless * Sensing: Sentinel-2 image during vegetative period (Level-2A) from "Copernicus Open Access Hub" * Processing: Calculating ``NDVI`` as (B08 - B04) / (B08 + B04) and ``GNDVI`` as (B08 - B03) / (B08 + B03), extracting VI values from raster at soil sampling locations * Sampling Date: March 2017 Examples -------- .. code-block:: python from LimeSoDa import load_dataset, split_dataset from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score, mean_squared_error import numpy as np # Load and explore the dataset data = load_dataset("RP.62") dataset = data["Dataset"] folds = data["Folds"] coords = data["Coordinates"] # Will be NA for RP.62 # Split into train/test using fold 1 X_train, X_test, y_train, y_test = split_dataset( data=data, fold=1, targets=["pH_target", "SOC_target", "Clay_target"] ) # Fit model and get predictions model = LinearRegression() model.fit(X_train, y_train) predictions = model.predict(X_test) # Calculate performance metrics r2 = r2_score(y_test, predictions) rmse = np.sqrt(mean_squared_error(y_test, predictions)) print(f"R-squared: {r2:.7f}") print(f"RMSE: {rmse:.7f}") References ---------- Walkley, A. & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil science, 37(1), 29-38.