NRW.42 Dataset ============= Description ----------- * Target Soil Properties: SOC, pH, Clay * Groups of Features: MIR * Sample size: 42 * Number of Features: 1,686 * Coordinates: Without coordinates because of privacy concerns * Location: North Rhine-Westphalia, Germany * Sampling Design: Regular grid sampling * Study Area Size: 1.5 ha * Geological Setting: Pleistocene periglacial slope deposits consisting of weathered Devonian sand-, silt-, and claystones, partially covered by Weichselian loess * Previous Data Publication: None * Contact Information: * Stefan Paetzold (s.paetzold@uni-bonn.de), University of Bonn * License: CC BY-SA 4.0 * Publication/Modification Date (d/m/y): 28.02.25, version 1.0 * Changelog: * Version 1.0 (28.02.25): Initial release Details ------- Dataset ^^^^^^^ The dataset contains the following target soil properties and features: Target Soil Properties: """""""""""""""""""""" SOC - Soil Organic Carbon * Code: ``SOC_target`` * Unit: % * Protocol: Determined by the difference of total carbon and inorganic carbon, where total carbon was obtained through elemental analysis by measuring the CO₂ release during dry combustion (DIN ISO 10694) without acid pretreatment and inorganic carbon as 0.12 x the calcium carbonate content, determined by the gas-volumetric Scheibler Method (ISO 10693) * Sampling Date: April 2013 * Sampling Depth: 0 - 30 cm pH * Code: ``pH_target`` * Unit: Unitless * Protocol: Measured in CaCl₂ suspension with a glass electrode with a 5:1 liquid:soil volumetric ratio (DIN ISO 10390) * Sampling Date: April 2013 * Sampling Depth: 0 - 30 cm Clay * Code: ``Clay_target`` * Unit: % * Protocol: Sieve-Pipette method, measured through fractioning the soil into the sand fractions by sieving, and the silt and clay fractions by sedimentation in water, German adaptation (DIN ISO 11277) * Sampling Date: April 2013 * Sampling Depth: 0 - 30 cm Groups of Features: """"""""""""""""" MIR – Mid Infrared Spectroscopy * Number of Features: 1,686 * Code(s): ``wn_3799``, ``wn_3797.1``, ``wn_3795.1`` ... ``wn_549.6`` * Unit: % (Reflectance) * Sensing: MIR spectrometer (Bruker Tensor 27 HTS-XT, Bruker Optik, Ettlingen, Germany), on dried and sieved samples (<2 mm) in the laboratory, spectral range was 7,500 – 550 cm⁻¹ at 4 cm⁻¹ intervals * Processing: Discarding irrelevant spectral data of the spectrum (7,500 - 3,799 cm⁻¹), resampling to ~2 cm⁻¹ intervals * Sampling Date: April 2013 * Spectral Information (After Data Processing): * Data Representation: Wavenumber (in cm⁻¹) * Spectral Resolution: ~2 cm⁻¹ * Spectral Range: 3,799 - 549.6 cm⁻¹ Examples -------- .. code-block:: python from LimeSoDa import load_dataset, split_dataset from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score, mean_squared_error import numpy as np # Load and explore the dataset data = load_dataset("NRW.42") dataset = data["Dataset"] folds = data["Folds"] coords = data["Coordinates"] # Will be NA for NRW.42 # Split into train/test using fold 1 X_train, X_test, y_train, y_test = split_dataset( data=data, fold=1, targets=["pH_target", "SOC_target", "Clay_target"] ) # Fit model and get predictions model = LinearRegression() model.fit(X_train, y_train) predictions = model.predict(X_test) # Calculate performance metrics r2 = r2_score(y_test, predictions) rmse = np.sqrt(mean_squared_error(y_test, predictions)) print(f"R-squared: {r2:.7f}") print(f"RMSE: {rmse:.7f}") References ---------- Gee, G.W. & Bauder, J.W. (1986) Particle-Size Analysis. In: Klute, A., Ed., Methods of Soil Analysis, Part 1. Physical and Mineralogical Methods, Agronomy Monograph No. 9, 2nd Edition, American Society of Agronomy/Soil Science Society of America, Madison, WI, 383-411. Walkley, A. & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil science, 37(1), 29-38.