SSP.460 Dataset
Description
Target Soil Properties: SOC, pH, Clay
Groups of Features: vis-NIR
Sample Size: 460
Number of Features: 830
Coordinates: Not provided because of privacy concerns
Location: State of Sao Paulo, Brazil
Sampling Design: Regular grid sampling
Study Area Size: 473 ha
Geological Setting: Predominantly sandstones with some basaltic flows
Previous Data Publication: None
- Contact Information:
Leonardo Ramirez Lopez (Ramirez-Lopez.L@buchi.com), BÜCHI Labortechnik AG
License: CC BY-SA 4.0
Publication/Modification Date (d/m/y): 28.02.25, version 1.0
- Changelog:
Version 1.0 (28.02.25): Initial release
Details
Dataset
The dataset contains the following target soil properties and features:
Target Soil Properties:
- SOC - Soil Organic Carbon
Code: SOC_target
Unit: %
Protocol: Measured by titration after oxidation of the organic carbon (Walkley & Black 1934)
Sampling Date: Unknown month 2000
Sampling Depth: 0 - 20 cm
- pH
Code: pH_target
Unit: Unitless
Protocol: Measured with a glass electrode in a water suspension; the liquid:soil ratio is unspecified
Sampling Date: Unknown month 2001
Sampling Depth: 0 - 20 cm
- Clay
Code: Clay_target
Unit: %
Protocol: Hydrometer method. The fractions are separated by sieving and sedimentation, and quantified by measuring the density of the suspension (Gee & Bauder 1979)
Sampling Date: Unknown month 2001
Sampling Depth: 0 - 20 cm
Groups of Features:
- vis-NIR – Visible and Near Infrared Spectroscopy
Number of Features: 830
Code(s): wl_350, wl_352, wl_354 … wl_2498
Unit: % (Reflectance)
Sensing: Infrared Intelligent Spectroradiometer (IRIS) MkIV (Geophysical and Environmental Research Corporation, New York, USA); measured on dried samples in the laboratory over a spectral range of 350 – 3,000 nm at 2 - 5 nm intervals
Processing: Discarding the noisy edge of the spectrum (2,498 - 3,000 nm)
Sampling Date: Unknown month 2001
- Spectral Information (After Data Processing):
Data Representation: Wavelength (in nm)
Spectral Resolution: 2 - 5 nm depending on wavelength range
Spectral Range: 350 – 2,498 nm
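Because each feature code embeds its wavelength, the numeric wavelength axis can be recovered from the column names. The following is a minimal Python sketch, assuming the spectral columns follow the `wl_<nm>` pattern listed above. The helper names `parse_wavelengths` and `select_range` and the short example header are hypothetical and not part of the dataset's tooling.

```python
import re

def parse_wavelengths(columns):
    """Return (column_name, wavelength_nm) pairs for columns named like 'wl_350'."""
    pattern = re.compile(r"^wl_(\d+)$")
    pairs = []
    for name in columns:
        match = pattern.match(name)
        if match:  # target columns such as 'SOC_target' are skipped
            pairs.append((name, int(match.group(1))))
    return pairs

def select_range(pairs, lower_nm, upper_nm):
    """Keep only the column names whose wavelength lies in [lower_nm, upper_nm]."""
    return [name for name, nm in pairs if lower_nm <= nm <= upper_nm]

# Hypothetical header: the three targets plus a handful of spectral columns.
columns = ["SOC_target", "pH_target", "Clay_target",
           "wl_350", "wl_352", "wl_354", "wl_2498"]

pairs = parse_wavelengths(columns)
wavelengths = [nm for _, nm in pairs]

# Spacing between neighbouring bands; on the real header this shows where
# the sampling interval changes within the 2 - 5 nm range.
steps = [b - a for a, b in zip(wavelengths, wavelengths[1:])]

# Example: restrict the features to an assumed visible window of 350 - 700 nm.
visible = select_range(pairs, 350, 700)
```

Applied to the full header, the same helpers give the number of spectral columns (`len(pairs)`) and make it easy to restrict a model to a chosen spectral window.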
Examples
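# Assumption: the helper functions come from the LimeSoDa Python package
from LimeSoDa import load_dataset
from LimeSoDa.utils import split_dataset, calculate_performance
from sklearn.linear_model import LinearRegression

# Load and explore the dataset
data = load_dataset("SSP.460")
dataset = data["Dataset"]
folds = data["Folds"]
coords = data["Coordinates"]  # Note: No coordinates available

# Split into train/test using fold 1
X_train, X_test, y_train, y_test = split_dataset(
    data=data,
    fold=1,
    targets=["pH_target", "SOC_target", "Clay_target"]
)

# Fit a model (LinearRegression is only an illustrative choice;
# any regressor with fit/predict can be used instead)
model = LinearRegression().fit(X_train, y_train)

# Calculate model performance
predictions = model.predict(X_test)
metrics = calculate_performance(y_test, predictions)
print(f"R2: {metrics['r2']:.3f}, RMSE: {metrics['rmse']:.3f}")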
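For readers who want to check the reported numbers, the two metrics printed above can be computed by hand. This is a minimal sketch using the standard definitions of R2 and RMSE. Whether `calculate_performance` uses exactly these definitions is an assumption, and the values below are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up observed and predicted values for a single target.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(f"R2: {r2(y_true, y_pred):.3f}, RMSE: {rmse(y_true, y_pred):.3f}")
```

RMSE is expressed in the unit of the target (for example % for SOC and Clay), so values are only comparable within one target, whereas R2 is unitless.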
References
Gee, G. W., & Bauder, J. W. (1979). Particle size analysis by hydrometer: a simplified method for routine textural analysis and a sensitivity test of measurement parameters. Soil Science Society of America Journal, 43(5), 1004-1007.
Walkley, A., & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil Science, 37(1), 29-38.