SSP.460 Dataset

Description

  • Target Soil Properties: SOC, pH, Clay

  • Groups of Features: vis-NIR

  • Sample Size: 460

  • Number of Features: 830

  • Coordinates: Not provided because of privacy concerns

  • Location: State of Sao Paulo, Brazil

  • Sampling Design: Regular grid sampling

  • Study Area Size: 473 ha

  • Geological Setting: Predominantly sandstones with some basaltic flows

  • Previous Data Publication: None

  • Contact Information:

  • License: CC BY-SA 4.0

  • Publication/Modification Date (d/m/y): 28.02.25, version 1.0

  • Changelog:
    • Version 1.0 (28.02.25): Initial release

Details

Dataset

The dataset contains the following target soil properties and features:

Target Soil Properties:

SOC - Soil Organic Carbon
  • Code: SOC_target

  • Unit: %

  • Protocol: Measured by titration after oxidation of the organic carbon (Walkley & Black 1934)

  • Sampling Date: Unknown month 2000

  • Sampling Depth: 0 - 20 cm

pH
  • Code: pH_target

  • Unit: Unitless

  • Protocol: Measured in a water suspension with a glass electrode; the liquid:soil ratio is unspecified

  • Sampling Date: Unknown month 2001

  • Sampling Depth: 0 - 20 cm

Clay
  • Code: Clay_target

  • Unit: %

  • Protocol: Hydrometer method; the fractions were separated by sieving and sedimentation and quantified by measuring the density of the suspension (Gee & Bauder 1979)

  • Sampling Date: Unknown month 2001

  • Sampling Depth: 0 - 20 cm

Groups of Features:

vis-NIR – Visible and Near Infrared Spectroscopy
  • Number of Features: 830

  • Code(s): wl_350, wl_352, wl_354, ..., wl_2498

  • Unit: % (Reflectance)

  • Sensing: Infra-Red Intelligent Spectroradiometer IRIS MkIV (Geophysical and Environmental Research Corporation, New York, USA); measured on dried samples in the laboratory over a spectral range of 350 – 3,000 nm at 2 - 5 nm intervals

  • Processing: Discarding the noisy upper edge of the spectrum (beyond 2,498 nm, up to 3,000 nm)

  • Sampling Date: Unknown month 2001

  • Spectral Information (After Data Processing):
    • Data Representation: Wavelength (in nm)

    • Spectral Resolution: 2 - 5 nm depending on wavelength range

    • Spectral Range: 350 – 2,498 nm
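
The feature codes listed above carry the wavelength in their name, so a band of the spectrum can be selected from the column names alone. The following is a minimal sketch, assuming every spectral column follows the `wl_<wavelength in nm>` pattern; the helper names `wavelength_nm` and `select_band`, the example column list, and the 350 - 700 nm band limits are hypothetical and not part of the dataset or its tooling.

```python
def wavelength_nm(code: str) -> int:
    """Return the wavelength in nm encoded in a feature code such as 'wl_350'."""
    prefix, sep, value = code.partition("_")
    if prefix != "wl" or sep != "_" or not value.isdigit():
        raise ValueError(f"not a wavelength feature code: {code!r}")
    return int(value)


def select_band(codes, lower_nm, upper_nm):
    """Keep the feature codes whose wavelength lies in [lower_nm, upper_nm]."""
    return [c for c in codes if lower_nm <= wavelength_nm(c) <= upper_nm]


# Hypothetical column names: the three targets plus a few of the 830 spectral features.
columns = ["SOC_target", "pH_target", "Clay_target",
           "wl_350", "wl_352", "wl_354", "wl_2498"]

# Separate spectral features from targets, then keep an illustrative band.
spectral = [c for c in columns if c.startswith("wl_")]
visible = select_band(spectral, 350, 700)  # band limits chosen for illustration only
print(visible)  # ['wl_350', 'wl_352', 'wl_354']
```

Because the spacing between wavelengths varies from 2 to 5 nm, selecting by the parsed wavelength is safer than selecting by column position.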

Examples

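# Assumption: the helpers below come from the LimeSoDa Python package;
# adjust the imports if your installation exposes them elsewhere.
from LimeSoDa import load_dataset
from LimeSoDa.utils import split_dataset, calculate_performance
from sklearn.cross_decomposition import PLSRegression

# Load and explore the dataset
data = load_dataset("SSP.460")
dataset = data["Dataset"]
folds = data["Folds"]
coords = data["Coordinates"]  # Note: no coordinates are available for SSP.460

# Split into train/test using fold 1
X_train, X_test, y_train, y_test = split_dataset(
    data=data,
    fold=1,
    targets=["pH_target", "SOC_target", "Clay_target"]
)

# Fit a model (PLS regression is an illustrative choice, not prescribed by the dataset)
model = PLSRegression(n_components=10)
model.fit(X_train, y_train)

# Calculate model performance
predictions = model.predict(X_test)
metrics = calculate_performance(y_test, predictions)
print(f"R2: {metrics['r2']:.3f}, RMSE: {metrics['rmse']:.3f}")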
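
For readers without the helper functions at hand, the following sketch shows what a fold-based split and the two reported metrics amount to. It assumes that the fold information holds one integer label per sample and uses the standard definitions of RMSE and R2; the functions `split_by_fold`, `rmse`, and `r2` are hypothetical stand-ins and may differ from the actual helpers, and the values are toy numbers rather than data from SSP.460.

```python
import math


def split_by_fold(rows, fold_ids, test_fold):
    """Samples labelled test_fold form the test set; all others form the training set."""
    if len(rows) != len(fold_ids):
        raise ValueError("rows and fold_ids must have the same length")
    train = [r for r, f in zip(rows, fold_ids) if f != test_fold]
    test = [r for r, f in zip(rows, fold_ids) if f == test_fold]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example: four samples assigned to two folds.
train, test = split_by_fold(["a", "b", "c", "d"], [1, 2, 1, 2], test_fold=1)
print(train, test)  # ['b', 'd'] ['a', 'c']

# Toy observed and predicted values.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(f"R2: {r2(y_true, y_pred):.3f}, RMSE: {rmse(y_true, y_pred):.3f}")  # R2: 0.500, RMSE: 0.577
```

Repeating the split for every fold label and pooling the predictions before computing the metrics gives a cross-validated estimate.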

References

Gee, G. W., & Bauder, J. W. (1979). Particle size analysis by hydrometer: a simplified method for routine textural analysis and a sensitivity test of measurement parameters. Soil Science Society of America Journal, 43(5), 1004-1007.

Walkley, A., & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil Science, 37(1), 29-38.