SP.231 Dataset

Description

  • Target Soil Properties: SOM, pH, Clay

  • Groups of Features: vis-NIR

  • Sample size: 231

  • Number of Features: 271

  • Coordinates: With coordinates (EPSG: 32654)

  • Location: Saitama Prefecture, Japan

  • Sampling Design: Two sampling designs were used across multiple fields, depending on the soil conditions: (1) systematic sampling, with samples taken at the corners and in the middle of each field, and (2) simple random sampling

  • Study Area Size: 3.1 ha

  • Geological Setting: Volcanic ash (Andosols)

  • Previous Data Publication: None

  • Contact Information:

  • License: CC BY-SA 4.0

  • Publication/Modification Date (d/m/y): 28.02.25, version 1.0

  • Changelog:
    • Version 1.0 (28.02.25): Initial release

Details

Dataset

The dataset contains the following target soil properties and features:

Target Soil Properties:

SOM - Soil Organic Matter
  • Code: SOM_target

  • Unit: %

  • Protocol: Measured through the weight difference before and after ignition

  • Sampling Date: February 2017, December 2017 and February 2018

  • Sampling Depth: 0 - 15 cm
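
As a worked illustration of the protocol above, SOM by weight difference is commonly computed as the mass lost on ignition relative to the dry mass before ignition. The source does not state which mass is used as the denominator, nor the ignition temperature, so the formula below is an assumption based on the usual convention. The function name `som_percent` and the example masses are hypothetical.

```python
def som_percent(mass_before_g, mass_after_g):
    """SOM (%) as mass lost on ignition relative to the pre-ignition dry mass.

    Assumption: the denominator is the dry mass before ignition.
    """
    if mass_before_g <= 0:
        raise ValueError("mass before ignition must be positive")
    if not 0 <= mass_after_g <= mass_before_g:
        raise ValueError("mass after ignition must lie between 0 and the initial mass")
    return (mass_before_g - mass_after_g) / mass_before_g * 100.0


# Made-up example: 10.00 g of dry soil leaves 9.20 g after ignition.
example_som = som_percent(10.00, 9.20)
print(f"SOM: {example_som:.2f} %")
```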

pH
  • Code: pH_target

  • Unit: Unitless

  • Protocol: Measured in water suspension with a glass electrode with a 2.5:1 liquid:soil gravimetric ratio

  • Sampling Date: February 2017, December 2017 and February 2018

  • Sampling Depth: 0 - 15 cm

Clay
  • Code: Clay_target

  • Unit: %

  • Protocol: Sieve-pipette method: the sand fractions are separated by sieving, and the silt and clay fractions are determined by sedimentation in water (Gee and Bauder 1986)

  • Sampling Date: February 2017, December 2017 and February 2018

  • Sampling Depth: 0 - 15 cm

Groups of Features:

vis-NIR – Visible and Near Infrared Spectroscopy
  • Number of Features: 271

  • Code(s): wl_350, wl_355, wl_360, ..., wl_1700

  • Unit: % (Reflectance)

  • Sensing: Mounted vis-NIR spectrometer (SAS3000, Shibuya Seiki Co. Ltd., Ehime Prefecture, Japan), in-situ; spectral range of 320 - 1,700 nm at 1 - 7 nm intervals

  • Processing: Discarding noisy edges of the spectrum (320 - 350 nm), resampling to 5 nm intervals

  • Sampling Date: February 2017, December 2017 and February 2018

  • Spectral Information (After Data Processing):
    • Data Representation: Wavelength (in nm)

    • Spectral Resolution: 5 nm

    • Spectral Range: 350 - 1,700 nm
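
The processing step and feature codes above imply a regular 5 nm grid from 350 to 1,700 nm, which gives (1700 - 350) / 5 + 1 = 271 bands, matching the stated number of features. The following is a minimal sketch, not the authors' processing code. It resamples a raw spectrum to that grid by linear interpolation, which is an assumption, since the source only states that the spectra were resampled. The helper name `resample_linear` and the synthetic raw spectrum are hypothetical.

```python
from bisect import bisect_right


def resample_linear(wavelengths, values, grid):
    """Linearly interpolate (wavelengths, values) onto the wavelengths in grid.

    wavelengths must be strictly increasing and must cover every grid point.
    """
    out = []
    for w in grid:
        # Index of the first raw wavelength strictly greater than w.
        hi = bisect_right(wavelengths, w)
        if hi == 0 or (hi == len(wavelengths) and w > wavelengths[-1]):
            raise ValueError(f"{w} nm lies outside the measured range")
        if hi == len(wavelengths):
            # w coincides with the last raw wavelength.
            out.append(values[-1])
            continue
        lo = hi - 1
        frac = (w - wavelengths[lo]) / (wavelengths[hi] - wavelengths[lo])
        out.append(values[lo] + frac * (values[hi] - values[lo]))
    return out


# Synthetic raw spectrum (not SP.231 data): 320 - 1,700 nm at 3 nm steps,
# with a made-up reflectance that rises linearly with wavelength.
raw_wl = list(range(320, 1701, 3))
raw_reflectance = [10 + 0.02 * w for w in raw_wl]

# Target grid: starting at 350 nm drops the noisy 320 - 350 nm edge.
grid = list(range(350, 1701, 5))
resampled = resample_linear(raw_wl, raw_reflectance, grid)

# Feature codes in the style used by the dataset columns.
band_codes = [f"wl_{w}" for w in grid]
print(len(band_codes), band_codes[0], band_codes[-1])
```

Real spectra recorded at irregular 1 - 7 nm intervals would be passed in the same way; only the strictly increasing order of the raw wavelengths matters to this helper.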

Examples

from LimeSoDa import load_dataset, split_dataset
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# Load the dataset and unpack its components
data = load_dataset("SP.231")
dataset = data["Dataset"]
folds = data["Folds"]
coords = data["Coordinates"]

# Split into train/test using fold 1
X_train, X_test, y_train, y_test = split_dataset(
    data=data,
    fold=1,
    targets=["pH_target", "SOM_target", "Clay_target"]
)

# Fit model and get predictions
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Calculate performance metrics (averaged over the three targets)
r2 = r2_score(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"R-squared: {r2:.7f}")
print(f"RMSE: {rmse:.7f}")
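
Because the three targets have different units (SOM and Clay in %, pH unitless), the single R-squared and RMSE values printed above are averaged across targets and can be hard to interpret. A per-target breakdown may be more informative. The sketch below is plain Python with made-up numbers that are not taken from SP.231, and the function name `per_target_rmse` is hypothetical.

```python
import math


def per_target_rmse(y_true, y_pred, target_names):
    """Return {target name: RMSE} for row-wise observed and predicted values.

    Each row holds one value per target, in the order of target_names.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    result = {}
    for j, name in enumerate(target_names):
        squared_error = sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred))
        result[name] = math.sqrt(squared_error / n)
    return result


targets = ["pH_target", "SOM_target", "Clay_target"]

# Illustrative values only, two samples with three targets each.
y_true_demo = [[6.0, 5.0, 20.0], [6.5, 7.0, 24.0]]
y_pred_demo = [[6.3, 5.0, 23.0], [6.1, 7.0, 20.0]]

rmse_by_target = per_target_rmse(y_true_demo, y_pred_demo, targets)
for name, value in rmse_by_target.items():
    print(f"{name}: RMSE = {value:.4f}")
```

With the arrays from the example above, a similar breakdown should be obtainable from scikit-learn by passing `multioutput="raw_values"` to `mean_squared_error` or `r2_score`, assuming the columns of `y_test` follow the order of the `targets` argument.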

References

Gee, G.W. & Bauder, J.W. (1986) Particle-Size Analysis. In: Klute, A., Ed., Methods of Soil Analysis, Part 1. Physical and Mineralogical Methods, Agronomy Monograph No. 9, 2nd Edition, American Society of Agronomy/Soil Science Society of America, Madison, WI, 383-411.