SSP.460 Dataset ============== Description ----------- * Target Soil Properties: SOC, pH, Clay * Groups of Features: vis-NIR * Sample Size: 460 * Number of Features: 830 * Coordinates: Without coordinates because of privacy concerns * Location: State of Sao Paulo, Brazil * Sampling Design: Regular grid sampling * Study Area Size: 473 ha * Geological Setting: Predominantly sandstones with some basaltic flows * Previous Data Publication: None * Contact Information: * Leonardo Ramirez Lopez (Ramirez-Lopez.L@buchi.com), BÜCHI Labortechnik AG * License: CC BY-SA 4.0 * Publication/Modification Date (d/m/y): 28.02.25, version 1.0 * Changelog: * Version 1.0 (28.02.25): Initial release Details ------- Dataset ^^^^^^^ The dataset contains the following target soil properties and features: Target Soil Properties: """""""""""""""""""""" SOC - Soil Organic Carbon * Code: ``SOC_target`` * Unit: % * Protocol: Measured through titration after oxidization of the organic carbon (Walkley & Black 1934) * Sampling Date: Unknown month 2000 * Sampling Depth: 0 - 20 cm pH * Code: ``pH_target`` * Unit: Unitless * Protocol: Measured in water suspension with a glass electrode with unspecified liquid:soil ratio * Sampling Date: Unknown month 2001 * Sampling Depth: 0 - 20 cm Clay * Code: ``Clay_target`` * Unit: % * Protocol: Hydrometer method; separation of the fractions by sieving and sedimentation. Measurement of the separated fractions by weighing the density of the suspension (Gee and Bauder 1979) * Sampling Date: Unknown month 2001 * Sampling Depth: 0 - 20 cm Groups of Features: """"""""""""""""" vis-NIR – Visible and Near Infrared Spectroscopy * Number of Features: 830 * Code(s): ``wl_350``, ``wl_352``, ``wl_354`` ... ``wl_2498`` * Unit: % (Reflectance) * Sensing: Infra-red intelligent spectroradiometer-IRIS MkIV (Geophysical and Environmental Research Corporation, New York, USA), on dried samples in the laboratory, spectral range was 350 – 3,000 nm at 2 - 5 nm intervals * Processing: Discarding noisy edges of the spectrum (2,458 - 3,000 nm) * Sampling Date: Unknown month 2001 * Spectral Information (After Data Processing): * Data Representation: Wavelength (in nm) * Spectral Resolution: 2 - 5 nm depending on wavelength range * Spectral Range: 350 – 2,498 nm Examples -------- .. code-block:: python # Load and explore the dataset data = load_dataset("SSP.460") dataset = data["Dataset"] folds = data["Folds"] coords = data["Coordinates"] # Note: No coordinates available # Split into train/test using fold 1 X_train, X_test, y_train, y_test = split_dataset( data=data, fold=1, targets=["pH_target", "SOC_target", "Clay_target"] ) # Calculate model performance predictions = model.predict(X_test) metrics = calculate_performance(y_test, predictions) print(f"R2: {metrics['r2']:.3f}, RMSE: {metrics['rmse']:.3f}") References ---------- Gee, G. W., & Bauder, J. W. (1979). Particle size analysis by hydrometer: a simplified method for routine textural analysis and a sensitivity test of measurement parameters. Soil Science Society of America Journal, 43(5), 1004-1007. Walkley, A. & Black, I. A. (1934). An examination of the Degtjareff method for determining soil organic matter, and a proposed modification of the chromic acid titration method. Soil science, 37(1), 29-38.