Common pitfalls in evaluating model performance and strategies for avoidance in agricultural studies

Cover
This study identifies three critical pitfalls in predictive modeling: improper cross-validation estimators, data leakage during model selection, and neglect of experimental block effects. These issues can seriously compromise the reliability of model evaluations in agricultural research. The study also provides a comprehensive analysis of how common performance metrics behave under different data distributions, offering guidelines for selecting appropriate indicators for both regression (addressing bias and variance) and classification (addressing class imbalance) tasks.
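As a brief illustration of why metric choice matters under class imbalance (a minimal sketch in Python with scikit-learn, not code from the study), a trivial majority-class classifier can score high on raw accuracy while balanced accuracy and F1 reveal that it has learned nothing about the minority class:

```python
# Sketch (illustrative only): accuracy can mislead under class imbalance,
# while balanced accuracy and F1 expose a trivial majority-class predictor.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.choice([0, 1], size=1000, p=[0.95, 0.05])  # ~5% positives
y_pred = np.zeros_like(y_true)                          # always predict the majority class

print(f"accuracy:          {accuracy_score(y_true, y_pred):.3f}")           # ~0.95, looks good
print(f"balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.3f}")  # 0.50, chance level
print(f"F1 (positive):     {f1_score(y_true, y_pred, zero_division=0):.3f}")  # 0.00
```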
Benchmarking Biases via Simulated and Real Multi-Season Datasets

To rigorously quantify biases in model performance estimation, the study used both a simulated spectral dataset with controlled latent variables and a real-world forage quality dataset collected across multiple seasons. The figure illustrates the structure of the real-world dataset, characterized by complex seasonal variation and autocorrelation, which served as the foundation for benchmarking how well different evaluation strategies handle environmental block effects.
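A minimal sketch of how such a blocked dataset can be simulated, assuming Python with NumPy and pandas; the season structure, coefficient values, and variable names are illustrative assumptions, not the study's actual simulation design:

```python
# Hypothetical simulation of a dataset with a season-level block effect.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_seasons, n_per_season, n_features = 4, 60, 10
beta = rng.normal(0, 1.0, size=n_features)        # true coefficients, shared across seasons

frames = []
for season in range(n_seasons):
    block_shift = rng.normal(0.0, 1.0)             # season-level block effect
    X = rng.normal(block_shift, 1.0, size=(n_per_season, n_features))
    y = X @ beta + block_shift + rng.normal(0.0, 0.5, size=n_per_season)
    df = pd.DataFrame(X, columns=[f"x{j}" for j in range(n_features)])
    df["y"], df["season"] = y, season
    frames.append(df)

data = pd.concat(frames, ignore_index=True)
print(data.groupby("season")["y"].mean())          # the block effect shows up in group means
```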

The Impact of Splitting Strategies in Model Selection

The study demonstrates how improper splitting strategies during model selection, such as performing feature selection or hyperparameter tuning on the entire dataset rather than only within training folds, lead to data leakage and inflated performance estimates. The figure visualizes the competing workflows (e.g., separating training, validation, and test sets versus mixing them), highlighting the need to isolate model selection from the final performance evaluation to ensure valid estimates.
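A minimal sketch of the leak-free workflow, assuming scikit-learn: wrapping feature selection and the model in a Pipeline and nesting GridSearchCV inside an outer cross-validation confines all selection steps to the training folds, so the outer score reflects data never touched during model selection. The dataset and parameter grid below are illustrative, not the study's.

```python
# Nested cross-validation sketch: feature selection and tuning happen
# inside each training fold; the outer CV estimates generalization.
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=200, n_features=500, n_informative=10, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_regression)),   # fitted on the training fold only
    ("model", Ridge()),
])
param_grid = {"select__k": [10, 50, 100], "model__alpha": [0.1, 1.0, 10.0]}

# Inner CV (inside GridSearchCV) performs model selection; the outer CV
# scores the whole selection procedure on held-out folds.
search = GridSearchCV(pipe, param_grid, cv=5)
outer_scores = cross_val_score(search, X, y, cv=5)
print(outer_scores.mean())
```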

Block Cross-Validation vs. Random Splitting

Ignoring experimental structure, such as seasonal variation or herd differences, causes standard random cross-validation (CV) to overestimate model performance because block effects are not accounted for. The figure contrasts the two approaches, showing how block CV respects experimental boundaries (keeping samples from the same block together) and therefore provides a more realistic estimate of a model's generalizability to new, unseen environments.
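A hedged sketch of this comparison, assuming scikit-learn: GroupKFold keeps all samples from the same block in the same fold, while a shuffled KFold lets correlated samples from one block land in both training and test folds. The simulated block effect and model below are illustrative only.

```python
# Random K-fold CV vs. block CV (GroupKFold) on data with a strong block effect.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_blocks, n_per_block, n_features = 5, 40, 20
groups = np.repeat(np.arange(n_blocks), n_per_block)   # block (e.g., season/herd) label per sample

# Each block shifts both X and y, so samples within a block are correlated.
block_shift = rng.normal(0, 2.0, size=n_blocks)[groups]
X = rng.normal(0, 1.0, size=(n_blocks * n_per_block, n_features)) + block_shift[:, None]
y = block_shift + rng.normal(0, 1.0, size=n_blocks * n_per_block)

model = RandomForestRegressor(n_estimators=100, random_state=0)
random_cv = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
block_cv = cross_val_score(model, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"random CV R^2: {random_cv.mean():.2f}")   # inflated: blocks leak across folds
print(f"block  CV R^2: {block_cv.mean():.2f}")    # lower, closer to new-environment performance
```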