✨ TL;DR
IDOBE is a standardized benchmark dataset containing over 10,000 infectious disease outbreak segments from a century of surveillance data across 13 diseases, designed to evaluate and compare epidemic forecasting methods. The authors test 11 baseline models and find MLP-based methods perform most robustly, with statistical methods excelling in pre-peak phases.
Epidemic forecasting has become critical for outbreak response, with collaborative ensembles of statistical and machine learning models now standard practice. However, the field lacks standardized benchmark datasets for rigorous evaluation of these methods. Additionally, little is known about how well these forecasting approaches perform for novel outbreaks, where historical data are scarce. Together, these gaps make it difficult to systematically compare methods and understand their strengths and weaknesses across different outbreak scenarios and disease contexts.
The authors created IDOBE by compiling epidemiological time series from multiple data repositories spanning over a century of surveillance across U.S. states and global locations. They applied derivative-based segmentation to extract individual outbreak episodes, generating over 10,000 outbreak segments covering multiple outcomes (cases, hospitalizations) for 13 different diseases. They characterized the dataset's epidemiological diversity using information-theoretic and distributional measures. For benchmarking, they implemented 11 baseline forecasting models and evaluated multi-horizon short-term forecasts (1- to 4-week-ahead) throughout outbreak progression using both standard metrics (NMSE, MAPE) and probabilistic scoring rules (Normalized Weighted Interval Score).
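To make the segmentation step concrete, here is a minimal sketch of one way a derivative-based split could work. It is not the authors' procedure, which this summary does not specify. It assumes a centered moving average for smoothing and cuts the series at troughs, that is, points where the first difference of the smoothed series turns from non-positive to positive. The function names, the `window` parameter, and the `min_peak` threshold are all hypothetical.

```python
def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def segment_outbreaks(series, window=3, min_peak=1.0):
    """Split a count series into (start, end) index pairs, inclusive.

    Assumption: an outbreak segment runs from one trough of the smoothed
    series to the next, and is kept only if its raw peak reaches min_peak.
    """
    smooth = moving_average(series, window)
    # First difference as a discrete stand-in for the derivative.
    diff = [b - a for a, b in zip(smooth, smooth[1:])]
    cuts = [0]
    for i in range(1, len(diff)):
        # Slope goes from non-positive to positive: index i is a trough.
        if diff[i - 1] <= 0 and diff[i] > 0:
            cuts.append(i)
    cuts.append(len(series) - 1)
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        if end > start and max(series[start:end + 1]) >= min_peak:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    weekly_cases = [0, 1, 5, 9, 5, 1, 0, 2, 6, 10, 6, 2, 0]  # toy data
    print(segment_outbreaks(weekly_cases))
```

On the toy series the sketch yields two segments that share the trough between the two waves. A real pipeline would likely need stronger smoothing and minimum-duration rules to avoid cutting on noise.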
What the paper shows.
The benchmark evaluation of 11 baseline models across the IDOBE dataset revealed that MLP-based methods achieved the most robust performance overall across different forecast horizons and outbreak phases. Statistical methods demonstrated a slight edge specifically during the pre-peak phase of outbreaks. Performance was quantified using multiple metrics including NMSE and MAPE for point forecasts, and Normalized Weighted Interval Score (NWIS) for probabilistic forecasts. The dataset successfully captures epidemiological diversity across 13 diseases with varying outbreak characteristics, as confirmed through information-theoretic and distributional measures.
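The metrics named above can be sketched as follows. The summary does not give the paper's exact formulas, so the normalizations here are assumptions: NMSE is taken as squared error divided by the total squared deviation of the observations from their mean, and MAPE skips zero observations. The weighted interval score follows the standard definition built from central prediction intervals (a weighted sum of the absolute error of the median and the interval scores). The extra normalization that turns WIS into NWIS is not specified in this summary and is left out. All function names are hypothetical.

```python
from statistics import mean


def nmse(y_true, y_pred):
    """Squared error normalized by the spread of the observations (assumed form)."""
    mu = mean(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mu) ** 2 for y in y_true)
    return sse / sst  # undefined for a constant y_true


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; zero observations are skipped."""
    return mean(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred) if y != 0)


def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    width = upper - lower
    under = (2 / alpha) * max(lower - y, 0)  # penalty when y falls below the interval
    over = (2 / alpha) * max(y - upper, 0)   # penalty when y falls above the interval
    return width + under + over


def wis(y, median, intervals):
    """Weighted interval score; intervals maps alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lo, hi) in intervals.items():
        total += (alpha / 2) * interval_score(y, lo, hi, alpha)
    return total / (len(intervals) + 0.5)


if __name__ == "__main__":
    print(nmse([1, 2, 3], [2, 2, 2]))
    print(mape([100, 200], [110, 180]))
    print(wis(14, 10, {0.5: (8, 12)}))
```

Lower is better for all three. Because WIS is on the scale of the data, some normalization is needed before averaging across outbreaks of very different sizes, which is presumably the motivation for the normalized variant.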
The paper does not discuss limitations in detail. Potential implicit limitations include the focus on short-term forecasting (1 to 4 weeks ahead), which may not capture longer-term outbreak dynamics, and the reliance on derivative-based segmentation, which may introduce artifacts in outbreak boundary detection. The benchmark is limited to the 11 baseline models tested, and performance on truly novel pathogens with no historical analogs remains uncertain. The dataset's geographic coverage, while spanning U.S. states and some global locations, may not represent all outbreak contexts globally.
✨ Generated by Claude · Apr 21, 2026 · Read the PDF for authoritative content.