PBT Model Accuracy
Out-of-sample backtests of the PBT Model — projections vs. what actually happened, and the game model vs. real outcomes
Batter projections
Beat the Marcel baseline by 5% on full-season accuracy · p10–p90 coverage 74% (2025 backtest, 2,772 preds)
Game model vs results
Brier 0.244 vs home-field base 0.2488 · picks the winner 56% (25,164 games)
Player Accuracy
per 650 PA vs Marcel · 2,772 preds · skill 5% · p10–p90 cov 74%
| Stat | N | Model | Base | Skill | Cov |
|---|---|---|---|---|---|
| Power | | | | | |
| Home Runs | 308 | 6.19 | 6.17 | -1% | 76% |
| RBI | 308 | 13.12 | 13.22 | +1% | 73% |
| Doubles | 308 | 5.29 | 5.67 | +7% | 79% |
| Triples | 308 | 1.55 | 1.72 | +10% | 64% |
| Production | | | | | |
| Runs | 308 | 10.19 | 10.9 | +7% | 78% |
| Hits | 308 | 11.84 | 12.72 | +7% | 81% |
| Stolen Bases | 308 | 5.05 | 5.53 | +9% | 70% |
| Discipline | | | | | |
| Walks | 308 | 10.06 | 10.27 | +2% | 74% |
| Strikeouts | 308 | 19.11 | 19.42 | +2% | 68% |
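The page does not spell out how the Skill and Cov columns are computed. A minimal sketch, assuming Skill is the fractional error reduction vs the baseline (1 − Model/Base) and Cov is the share of actual values that land inside the projected p10–p90 band; the function names are illustrative, not the model's own code. This assumption reproduces most rows from the rounded errors shown (Doubles: 1 − 5.29/5.67 ≈ 7%). The Home Runs row gives about −0.3% from the rounded inputs rather than the −1% shown, so the page presumably works from unrounded values or rounds differently.

```python
def skill(model_err: float, base_err: float) -> float:
    """Fractional error reduction vs the baseline (assumed definition).

    Positive means the model's error is smaller than the baseline's.
    """
    return 1.0 - model_err / base_err


def coverage(actuals, p10s, p90s) -> float:
    """Share of actual values that fall inside their [p10, p90] interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, p10s, p90s))
    return hits / len(actuals)


# Rounded errors from the Doubles and Triples rows above.
doubles_skill = skill(5.29, 5.67)
triples_skill = skill(1.55, 1.72)

# Toy data (hypothetical): two of three actuals land inside their band.
toy_cov = coverage([5, 12, 30], [3, 10, 10], [8, 11, 40])
```

A perfectly calibrated p10–p90 band would cover 80% of outcomes, so a coverage below 80% suggests intervals that are somewhat too narrow and a coverage above it suggests intervals that are too wide.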
Game Model vs Actual Outcomes
backtest
| Season | Games | Brier | Base | Win% | Base% | Run MAE |
|---|---|---|---|---|---|---|
| 2025 | 2,425 | 0.246 | 0.2482 | 54% | 54% | 3.59 |
| 2024 | 2,427 | 0.2452 | 0.2495 | 56% | 52% | 3.43 |
| 2023 | 2,429 | 0.2455 | 0.2496 | 55% | 52% | 3.64 |
| 2022 | 2,424 | 0.2422 | 0.2489 | 58% | 53% | 3.55 |
| 2021 | 2,426 | 0.2445 | 0.2486 | 57% | 54% | 3.59 |
| 2020 | 895 | 0.2433 | 0.2475 | 57% | 55% | 3.55 |
| 2019 | 2,427 | 0.2416 | 0.2491 | 57% | 53% | 3.76 |
| 2018 | 2,428 | 0.241 | 0.2492 | 58% | 53% | 3.65 |
| 2017 | 2,430 | 0.2443 | 0.2484 | 56% | 54% | 3.56 |
| 2016 | 2,427 | 0.2448 | 0.2491 | 55% | 53% | 3.54 |
| 2015 | 2,426 | 0.2453 | 0.2483 | 56% | 54% | 3.42 |
| Overall | 25,164 | 0.244 | 0.2488 | 56% | 53% | 3.57 |
The base-out Monte Carlo sim is scored on its pre-game home win probability for every completed game, using each game's real starters and home park. A Brier score below the home-field base means the model adds signal beyond "home teams win ~53%." There is no market column: the free odds feed carries no historical closing lines, so the model is graded against results, never the line. Pitcher projections will be tracked separately once the workload model lands.
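For reference, the Brier score is the mean squared difference between the forecast home win probability and the result (1 = home win, 0 = home loss), so lower is better and a coin-flip forecast of 0.5 scores 0.25. A minimal sketch follows. It assumes the home-field base is a constant forecast equal to the home win rate, which the page does not confirm; the function names and toy numbers are illustrative.

```python
def brier(probs, outcomes) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def constant_base_brier(home_win_rate: float) -> float:
    """Brier score of always forecasting the home win rate itself.

    If home teams win a fraction r of games and every forecast is r,
    the score is r*(1-r)^2 + (1-r)*r^2 = r*(1-r).
    """
    return home_win_rate * (1.0 - home_win_rate)


def pick_accuracy(probs, outcomes) -> float:
    """Share of games where the favored side (p >= 0.5 means home) won."""
    correct = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return correct / len(probs)


# Toy forecasts (hypothetical), not real model output.
toy_probs = [0.60, 0.55, 0.40]
toy_results = [1, 0, 0]

toy_brier = brier(toy_probs, toy_results)
toy_acc = pick_accuracy(toy_probs, toy_results)
base_53 = constant_base_brier(0.53)
```

Under that assumption a 53% home win rate gives a base of 0.53 × 0.47 = 0.2491, close to the 0.2488 overall base in the table. The gap between model and base is small in absolute terms because single-game outcomes are close to coin flips.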