We've recalibrated BaseChaser's simulation engine to better reflect what's actually happening on the field — not just what was projected in spring training.
Every model is a set of assumptions. As the season progresses and real data accumulates, some of those assumptions deserve a second look. After a thorough review of our simulation output, we've made four targeted changes to improve how BaseChaser generates playoff probabilities.
None of these changes affect the core Monte Carlo engine — we still run 100,000 simulations per update, the playoff structure logic is unchanged, and the same game-by-game simulation pipeline remains in place. What changed is how we estimate team strength going into those simulations.
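For readers new to the approach, here is a deliberately simplified sketch of what a Monte Carlo estimate looks like. It is not BaseChaser's pipeline: it treats every remaining game as an independent coin flip with one fixed win probability and replaces the playoff structure with a hypothetical `wins_needed` cutoff. The function name and all parameters are illustrative assumptions.

```python
import random


def simulate_playoff_odds(current_wins, games_left, win_prob, wins_needed,
                          n_sims=100_000, seed=None):
    """Toy Monte Carlo: share of simulated seasons reaching wins_needed.

    Assumptions (not from BaseChaser): one fixed per-game win probability
    and a simple win-total cutoff standing in for real playoff logic.
    """
    rng = random.Random(seed)
    qualified = 0
    for _ in range(n_sims):
        # Simulate each remaining game as an independent Bernoulli trial.
        wins = current_wins + sum(rng.random() < win_prob for _ in range(games_left))
        if wins >= wins_needed:
            qualified += 1
    return qualified / n_sims


if __name__ == "__main__":
    # Hypothetical team: 27-20 with 115 games left, needing 88 wins.
    # A smaller run count keeps this demo quick.
    odds = simulate_playoff_odds(27, 115, 0.52, 88, n_sims=10_000, seed=42)
    print(f"Estimated odds: {odds:.1%}")
```

In a real game-by-game pipeline the win probability would vary by opponent and by the team-strength estimate discussed below, which is exactly the input the recalibration changes.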
BaseChaser blends two signals to estimate team strength: an Elo rating built from game results and a talent prior derived from FanGraphs' projected WAR. The WAR projection is especially useful early in the season when a 10-game sample tells you less than a full roster's depth chart. But projections get stale. Trades happen, players get injured, rookies arrive. Our previous WAR data was nearly a month old — an eternity in baseball. Updating to current FanGraphs projections ensures the talent prior reflects May's reality, not April's.
Our original weight schedule leaned heavily on WAR projections through the first 80 games of the season. At 47 games played, the model was still assigning 65% of the weight in the blended Elo to the WAR prior and only 35% to actual results. That's defensible in April, when you have 15 games of data. By mid-May, with nearly 50 games in the books, teams have told you a lot about who they are. The new schedule drops the WAR weight to 40% at this stage and keeps reducing it more aggressively as the season unfolds.
The new WAR weight schedule, by games played (GP):

| Games played | WAR weight |
|---|---|
| ≤20 | 85% |
| 21-40 | 65% |
| 41-60 | 40% |
| 61-80 | 25% |
| 81-120 | 15% |
| More than 120 | 5% |

By September, the model almost entirely trusts game results.
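The schedule above can be sketched as a simple lookup. This is an illustration rather than BaseChaser's actual code: the function names are hypothetical, and we assume the blend is a plain weighted average of a results-based Elo and a WAR-derived Elo.

```python
# Tiers taken from the schedule above: (max games played, WAR weight).
WAR_WEIGHT_TIERS = [(20, 0.85), (40, 0.65), (60, 0.40), (80, 0.25), (120, 0.15)]
LATE_SEASON_WAR_WEIGHT = 0.05  # applies after 120 games played


def war_weight(games_played: int) -> float:
    """Return the share of the blended rating given to the WAR prior."""
    for max_games, weight in WAR_WEIGHT_TIERS:
        if games_played <= max_games:
            return weight
    return LATE_SEASON_WAR_WEIGHT


def blended_rating(results_elo: float, war_elo: float, games_played: int) -> float:
    """Assumed linear blend of the results-based Elo and the WAR-derived Elo."""
    w = war_weight(games_played)
    return w * war_elo + (1.0 - w) * results_elo


if __name__ == "__main__":
    # Hypothetical team playing above its projection at 47 games.
    print(war_weight(47))
    print(blended_rating(results_elo=1540.0, war_elo=1500.0, games_played=47))
```

With these made-up ratings, a team at 47 games gets 60% of its blended rating from results, so most of its on-field edge over the projection carries into the simulations.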
Two changes work together here. First, we increased the Elo K-factor from 2 to 4. K-factor controls how much each game moves a team's rating — at K=2, even a dominant stretch of 10 straight wins barely moved the needle. At K=4 (the same value FiveThirtyEight used for MLB), hot streaks and cold streaks register faster in the model. A team that's genuinely figuring it out mid-season will see that reflected in their projected odds sooner.
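To see what doubling K does, here is a minimal sketch using the textbook Elo update, where a rating moves by K times the gap between the actual and expected result. We are assuming the standard 400-point logistic expectancy and a fixed-strength opponent whose rating is not updated; BaseChaser's exact formula may include adjustments not shown here, and the function names are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo win expectancy on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_after_streak(start: float, opponent: float, wins: int, k: float) -> float:
    """Apply the Elo update once per win against a fixed-strength opponent."""
    rating = start
    for _ in range(wins):
        # A win scores 1.0; the rating gains K times the "surprise".
        rating += k * (1.0 - expected_score(rating, opponent))
    return rating


if __name__ == "__main__":
    for k in (2.0, 4.0):
        gain = rating_after_streak(1500.0, 1500.0, wins=10, k=k) - 1500.0
        print(f"K={k:g}: +{gain:.1f} Elo after 10 straight wins")
```

Against an evenly matched opponent, each win is worth at most K/2 points, so a 10-game streak adds just under 10 points at K=2 and just under 20 at K=4.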
Second, we widened the conversion from WAR projections to Elo points. The old scaling compressed the talent gap between the best and worst MLB teams into a narrow band where individual game simulations were barely distinguishable from coin flips. The wider scaling means the model's opinion about who's better actually shows up in simulated game outcomes.
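The effect of the scaling is easier to see with numbers. The sketch below assumes a linear WAR-to-Elo conversion centered on 1500 and the standard Elo win expectancy. The `points_per_war` values and the team WAR totals are hypothetical, chosen only to contrast a narrow scale with a wide one. They are not BaseChaser's actual parameters.

```python
def war_to_elo(team_war: float, league_avg_war: float, points_per_war: float) -> float:
    """Assumed linear conversion: each WAR above average adds points_per_war Elo."""
    return 1500.0 + points_per_war * (team_war - league_avg_war)


def win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expectancy that team A beats team B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical projections: a 50-WAR roster vs. a 20-WAR roster,
    # with a league average of 35 WAR.
    for points_per_war in (1.0, 4.0):
        strong = war_to_elo(50.0, 35.0, points_per_war)
        weak = war_to_elo(20.0, 35.0, points_per_war)
        p = win_probability(strong, weak)
        print(f"{points_per_war:g} Elo per WAR: gap {strong - weak:.0f}, "
              f"favorite wins {p:.1%}")
```

On the narrow scale the 30-WAR gap becomes only 30 Elo points and the favorite wins a little over half the time. On the wide scale the same gap becomes 120 points and the favorite wins roughly two games in three.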
You may notice some shifts in today's numbers compared to yesterday's. That's expected — the model is now weighting recent performance more heavily and using current roster projections instead of month-old ones. Here's the directional impact:
| Scenario | Effect |
|---|---|
| Team with a strong record | Odds increase — game results count for more |
| Team outperforming their WAR projection | Odds increase — less drag from a skeptical prior |
| Team with a weak record but high WAR | Odds decrease — can't coast on projections as long |
| Team underperforming their WAR projection | Odds decrease — the prior protects them less |
The model still believes in talent — it just believes in results more than it did yesterday. That's the right tradeoff at this point in the season.
We take model transparency seriously. If you want the full technical details, our methodology page has the complete breakdown of how simulations work. As always, the raw data powering the odds is available as a public JSON endpoint.
Better inputs make better odds. We'll continue reviewing and refining the model throughout the season to make sure BaseChaser gives you the sharpest picture of the playoff race.