No One’s Replaceable: A Joint Model for Peace Amidst WAR
A Bayesian approach to jointly estimating batting, pitching, fielding, and baserunning contributions.
1 Introduction
A recently published paper claiming an improved method of modeling baseball players’ Wins Above Replacement (WAR), Brill and Wyner (2024), offers us additional perspective. Contributions to our store of knowledge are always appreciated: thank you, Brill and Wyner! Teams and the public alike have long sought a single performance value to assign to players, leading to various implementations of WAR.
Popular news sites like Fangraphs and Baseball Reference have their own versions of WAR, and these differ significantly. Brill and Wyner contrast their approach with those of these sites, yet they continue to use machine learning to estimate the components of their model separately and then combine the resulting point estimates. Interestingly, Brill and Wyner cite Baumer, Jensen, and Matthews (2015) as a source for further details on how the Fangraphs and Baseball Reference models work, but do not discuss that paper’s key contribution: modeling runs at the plate appearance level. Yet Baumer, Jensen, and Matthews, too, employ separate linear regressions and then aggregate those estimates into an overall WAR model.
While these approaches offer some value, I prefer to model all possible contributions by players and estimate them directly in natural units of runs scored (or allowed), without subtracting out a nebulous “replacement player” value. Which is who, by the way? Let me explain my preferences.
In my view, a full posterior distribution of each player’s expected run contributions, whether averaged by opportunities to contribute or summed over a season, is far more useful. The better approach is to jointly model all relevant components, which conserves the information within variation and properly propagates uncertainty throughout the model. Failing to do so, as the current machine learning or point-estimate methods tend to, throws away valuable information and introduces bias when those point estimates are used as inputs to other models.
Additionally, I don’t believe it’s as useful to subtract out the contributions of a replacement player. Each WAR model defines “replacement level” in a different way — Fangraphs differs from Baseball Reference, which differs from openWAR and GridWAR. Moreover, managers have specific, real players in mind when they make decisions about replacements, often targeting those available for trade. In comparing players, managers must compare specific values directly anyway. Further, replacement-level metrics can change year to year, and comparing players using straight expectations of their runs per opportunity or per season provides a cleaner method of player evaluation.
In this paper, I propose a joint model for estimating expected runs based on batting, pitching, fielding, and baserunning contributions. By jointly modeling all components, I build on the idea of Baumer, Jensen, and Matthews (2015) but avoid the problems that arise from modeling each contribution separately. To accomplish this, I use the probabilistic programming language Stan, a robust framework for Bayesian modeling.[1] The posterior distributions obtained from this model, reflecting the contributions of all players, offer a thorough and direct measure of player impact over the course of a season.
From this proof of concept, the posterior distributions of every player’s contributions for the 2024 season look like this:
With each player’s distributions of expected contributions, we can summarize or work with them to answer any number of questions. Let’s review the model below.
2 Model Overview
This Bayesian model jointly estimates expected runs by incorporating player contributions from batting, pitching, fielding, and baserunning, as well as park effects and baserunner advancements. In contrast to openWAR and other WAR models, which handle these components independently or in sequential steps, this approach jointly models all components in a single, coherent framework. By doing so, uncertainty propagates across components more effectively, leading to more robust and accurate estimates.
2.1 Notation and Definitions
We define several key quantities to structure the model. Let \(\rho_{o,b}\) represent the expected number of runs given a specific game state, where \(o\) is the number of outs and \(b\) is the base configuration. Player contributions are modeled through run value parameters, denoted as \(\beta_p\), for batter, pitcher, fielder, and baserunner \(p\). The park adjustment factor for game \(i\) is \(\beta_{\text{venue}[i]}\), which accounts for different scoring environments across venues. The variability in run outcomes is represented by \(\sigma\).
During a plate appearance \(i\), the change in expected runs is represented as \(\Delta \rho_i\), which captures the effect of the game state change on expected runs. This is combined with the actual runs scored during the appearance, \(r_i\), and the runs scored after the appearance but within the half-inning, \(r_{\text{to end}, i}\). Finally, \(\alpha_{ij}\) represents the expected baserunner advancement for runner \(j\) during plate appearance \(i\).
2.2 Prior Distributions
Priors are assigned to the key model parameters to capture the uncertainty in player contributions, park effects, and baserunner advancements. These priors allow the model to be flexible and adapt to the data:
\[ \begin{aligned} \rho_{o,b} &\sim \mathcal{N}(0, 1) \\ \alpha_{ij} &\sim \mathcal{N}(0, 1) \\ \beta_{\text{batter}[p]}, \beta_{\text{pitcher}[p]}, \beta_{\text{fielder}[p]}, \beta_{\text{baserunner}[p]} &\sim \mathcal{N}(0, 0.3) \\ \beta_{\text{venue}} &\sim \mathcal{N}(0, 1) \\ \sum \beta_{\text{venue}} &\sim \mathcal{N}\left(0, \frac{1}{\sqrt{n_{\text{venues}}}}\right) \\ \boldsymbol{\beta_{\text{bf}}} &\sim \mathcal{N}(0,1) \\ \sigma &\sim \text{Exponential}(1) \end{aligned} \]
Player effects receive a tighter prior scale (0.3) than the game-state, advancement, and venue parameters (1). The prior on the sum of the venue effects acts as a soft sum-to-zero constraint, which centers the park adjustments around an average venue.
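To make the venue prior concrete, the following Python sketch evaluates the log prior density of a vector of venue effects under the two venue statements above: independent \(\mathcal{N}(0, 1)\) terms plus the prior on their sum. It assumes the second argument of \(\mathcal{N}\) is a standard deviation, as in Stan. The function names and the example values are illustrative and are not taken from the model code.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def venue_log_prior(beta_venue):
    """Log prior of the venue effects: iid N(0, 1) terms plus a prior on their sum.

    Assumes the second argument of N(., .) is a standard deviation.
    """
    n_venues = len(beta_venue)
    independent = sum(normal_logpdf(b, 0.0, 1.0) for b in beta_venue)
    # The prior on the sum, with scale 1 / sqrt(n_venues), pulls the effects
    # toward summing to zero without forcing them to.
    on_sum = normal_logpdf(sum(beta_venue), 0.0, 1.0 / math.sqrt(n_venues))
    return independent + on_sum


# Hypothetical venue effects, for illustration only.
centered = [0.2, -0.2, 0.1, -0.1]        # sums to zero
shifted = [b + 0.5 for b in centered]    # same spread, shifted away from zero

print(venue_log_prior(centered) > venue_log_prior(shifted))
```

The centered vector receives the higher log prior, which is the sense in which the sum prior identifies park effects relative to an average park.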
2.3 Game State and Expected Runs
The expected runs for each game state, determined by the number of outs and base runners, are represented by the matrix \(\rho\). The change in expected runs \(\Delta \rho_i\) during plate appearance \(i\) is calculated as:
\[ \Delta \rho_i = \rho_{\text{end}} - \rho_{\text{start}} \]
where \(\rho_{\text{start}}\) represents the expected runs given the game state before the plate appearance, and \(\rho_{\text{end}}\) is the expected runs after the plate appearance. If the plate appearance ends the half-inning, then \(\rho_{\text{end}} = 0\).
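As a small worked example of this calculation, the sketch below looks up \(\rho\) for the states before and after a plate appearance and returns their difference. The dictionary layout, the base-state strings, and the numeric values are hypothetical stand-ins; in the model, \(\rho\) is an estimated parameter.

```python
def delta_rho(rho, start_state, end_state):
    """Change in expected runs over one plate appearance.

    rho maps (outs, bases) to expected runs. end_state is None when the
    plate appearance ends the half-inning, in which case rho_end = 0.
    """
    rho_start = rho[start_state]
    rho_end = 0.0 if end_state is None else rho[end_state]
    return rho_end - rho_start


# Hypothetical values for illustration only, not estimates from the model.
rho = {
    (0, "---"): 0.50,   # no outs, bases empty
    (0, "1--"): 0.90,   # no outs, runner on first
    (2, "---"): 0.10,   # two outs, bases empty
}

single = delta_rho(rho, (0, "---"), (0, "1--"))   # leadoff single
third_out = delta_rho(rho, (2, "---"), None)      # out that ends the half-inning
print(single, third_out)
```

With these stand-in values the single raises expected runs and the third out gives back whatever expectation remained in the half-inning.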
2.4 Total Run Value
The total run value for each plate appearance \(i\), denoted \(r_{\text{total}, i}\), is computed by summing the actual runs scored, the change in expected runs, the baserunner advancements, and the park adjustment factor:
\[ r_{\text{total}, i} = r_i + \Delta \rho_i + \sum_j \left( \text{adv}_{ij} \cdot \beta_{\text{baserunner}} \right) + \mathbb{I}_{\text{in play}}\beta_{\text{venue}[i]} \]
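In this equation, \(r_i\) is the actual runs scored during the plate appearance, \(\Delta \rho_i\) is the change in expected runs, \(\text{adv}_{ij}\) is the advancement for each runner \(j\), \(\beta_{\text{baserunner}}\) is the run value parameter of that baserunner, and \(\beta_{\text{venue}[i]}\) captures the park effect for game \(i\). The indicator \(\mathbb{I}_{\text{in play}}\) equals 1 when the ball is put in play and 0 otherwise, so the park effect applies only to balls in play.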
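The sum can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the model code: the function and argument names are hypothetical, and pairing each runner with its own baserunner parameter is an assumption of the sketch, since the equation above does not index \(\beta_{\text{baserunner}}\).

```python
def total_run_value(runs_scored, d_rho, runner_terms, in_play, beta_venue):
    """Total run value of one plate appearance.

    runner_terms is a list of (adv, beta_baserunner) pairs, one per runner.
    Pairing each runner with its own beta is an assumption of this sketch.
    """
    baserunning = sum(adv * beta for adv, beta in runner_terms)
    park = beta_venue if in_play else 0.0   # indicator for a ball put in play
    return runs_scored + d_rho + baserunning + park


# Hypothetical inputs: one run scores, expected runs fall by 0.2, and one
# runner advances half a base beyond expectation.
value = total_run_value(1, -0.2, [(0.5, 0.1)], in_play=True, beta_venue=0.05)
print(value)
```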
2.5 Likelihood
The model’s likelihood has two components: one for the run value of the plate appearance, and another for the runs scored from the plate appearance through the end of the half-inning. When the ball is put in play, the likelihood for plate appearance \(i\) is given by:
\[ \mathcal{L}\left[ p(r_{\text{total}, i} | \mu_i, \sigma) \right] = \mathcal{L}\left[ \mathcal{N}(r_{\text{total}, i} | \mu_i, \sigma) \right] \]
where the expected run value \(\mu_i\) is:
\[ \mu_i = \beta_{\text{batter}[i]} - f_{\text{out}} \cdot \beta_{\text{fielder}[i]} - (1 - f_{\text{out}}) \cdot \beta_{\text{pitcher}[i]} \] where
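\[ f_{\text{out}} = \text{logit}^{-1}\left(\left(\mathbf{B}_x(x) \otimes \mathbf{B}_y(y)\right) \cdot \boldsymbol{\beta_{\text{bf}}}\right) \]
The inverse logit keeps \(f_{\text{out}}\) between 0 and 1, so it acts as a weight that splits defensive responsibility between the fielder and the pitcher.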
For expected runs at all game states, the log-likelihood is:
\[ \ell\left[p(r_{\text{to end}, i} + r_i | \rho_{\text{start}}, \sigma)\right] =\log \left( \mathcal{L}\left[ \mathcal{N}(r_{\text{to end}, i} + r_i | \rho_{\text{start}}, \sigma)\right] \right) \]
If the ball is not put in play, the likelihood reduces to:
\[ \mathcal{L}\left[p(r_{\text{total}, i} | \mu_i, \sigma)\right] = \mathcal{L}\left[ \mathcal{N}(r_{\text{total}, i} | \beta_{\text{batter}[i]} - \beta_{\text{pitcher}[i]}, \sigma)\right] \]
In other words, the batter and pitcher have full responsibility unless the ball is in play.
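The two cases for \(\mu_i\) can be written as one small function. The sketch below is illustrative only: it assumes \(f_{\text{out}}\) is obtained by passing a linear predictor through the inverse logit so that it lies between 0 and 1, and it takes that predictor (called `eta` here, a hypothetical name) as given instead of building the tensor-product basis.

```python
import math


def inv_logit(eta):
    """Logistic function, mapping a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def expected_run_value(beta_batter, beta_pitcher, beta_fielder, in_play, eta=0.0):
    """Expected run value mu_i for one plate appearance.

    eta stands in for the basis-function linear predictor evaluated at the
    ball's location. It is ignored when the ball is not put in play.
    """
    if not in_play:
        # Batter and pitcher carry full responsibility.
        return beta_batter - beta_pitcher
    f_out = inv_logit(eta)   # weight on the fielder; the pitcher gets the rest
    return beta_batter - f_out * beta_fielder - (1.0 - f_out) * beta_pitcher


# Hypothetical parameter values for illustration.
mu_in_play = expected_run_value(0.2, 0.1, 0.3, in_play=True, eta=0.0)
mu_not_in_play = expected_run_value(0.2, 0.1, 0.3, in_play=False)
print(mu_in_play, mu_not_in_play)
```

At `eta = 0` the fielder and pitcher split the defensive share evenly. As `eta` becomes very negative, \(f_{\text{out}}\) approaches 0 and the in-play expression approaches the not-in-play one.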
2.6 Baserunner Advancement
For each baserunner \(j\), we calculate the advancement beyond expectation as:
\[ \text{adv}_{ij} = \text{actual\_adv}_{ij} - \alpha_{ij} \]
Here, \(\text{actual\_adv}_{ij}\) represents the base that runner \(j\) ends up on after plate appearance \(i\), and \(\alpha_{ij}\) represents the expected base given the event type of that plate appearance.[2]
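A minimal numeric sketch of advancement beyond expectation follows. The lookup keyed by event type and starting base, the coding of home as base 4, and the expected values themselves are hypothetical; in the model the expected advancement is an estimated parameter.

```python
def advancement_above_expected(actual_base, expected_base):
    """Bases gained (or lost) relative to the expected end base for this event."""
    return actual_base - expected_base


# Hypothetical expected end bases keyed by (event type, starting base).
# Home is coded as base 4. These are not estimates from the model.
alpha = {
    ("single", 1): 2.3,
    ("single", 2): 3.6,
}

# A runner on first who reaches third on a single.
adv = advancement_above_expected(3, alpha[("single", 1)])
print(adv)
```

A positive value credits the runner with more than the typical advancement for that event, and a negative value does the opposite.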
2.7 Total Log-Likelihood
The total log-likelihood across all plate appearances is:
\[ \sum_{i=1}^{N} \left[ \ell\left[ p(r_{\text{total}, i} | \mu_i, \sigma) \right] + \ell\left[ p(r_{\text{to end}, i} + r_i | \rho_{\text{start}}, \sigma) \right] \right] \]
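For readers who want to check the arithmetic, here is a plain-Python sketch of this sum. It follows the per-appearance terms of Section 2.5, so the second term uses \(r_{\text{to end}, i} + r_i\). The record layout and field names are hypothetical, and the normal density is assumed to be parameterized by a standard deviation, as in Stan.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def total_log_likelihood(plate_appearances, sigma):
    """Sum both log-likelihood terms over all plate appearances.

    Each record is a dict with the hypothetical keys
    r_total, mu, r_i, r_to_end, and rho_start.
    """
    total = 0.0
    for pa in plate_appearances:
        # Run value of the plate appearance around its expected value mu_i.
        total += normal_logpdf(pa["r_total"], pa["mu"], sigma)
        # Runs from this plate appearance through the end of the half-inning,
        # centered on the expected runs for the starting game state.
        total += normal_logpdf(pa["r_i"] + pa["r_to_end"], pa["rho_start"], sigma)
    return total


# Two made-up plate appearances, for illustration only.
pas = [
    {"r_total": 0.4, "mu": 0.1, "r_i": 0, "r_to_end": 1, "rho_start": 0.5},
    {"r_total": -0.3, "mu": -0.1, "r_i": 0, "r_to_end": 0, "rho_start": 0.3},
]
print(total_log_likelihood(pas, sigma=1.0))
```

In a full fit, the sampler evaluates this quantity together with the log priors at each set of parameter values.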
3 Comparison to openWAR and Other WAR Models
This model provides several improvements over previous WAR models. The most notable improvement over openWAR is the joint modeling of all player contributions. In openWAR, each component (batter, pitcher, fielder, baserunner, etc.) is modeled independently. This fragmentation can lead to biased estimates due to incomplete propagation of uncertainty. By jointly estimating all components in this Bayesian framework, uncertainty propagates effectively across all parameters, resulting in more coherent estimates of player contributions.
Additionally, this model integrates park factors directly into the estimation process, offering a more detailed understanding of how venue impacts run-scoring. Other WAR models, such as Fangraphs WAR and Baseball Reference WAR, often apply park adjustments post hoc, which can lead to less accurate player-level estimates. This model’s unified approach to incorporating park effects makes it more robust.
GridWAR, another WAR model, also seeks to model expected runs per plate appearance but relies on machine learning techniques. While machine learning has its place, the lack of direct probabilistic modeling can lead to suboptimal uncertainty handling. This model, by comparison, fully embraces the Bayesian framework and allows for better uncertainty quantification in all components.
Moreover, baserunner advancement is handled in a more holistic manner here. Rather than treating advancements as a separate model component, this model integrates them into the overall framework, improving the interpretability and accuracy of baserunner contributions to run-scoring.
4 Next steps
I hope this provides a proof of concept for others to begin modeling all aspects of player contributions jointly. Once developed, this model can accommodate a wide range of extensions within its framework, from age-related player changes to nuances within positions and more.
References
Footnotes
[1] Wall time for fitting the model isn’t an issue. A season of data takes about an hour to fit on an Apple Mac Studio M2 Ultra without intensive optimization. With the amount of data needed to meaningfully affect estimates, there’s no need to refit the model after every game or week.
[2] This also follows openWAR’s general approach. A more advanced model could account for hitting dynamics and trajectory, actual fielder positions before the hit, and other information.