Saberseminar, Boston
August 5, 2018

Obligatory Disclosure

  • Jonah and Ben are employees of Columbia University, which has received several research grants to develop Stan
  • Jonah and Ben are also managers of GG Statistics LLC, which uses Stan for business purposes
  • According to Columbia University policy, any such employee who has any equity stake in, a title (such as officer or director) with, or is expected to earn at least \(\$5,000\) per year from a private company is required to disclose these facts in presentations


Persistent Fallacies

  • Sabermetricians often point out fallacies like:
  • Announcer's Fallacy: Getting picked off before that homer really cost us a run
  • Announcer's Fallacy II: He's stolen 27 out of 30 bases so he should steal now
  • Unearned Run Fallacy: Official scorer knows how the inning would have played out if the error had not been committed
  • Problem is the lack of an explicit and convincing model to query for what would take place if the pickoff, error, etc., had not occurred
  • But current approaches to sabermetrics do not offer this
  • The missing principle is adhering to probability theory to avoid similar errors
  • Don't be results oriented, be expectation oriented

Why Is Baseball Ahead in Sports Analytics?

  • Game has discrete states
  • Availability of public data
  • Early data analysis led to enough acronyms to fill up a t-shirt

Removing the Noise Distorts the Signal

  • Newer acronyms (e.g. FIP) subset the data to filter noise out of the process
  • Yields more predictable values that are (mis)used for decision-making
  • Keith Law (2017, p.152) claims "[FIP] may indeed throw out the baby with the bathwater"
  • To justify decisions, one must account for all the signal and average over the noise

Decision Theory Matters to Baseball

  • Baseball is a game of decisions at different levels (players, managers, GMs, …)
    • Rosters, batting orders, substitutions, defensive positioning, pitch sequencing, sending runners from third, etc.
  • Decision theory provides a foundation for rational decision-making that has been successfully applied to many fields
  • Teams should make decisions that maximize expected utility, where utility is whatever the team cares about
  • Analysts should assume players, managers, and GMs are at least attempting to maximize expected utility when criticizing/analyzing their decisions

Rational Decision-making Needs Expectations

  • The most likely sequence of events for an MLB defense is a perfect game, which has only happened 23 times
  • Making decisions based on what is the most likely to occur is irrational
  • An expectation (denoted by \(\mathbb{E}\) operator) of a function of a random variable has a precise definition that loosely means:
  • weight everything that could happen by its probability and accumulate
  • Using the Pythagorean Theorem as an expectation (we don't) requires assuming runs scored/allowed are independent Weibull random variables
  • Few sabermetrics examples do what is rational: explicitly weighting a utility function with a probability distribution
  • Stan yields probability-weighted outputs necessary for decision-making
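
As a rough illustration of why the most likely outcome is a poor guide, the sketch below assumes (hypothetically) that every batter is retired independently with the same probability, 0.68, and that outs are only recorded on batters. That number and the function names are ours for illustration, not estimates from data. Under those assumptions the single most likely sequence of plate appearances is a perfect game, yet its probability is tiny and the expected number of baserunners is far from zero.

```python
# Hypothetical assumption: each batter is retired independently with probability P_OUT.
P_OUT = 0.68
OUTS_PER_GAME = 27


def prob_perfect_game(p_out: float, outs: int = OUTS_PER_GAME) -> float:
    """Probability that the first `outs` batters are all retired."""
    return p_out ** outs


def expected_baserunners(p_out: float, outs: int = OUTS_PER_GAME) -> float:
    """Mean number of batters reaching base before `outs` outs are recorded.

    This is the mean of a negative binomial distribution: outs * (1 - p) / p.
    """
    return outs * (1.0 - p_out) / p_out


if __name__ == "__main__":
    print(f"P(perfect game)    = {prob_perfect_game(P_OUT):.6f}")
    print(f"E[batters on base] = {expected_baserunners(P_OUT):.2f}")
```

Any specific sequence with at least one baserunner is less likely than the perfect game, but there are so many such sequences that together they carry almost all of the probability. That is the sense in which an expectation weights everything that could happen.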

Adopt the Pokermetrics Mindset

  • Poker community agreed that utility = prize money and focused 100% of its analysis on the expected value of a bet / call / fold
  • Pokermetrics use simulations of opponents to compute expectations and seek optimal strategies
  • Nothing prevents sabermetricians from doing the same with Stan

Stealing Bases: A Small Step in the Right Direction

  • Early game: maximizing expected run difference \(\approx\) maximizing win probability
  • \(0.8 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{ Steal 2B}] + 0.2 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{Caught}] \approx \mathbb{E}[\textrm{Runs} \mid \textrm{Stay at 1B}]\)
  • Conclusion: Only steal if the probability of success is greater than \(0.8\)
  • This sabermetric conclusion has actually led to fewer steal attempts
  • Need more decision-theoretic analysis like this in sabermetrics!
  • But there are several limitations that can be addressed by using Stan
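
The break-even logic above can be sketched in a few lines. The function name is ours; the two run expectancies are the two-out values quoted in the appendix of this talk, and the value of being caught is taken to be 0 because a caught stealing with two outs ends the inning. This is a minimal sketch of the expected-runs rule only, not of the full model.

```python
def break_even_success_prob(e_runs_success: float,
                            e_runs_caught: float,
                            e_runs_stay: float) -> float:
    """Success probability p at which stealing and staying give equal expected runs.

    Solves p * e_runs_success + (1 - p) * e_runs_caught = e_runs_stay for p.
    """
    gain = e_runs_success - e_runs_caught
    if gain <= 0:
        raise ValueError("A successful steal must be worth more than being caught.")
    return (e_runs_stay - e_runs_caught) / gain


# Two-out run expectancies quoted in the appendix of this talk;
# being caught with two outs ends the inning, so its value is 0.
E_RUNS_2B = 0.3298
E_RUNS_1B = 0.2349
p_star = break_even_success_prob(E_RUNS_2B, 0.0, E_RUNS_1B)
print(f"Break-even success probability with two outs: {p_star:.3f}")
```

With these inputs the break-even probability is roughly 0.71, which is lower than the general 0.8 rule because a runner caught with two outs forfeits less expected scoring.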

The Way Forward: Generative Modeling

How We Would Calculate \(\mathbb{E}\left[\textrm{Change in Wins}\right]\)

  1. Assemble a group of replacement players who get promoted from and then demoted to the minors during a season, like openWAR does
  2. Use Stan to estimate the posterior distribution of latent abilities of all players conditional on data under a generative model
  3. Repeat many times: Simulate 162 games for the teams' ideal 25-man rosters
  4. For each of the \(25 \times 30\) players in (3), repeat many times:
    • Replace with a randomly-drawn player from (1) who plays same position
    • Resimulate that team's season once
  5. Calculate the average difference in (a function of) wins between (3) and (4)
  • Unlike *WAR, this is founded on probability and specific to the team's context
  • Unlike *WAR, replacing \(A\) affects performance of player \(B\), \(C\), etc.
  • Unlike *WAR, this isn't distorted by injuries, bereavement, suspensions, etc.
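
Steps 3 to 5 can be sketched with a deliberately crude stand-in for the game simulator. Everything here is hypothetical: the "posterior draws" are made-up normal draws rather than output from Stan, and a team's per-game win probability is simply the inverse logit of the sum of its players' abilities, which ignores the interactions between players that a real generative model would capture. The sketch shows only the bookkeeping of averaging a difference in wins over posterior draws and replacement players.

```python
import math
import random
import statistics


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_season_wins(abilities, rng, games=162):
    """Simulate one season; win probability is a toy function of total ability."""
    p_win = inv_logit(sum(abilities))
    return sum(rng.random() < p_win for _ in range(games))


def expected_change_in_wins(posterior_draws, player, replacement_draws, rng, n_sims=2000):
    """Average of (wins with player) - (wins with a random replacement)."""
    diffs = []
    for _ in range(n_sims):
        roster = list(rng.choice(posterior_draws))      # one joint posterior draw
        with_player = simulate_season_wins(roster, rng)
        roster[player] = rng.choice(replacement_draws)  # swap in a replacement
        with_replacement = simulate_season_wins(roster, rng)
        diffs.append(with_player - with_replacement)
    return statistics.mean(diffs)


rng = random.Random(2018)
# Hypothetical "posterior draws": 500 draws of 25 latent abilities; player 0 is a star.
posterior_draws = [
    [rng.gauss(0.10 if i == 0 else 0.0, 0.02) for i in range(25)]
    for _ in range(500)
]
# Hypothetical pool of replacement-level abilities.
replacement_draws = [rng.gauss(-0.05, 0.02) for _ in range(500)]

delta = expected_change_in_wins(posterior_draws, 0, replacement_draws, rng)
print(f"E[change in wins] for player 0: {delta:.1f}")
```

In a real analysis the toy `simulate_season_wins` would be replaced by simulations from the fitted generative model, and the average could be taken over any function of wins, such as the probability of making the playoffs.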

Simplified Generative Model for Base Stealing

Generative Models, Bayesian Estimates use Stan

  • Stan is a high-level computer language for utilizing probability distributions. It overlaps with R, Python, etc., but Stan is more focused.
  • Large community of developers that is trusted and in demand:
  • Academics and industry analysts across subfields who are serious about modeling phenomena tend to use Stan because that is what Stan is designed to do and it has the most advanced algorithms for doing so

Should Cain Steal Second with Two Outs vs. NYY?

Given 2017 data, we believe he should steal off Pineda but not Sabathia

Should Managers Steal Second More with 2 Out?

With caveats, managers seem a bit too conservative with two outs

Advancing the Stolen Base Question with Stan

  • "Rule": Only steal 2B if the probability of success is at least \(0.8\) but

    • Conclusion based on maximizing expected runs that inning, not utility
    • Expected runs for each game state are assumed to be the same regardless of runner, pitcher, catcher, batter, on deck, etc.
    • Runners who are fast have a higher probability of scoring from second (or first) than the average runner
  • For the sake of presentation simplicity, we do not tackle any of those issues, and base-stealing opportunities are limited to runner on 1B only with 2 out (but not a full count)

  • How do you know what the probability of a successful steal is in this situation? Selection effects make this and similar problems difficult, unless you estimate generative models using a tool like Stan.

Key Part of the model Block of a Stan Program

// intermediate variables (indexing via [] works the same as in R)
alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
utility = (E_runs_2B * (1 - Pr_out) - E_runs_1B) / scale;
Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

// conditional probability of the observables
attempts ~ binomial(opportunities, Pr_attempt);
caught ~ beta_binomial(attempts, alpha_defense, beta_runner[runners]);

// prior distributions for the primitive unknowns
rho[1] ~ exponential(1);
rho[2] ~ pareto(rho[1], 2);
alpha_pitcher ~ dirichlet(rho[{2,1}][p_throws]);
...
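
To make the intermediate variables concrete, the sketch below evaluates the same formulas for a single runner/pitcher/catcher matchup in Python. The run expectancies and the definition of `scale` are copied from the appendix; the parameter values passed in at the bottom are hypothetical and chosen only to exercise the arithmetic. Note that `Pr_out` is the mean of a Beta(`alpha_defense`, `beta_runner`) distribution, which is why the same two quantities appear in the beta-binomial likelihood.

```python
import math

# Two-out run expectancies and scale as defined in the appendix of this talk.
E_RUNS_2B = 0.3298
E_RUNS_1B = 0.2349
SCALE = abs(E_RUNS_2B - 2 * E_RUNS_1B)


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def steal_quantities(alpha_defense, beta_runner, omega_0, omega_1):
    """Return (Pr_out, utility, Pr_attempt) for one matchup."""
    # Probability the runner is thrown out: mean of Beta(alpha_defense, beta_runner).
    pr_out = alpha_defense / (alpha_defense + beta_runner)
    # Scaled expected-run gain from attempting the steal (being caught is worth 0).
    utility = (E_RUNS_2B * (1.0 - pr_out) - E_RUNS_1B) / SCALE
    # Probability the steal is attempted, increasing in utility when omega_1 > 0.
    pr_attempt = inv_logit(omega_0 + omega_1 * utility)
    return pr_out, utility, pr_attempt


# Hypothetical parameter values, chosen only to exercise the formulas.
pr_out, utility, pr_attempt = steal_quantities(
    alpha_defense=1.0, beta_runner=3.0, omega_0=-2.0, omega_1=5.0
)
print(f"Pr_out={pr_out:.3f}  utility={utility:.3f}  Pr_attempt={pr_attempt:.3f}")
```

Because the attempt probability depends on the same abilities that determine success, the observed attempts are a selected sample, which is the selection effect the model is meant to account for.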

Easing into Stan and Generative Modeling

A few R packages, such as rstanarm and brms, accept familiar R model-fitting syntax such as

y ~ x + (1 + x | g), data = dataset, family = binomial()

passed to rstanarm::stan_glmer or brms::brm instead of lme4::glmer. This lets new users take advantage of Stan without having to learn Stan's language and idioms, but it offers limited choices for priors, functional forms, and multivariate statistics.


  1. Heuristics are not an adequate substitute for maximizing expected utility
  2. Expectations presuppose you are working with probability distributions
  3. Sabermetrics, unlike pokermetrics, has not been doing this
  4. But you can do it in Stan if you specify and justify your generative model
  • Thanks for inviting us to speak

Appendix: Stan Code for Stolen Base Model

data Block

data {
  // sizes
  int<lower=1> obs;        // number of observations with 2 outs and runner on first only
  int<lower=1> N_runners;  // number of runners
  int<lower=1> N_pitchers; // number of pitchers
  int<lower=1> N_catchers; // number of catchers
  // ID variables (like factors in R but coded as consecutive integers)
  int<lower=1,upper=N_runners> runners[obs];
  int<lower=1,upper=N_pitchers> pitchers[obs];
  int<lower=1,upper=N_catchers> catchers[obs];
  // known inputs
  vector<lower=0>[N_runners] inv_top_speed;  // reciprocal of top speed / 30 FPS
  int<lower=1,upper=2> p_throws[N_pitchers]; // indicator of pitcher handedness
  vector<lower=0>[N_catchers] time2B;        // time of ball to get to 2B
  // counts
  int<lower=0> attempts[obs];
  int<lower=1> opportunities[obs];
  int<lower=0> caught[obs];
}

transformed data and parameters Blocks

transformed data {
  // Link?
  real E_runs_2B = 0.3298; // expected runs | 2 out, runner on 2B only
  real E_runs_1B = 0.2349; // expected runs | 2 out, runner on 1B only
  real scale = fabs(E_runs_2B - 2 * E_runs_1B);
}
parameters {
  vector<lower=0>[N_runners] beta_runner;     // base-stealing ability
  simplex[N_pitchers] alpha_pitcher;          // holding runner ability and time to plate
  vector<lower=0>[N_catchers] alpha_catcher;  // caught-stealing ability

  positive_ordered[2] rho;   // sensitivity to pitcher
  real<lower=0> gamma;       // sensitivity to catcher
  real omega_0;              // intercept for strategy
  real<lower=0> omega_1;     // sensitivity to strategy
}

model Block

model {
  vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
  vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
  vector[obs] utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
  vector[obs] Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

  // likelihood
  target += binomial_lpmf(attempts | opportunities, Pr_attempt); // selection
  target += beta_binomial_lpmf(caught | attempts, alpha_defense, beta_runner[runners]);
  // priors
  target += exponential_lpdf(rho[1] | 1);
  target += pareto_lpdf(rho[2] | rho[1], 2);
  target += dirichlet_lpdf(alpha_pitcher   | rho[{2,1}][p_throws]);
  target += exponential_lpdf(gamma | 1);
  target += normal_lpdf(omega_0 | 0, 1);
  target += exponential_lpdf(omega_1 | 1);
  target += exponential_lpdf(beta_runner   | inv_top_speed);
  target += exponential_lpdf(alpha_catcher | gamma * time2B);
}

generated quantities Block

generated quantities {
  vector[obs] utility;
  { // local block: alpha_defense and Pr_out are temporaries and are not saved
    vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
    vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
    utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
  }
}