Stan Is Not an Acronym

Saberseminar, Boston
August 5, 2018

Obligatory Disclosure

Jonah and Ben are employees of Columbia University, which has received several research grants to develop Stan
Jonah and Ben are also managers of GG Statistics LLC, which uses Stan for business purposes
According to Columbia University policy, any such employee who has any equity stake in, a title (such as officer or director) with, or is expected to earn at least $\$5,000$ per year from a private company is required to disclose these facts in presentations

Introduction

Persistent Fallacies

Sabermetricians often point out fallacies like:
Announcer's Fallacy: Getting picked off before that homer really cost us a run
Announcer's Fallacy II: He's stolen 27 out of 30 bases so he should steal now
Unearned Run Fallacy: Official scorer knows how the inning would have played out if the error was not committed
Problem is the lack of an explicit and convincing model to query for what would take place if the pickoff, error, etc., had not occurred
But current approaches to sabermetrics do not offer this
The missing principle is adhering to probability theory to avoid similar errors
Don't be results oriented, be expectation oriented

Why Is Baseball Ahead in Sports Analytics?

Game has discrete states
Availability of public data
Early data analysis led to enough acronyms to fill up a t-shirt

Removing the Noise Distorts the Signal

Newer acronyms (e.g., FIP) subset the data to filter noise out of the process

Yields more predictable values that are (mis)used for decision-making

Keith Law (2017, p.152) claims "[FIP] may indeed throw out the baby with the bathwater"

To justify decisions, you must account for all of the signal and average over the noise

Decision Theory Matters to Baseball

Baseball is a game of decisions at different levels (players, managers, GMs, …)
- Rosters, batting orders, substitutions, defensive positioning, pitch sequencing, sending runners from third, etc.

Decision theory provides a foundation for rational decision-making that has been successfully applied to many fields
Teams should make the decisions that maximize expected utility, where utility is whatever they care about
Analysts should assume players, managers, and GMs are at least attempting to maximize expected utility when criticizing/analyzing their decisions

Rational Decision-making Needs Expectations

The most likely sequence of events for an MLB defense is a perfect game, which has only happened 23 times

Making decisions based on what is the most likely to occur is irrational

An expectation (denoted by $\mathbb{E}$ operator) of a function of a random variable has a precise definition that loosely means:

weight everything that could happen by its probability and accumulate
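A minimal Python sketch of the contrast between the most likely outcome and an expectation. The per-batter out probability of 0.68, the independence of batters, and the restriction to a fixed block of 27 batters are illustrative assumptions, not figures from the talk.

```python
from math import comb

P_OUT = 0.68    # assumed probability that any one batter makes an out (illustrative)
N_BATTERS = 27  # a perfect game retires 27 consecutive batters

# Probability of the single most likely sequence: every batter makes an out.
p_perfect = P_OUT ** N_BATTERS

# Any other specific sequence swaps at least one out for a less likely non-out,
# so it is less probable still, e.g. one baserunner in one particular slot.
p_one_runner_fixed_slot = P_OUT ** (N_BATTERS - 1) * (1 - P_OUT)

def expectation(g, pmf):
    """Weight g(x) by Pr(X = x) for every possible x and accumulate."""
    return sum(g(x) * p for x, p in pmf.items())

# Binomial distribution of the number of baserunners among the 27 batters.
pmf = {k: comb(N_BATTERS, k) * (1 - P_OUT) ** k * P_OUT ** (N_BATTERS - k)
       for k in range(N_BATTERS + 1)}

expected_runners = expectation(lambda k: k, pmf)

print(f"Pr(most likely sequence) = {p_perfect:.6f}")
print(f"E[baserunners]           = {expected_runners:.2f}")
```

Under these assumptions the most likely sequence is still very unlikely, while the expectation summarizes everything that could happen.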

To use the Pythagorean Theorem (we don't) as an expectation requires assuming that runs scored and runs allowed are independent Weibull random variables

Few sabermetrics examples do what is rational: explicitly weighting a utility function with a probability distribution

Stan yields probability-weighted outputs necessary for decision-making

Adopt the Pokermetrics Mindset

Poker community agreed that utility = prize money and made 100% of the analysis be about the expected value of a bet / call / fold

Pokermetrics use simulations of opponents to compute expectations and seek optimal strategies

Nothing prevents sabermetricians from doing the same with Stan
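As a rough illustration of the poker calculation, the sketch below computes the expected value of a call and compares it with folding. The function names and chip amounts are hypothetical, and in practice the win probability would itself come from simulating opponents rather than being typed in.

```python
def ev_call(p_win, pot, cost_to_call):
    """Expected chip change from calling: win the pot with probability p_win,
    otherwise lose the amount put in to call."""
    return p_win * pot - (1 - p_win) * cost_to_call

def best_action(p_win, pot, cost_to_call):
    """Folding has expected value 0 relative to the current stack."""
    return "call" if ev_call(p_win, pot, cost_to_call) > 0 else "fold"

# Hypothetical spot: 100 chips in the pot, 25 to call.
for p_win in (0.15, 0.25):
    print(p_win, ev_call(p_win, 100, 25), best_action(p_win, 100, 25))
```

The decision depends only on the expectation, not on how the hand happens to turn out.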

Stealing Bases: A Small Step in the Right Direction

Early game: maximizing expected run difference $\approx$ maximizing win probability
$0.8 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{ Steal 2B}] + 0.2 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{Caught}] \approx \mathbb{E}[\textrm{Runs} \mid \textrm{Stay at 1B}]$
Conclusion: Only steal if the probability of success is greater than $0.8$
This sabermetric conclusion has actually led to fewer steal attempts
Need more decision-theoretic analysis like this in sabermetrics!
But there are several limitations that can be addressed by using Stan
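The break-even logic can be sketched in Python by solving $p \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{Steal 2B}] + (1 - p) \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{Caught}] = \mathbb{E}[\textrm{Runs} \mid \textrm{Stay at 1B}]$ for $p$. The function name is hypothetical. The inputs are the two-out run expectancies quoted in the appendix's `transformed data` block, with zero expected runs after a caught stealing because the inning ends.

```python
def break_even_success_prob(e_runs_success, e_runs_caught, e_runs_stay):
    """Success probability p at which
    p * E[success] + (1 - p) * E[caught] equals E[stay]."""
    return (e_runs_stay - e_runs_caught) / (e_runs_success - e_runs_caught)

E_RUNS_2B = 0.3298    # expected runs | 2 out, runner on 2B only
E_RUNS_1B = 0.2349    # expected runs | 2 out, runner on 1B only
E_RUNS_CAUGHT = 0.0   # caught stealing with 2 out ends the inning

p_star = break_even_success_prob(E_RUNS_2B, E_RUNS_CAUGHT, E_RUNS_1B)
print(f"break-even success probability with 2 out: {p_star:.3f}")  # 0.712
```

With these particular inputs the threshold is below $0.8$, which shows how much the rule depends on the game state fed into it. It still considers only expected runs in the inning.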

The Way Forward: Generative Modeling

How We Would Calculate $\mathbb{E}\left[\textrm{Change in Wins}\right]$

1. Assemble a group of replacement players who get promoted from and then demoted to the minors during a season, like openWAR does
2. Use Stan to estimate the posterior distribution of latent abilities of all players conditional on data under a generative model
3. Repeat many times: Simulate 162 games for the teams' ideal 25-man rosters
4. For each of the $25 \times 30$ players in (3), repeat many times:
   - Replace with a randomly-drawn player from (1) who plays same position
   - Resimulate that team's season once
5. Calculate the average difference in (a function of) wins between (3) and (4)

Unlike *WAR, this is founded on probability and specific to the team's context
Unlike *WAR, replacing $A$ affects performance of player $B$, $C$, etc.
Unlike *WAR, this isn't distorted by injuries, bereavement, suspensions, etc.
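The five-step procedure above can be sketched as a toy Python simulation. Everything here is a hypothetical stand-in. Abilities live on a log-odds scale and simply add up to a team strength, a season is 162 independent games, and the "posterior draws" are made-up normal draws rather than output from a fitted Stan model. It does not capture interactions between teammates.

```python
import math
import random

def simulate_wins(team_strength, rng, games=162):
    """Simulate one season: each game is won with probability inv_logit(team_strength)."""
    p_win = 1.0 / (1.0 + math.exp(-team_strength))
    return sum(rng.random() < p_win for _ in range(games))

def expected_change_in_wins(posterior_draws, player, replacement_pool, rng, n_sims=500):
    """Average of (wins with `player`) minus (wins with a random replacement)."""
    total = 0
    for _ in range(n_sims):
        abilities = rng.choice(posterior_draws)                      # one draw from step 2
        with_player = simulate_wins(sum(abilities.values()), rng)    # step 3
        swapped = dict(abilities)
        swapped[player] = rng.choice(replacement_pool)               # step 4, first bullet
        without_player = simulate_wins(sum(swapped.values()), rng)   # step 4, second bullet
        total += with_player - without_player
    return total / n_sims                                            # step 5

rng = random.Random(2018)
# Hypothetical replacement pool (step 1) and posterior draws (step 2).
replacement_pool = [rng.gauss(-0.05, 0.02) for _ in range(50)]
posterior_draws = [{"star": rng.gauss(0.20, 0.03), "rest_of_roster": rng.gauss(0.0, 0.03)}
                   for _ in range(200)]

print(expected_change_in_wins(posterior_draws, "star", replacement_pool, rng))
```

Because each simulated season uses a fresh posterior draw, the average reflects uncertainty about abilities as well as game-to-game luck.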

Simplified Generative Model for Base Stealing

For Generative Models and Bayesian Estimates, Use Stan

Stan is a high-level computer language for utilizing probability distributions. Overlaps with R, Python, etc. but Stan is more focused. (http://mc-stan.org)
Large community of developers that is trusted and in demand:
- Stan Forums: http://discourse.mc-stan.org/
Academics and industry analysts across subfields who are serious about modeling phenomena tend to use Stan because that is what Stan is designed for and it has the most advanced algorithms for doing so. E.g.:
- Facebook developed prophet (https://facebook.github.io/prophet/), which comes with a Stan model for modeling time-series
- Pinnacle uses Stan to set betting lines on a variety of sports
- MLB teams are placing job ads seeking Stan skills
- Deshpande and Wyner (2017) for catcher framing

Should Cain Steal Second with Two Outs vs. NYY?

Given 2017 data, we believe he should steal off Pineda but not Sabathia

Should Managers Steal Second More with 2 Out?

With caveats, managers seem a bit too conservative with two outs

Advancing the Stolen Base Question with Stan

"Rule": Only steal 2B if the probability of success is at least $0.8$ but
- Conclusion based on maximizing expected runs that inning, not utility
- Expected runs for each game state is assumed to be the same regardless of runner, pitcher, catcher, batter, on deck, etc.
- Runners who are fast have a higher probability of scoring from second (or first) than the average runner
For the sake of presentation simplicity, we do not tackle any of those issues, and base-stealing opportunities are limited to runner on 1B only with 2 out (but not a full count)

How do you know what the probability of a successful steal is in this situation? Selection effects make this and similar problems difficult, unless you estimate generative models using a tool like Stan.
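A small Python simulation of the selection effect, under assumptions made up for illustration. Success probabilities vary by matchup as Beta(6, 4), and runners are sent more often when the success probability is high. The raw success rate among observed attempts then overstates the success probability of a typical opportunity.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(0)
n_opportunities = 100_000
attempts = 0
successes = 0
total_p_success = 0.0

for _ in range(n_opportunities):
    p_success = rng.betavariate(6, 4)              # hypothetical matchup-specific success probability
    total_p_success += p_success
    p_attempt = inv_logit(10 * (p_success - 0.8))  # hypothetical: attempts are likelier when success is likely
    if rng.random() < p_attempt:                   # only attempted steals are ever observed
        attempts += 1
        successes += rng.random() < p_success

observed_rate = successes / attempts
average_p = total_p_success / n_opportunities
print(f"success rate among attempts:        {observed_rate:.3f}")
print(f"average success prob, all matchups: {average_p:.3f}")
```

A generative model that includes the decision to attempt, as the Stan program in this talk does, is one way to account for this gap.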

Key Part of the `model` Block of a Stan Program


// intermediate variables (indexing via [] works the same as in R)
alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
utility = (E_runs_2B * (1 - Pr_out) - E_runs_1B) / scale;
Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

// conditional probability of the observables
attempts ~ binomial(opportunities, Pr_attempt);
caught ~ beta_binomial(attempts, alpha_defense, beta_runner[runners]);

// prior distributions for the primitive unknowns
rho[1] ~ exponential(1);
rho[2] ~ pareto(rho[1], 2);
alpha_pitcher ~ dirichlet(rho[{2,1}][p_throws]);
...
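To show what "generative" means here, the sketch below runs the same chain forward in Python for a single runner/pitcher/catcher combination. The function name and all parameter values are hypothetical. The run expectancies are the constants from the appendix, and the beta-binomial draw is done in two stages: a Beta draw for the caught-stealing probability, then binomial trials.

```python
import math
import random

E_RUNS_2B = 0.3298                        # expected runs | 2 out, runner on 2B only
E_RUNS_1B = 0.2349                        # expected runs | 2 out, runner on 1B only
SCALE = abs(E_RUNS_2B - 2 * E_RUNS_1B)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_matchup(alpha_defense, beta_runner, omega_0, omega_1, opportunities, rng):
    """Forward-simulate attempts and caught-stealing counts for one matchup."""
    pr_out = alpha_defense / (alpha_defense + beta_runner)  # mean of Beta(alpha, beta)
    utility = (E_RUNS_2B * (1 - pr_out) - E_RUNS_1B) / SCALE
    pr_attempt = inv_logit(omega_0 + omega_1 * utility)
    attempts = sum(rng.random() < pr_attempt for _ in range(opportunities))
    theta = rng.betavariate(alpha_defense, beta_runner)     # caught-stealing probability
    caught = sum(rng.random() < theta for _ in range(attempts))
    return {"pr_out": pr_out, "utility": utility, "pr_attempt": pr_attempt,
            "attempts": attempts, "caught": caught}

# Hypothetical parameter values, not estimates from the talk.
result = simulate_matchup(alpha_defense=2.0, beta_runner=8.0, omega_0=-3.0,
                          omega_1=2.0, opportunities=40, rng=random.Random(42))
print(result)
```

Estimation in Stan runs this logic in reverse: given observed `attempts` and `caught`, it infers the abilities and strategy parameters that plausibly generated them.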

Easing into Stan and Generative Modeling

A few R packages, such as rstanarm and brms, accept familiar R model-fitting syntax like

  y ~ x + (1 + x | g), data = dataset, family = binomial()

prefixed with rstanarm::stan_glmer or brms::brm instead of lme4::glmer. This lets new users take advantage of Stan without having to learn Stan's language and idioms, but it offers limited choices for priors, functional forms, and multivariate statistics.

Recap

Heuristics are not an adequate substitute for maximizing expected utility
Expectations presuppose you are working with probability distributions
Sabermetrics, unlike pokermetrics, has not been doing this
But you can do it in Stan if you specify and justify your generative model

Thanks for inviting us to speak

Appendix: Stan Code for Stolen Base Model

`data` Block

data {
  // sizes
  int<lower=1> obs;        // number of observations with 2 outs and runner on first only
  int<lower=1> N_runners;  // number of runners
  int<lower=1> N_pitchers; // number of pitchers
  int<lower=1> N_catchers; // number of catchers
  // ID variables (like factors in R but coded as consecutive integers)
  int<lower=1,upper=N_runners> runners[obs];
  int<lower=1,upper=N_pitchers> pitchers[obs];
  int<lower=1,upper=N_catchers> catchers[obs];
  // known inputs
  vector<lower=0>[N_runners] inv_top_speed;  // reciprocal of top speed / 30 FPS
  int<lower=1,upper=2> p_throws[N_pitchers]; // indicator of pitcher handedness
  vector<lower=0>[N_catchers] time2B;        // time of ball to get to 2B
  // counts
  int<lower=0> attempts[obs];
  int<lower=1> opportunities[obs];
  int<lower=0> caught[obs];
}

`transformed data` and `parameters` Blocks

transformed data {
  // Link?
  real E_runs_2B = 0.3298; // expected runs | 2 out, runner on 2B only
  real E_runs_1B = 0.2349; // expected runs | 2 out, runner on 1B only
  real scale = fabs(E_runs_2B - 2 * E_runs_1B);
}
parameters {
  vector<lower=0>[N_runners] beta_runner;     // base-stealing ability
  simplex[N_pitchers] alpha_pitcher;          // holding runner ability and time to plate
  vector<lower=0>[N_catchers] alpha_catcher;  // caught-stealing ability

  positive_ordered[2] rho;   // sensitivity to pitcher
  real<lower=0> gamma;       // sensitivity to catcher
  real omega_0;              // intercept for strategy
  real<lower=0> omega_1;     // sensitivity to strategy
}

`model` Block

model {
  vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
  vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
  vector[obs] utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
  vector[obs] Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

  // likelihood
  target += binomial_lpmf(attempts | opportunities, Pr_attempt); // selection
  target += beta_binomial_lpmf(caught | attempts, alpha_defense, beta_runner[runners]);
  // priors
  target += exponential_lpdf(rho[1] | 1);
  target += pareto_lpdf(rho[2] | rho[1], 2);
  target += dirichlet_lpdf(alpha_pitcher   | rho[{2,1}][p_throws]);
  target += exponential_lpdf(gamma | 1);
  target += normal_lpdf(omega_0 | 0, 1);
  target += exponential_lpdf(omega_1 | 1);
  target += exponential_lpdf(beta_runner   | inv_top_speed);
  target += exponential_lpdf(alpha_catcher | gamma * time2B);
}

`generated quantities` Block

generated quantities {
  vector[obs] utility;
  {
    vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .*
                                alpha_catcher[catchers];
    vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
    utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
  }
}