Saberseminar, Boston
August 5, 2018

## Obligatory Disclosure

• Jonah and Ben are employees of Columbia University, which has received several research grants to develop Stan
• Jonah and Ben are also managers of GG Statistics LLC, which uses Stan for business purposes
• According to Columbia University policy, any such employee who has any equity stake in, a title (such as officer or director) with, or is expected to earn at least $5,000 per year from a private company is required to disclose these facts in presentations

## Persistent Fallacies

• Sabermetricians often point out fallacies like:
• Announcer's Fallacy: Getting picked off before that homer really cost us a run
• Announcer's Fallacy II: He's stolen 27 out of 30 bases so he should steal now
• Unearned Run Fallacy: Official scorer knows how the inning would have played out if the error was not committed
• Problem is the lack of an explicit and convincing model to query for what would take place if the pickoff, error, etc., had not occurred
• But current approaches to sabermetrics do not offer this
• The missing principle is adhering to probability theory to avoid similar errors
• Don't be results oriented, be expectation oriented

## Why Is Baseball Ahead in Sports Analytics?

• Game has discrete states
• Availability of public data
• Early data analysis led to enough acronyms to fill up a t-shirt

## Removing the Noise Distorts the Signal

• Newer acronyms (e.g. FIP) subset to filter noise out of the process
• Yields more predictable values that are (mis)used for decision-making
• Keith Law (2017, p.152) claims "[FIP] may indeed throw out the baby with the bathwater"
• To justify decisions, one must account for all the signal and average over the noise

## Decision Theory Matters to Baseball

• Baseball is a game of decisions at different levels (players, managers, GMs, …)
• Rosters, batting orders, substitutions, defensive positioning, pitch sequencing, sending runners from third, etc.
• Decision theory provides a foundation for rational decision-making that has been successfully applied to many fields
• Teams should make decisions that maximize expected utility, i.e., what they care about
• Analysts should assume players, managers, and GMs are at least attempting to maximize expected utility when criticizing/analyzing their decisions

## Rational Decision-making Needs Expectations

• The most likely sequence of events for an MLB defense is a perfect game, which has only happened 23 times
• Making decisions based on what is most likely to occur is irrational
• An expectation (denoted by $$\mathbb{E}$$ operator) of a function of a random variable has a precise definition that loosely means:
• weight everything that could happen by its probability and accumulate
• Using the Pythagorean Theorem (we don't) as an expectation requires assuming runs scored/allowed are independent Weibull random variables
• Few sabermetrics examples do what is rational: explicitly weighting a utility function with a probability distribution
• Stan yields probability-weighted outputs necessary for decision-making
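
The loose definition above ("weight everything that could happen by its probability and accumulate") can be sketched in a few lines of Python. The outcome tables and the helper names `expectation` and `most_likely` are hypothetical, chosen only to show that the most likely outcome and the expectation can point to different decisions; none of the numbers come from the talk.

```python
def expectation(outcomes, f=lambda x: x):
    """Sum f(value) * probability over a discrete distribution.

    `outcomes` is a list of (value, probability) pairs whose
    probabilities sum to 1.
    """
    total_prob = sum(p for _, p in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(f(x) * p for x, p in outcomes)


def most_likely(outcomes):
    """Return the single value with the highest probability."""
    return max(outcomes, key=lambda pair: pair[1])[0]


# Hypothetical runs scored in an inning under two strategies.
safe = [(0, 0.70), (1, 0.30)]
risky = [(0, 0.75), (1, 0.05), (3, 0.20)]

# Both strategies have 0 runs as their most likely outcome ...
modes = (most_likely(safe), most_likely(risky))
# ... but their expectations differ, and the expectation is what a decision should use.
expected = (expectation(safe), expectation(risky))
```

Passing a utility function as `f` turns the same sum into an expected utility, which is the quantity the remaining slides argue for.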

• Poker community agreed that utility = prize money and made 100% of the analysis be about the expected value of a bet / call / fold
• Pokermetrics use simulations of opponents to compute expectations and seek optimal strategies
• Nothing prevents sabermetricians from doing the same with Stan

## Stealing Bases: A Small Step in the Right Direction

• Early game: maximizing expected run difference $$\approx$$ maximizing win probability
• $$0.8 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{ Steal 2B}] + 0.2 \cdot \mathbb{E}[\textrm{Runs} \mid \textrm{Caught}] \approx \mathbb{E}[\textrm{Runs} \mid \textrm{Stay at 1B}]$$
• Conclusion: Only steal if the probability of success is greater than $$0.8$$
• This sabermetric conclusion has actually led to fewer steal attempts
• Need more decision-theoretic analysis like this in sabermetrics!
• But there are several limitations that can be addressed by using Stan
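
The break-even reasoning behind the rule can be sketched by solving the expected-runs equation for the success probability. The function name `break_even_probability` is hypothetical. The inputs below are the two-out run expectancies that appear in the `transformed data` block of the Stan program later in this deck, with expected runs after a caught stealing set to 0 because the inning ends; this is an illustration of the arithmetic only, not the authors' analysis.

```python
def break_even_probability(e_runs_success, e_runs_caught, e_runs_stay):
    """Success probability p at which stealing and staying have equal expected runs.

    Solves p * e_runs_success + (1 - p) * e_runs_caught = e_runs_stay for p.
    """
    gain = e_runs_success - e_runs_caught
    if gain <= 0:
        raise ValueError("a successful steal must be worth more than being caught")
    return (e_runs_stay - e_runs_caught) / gain


# Two-out values: being caught is the third out, so no further runs are expected.
p_star = break_even_probability(
    e_runs_success=0.3298,  # 2 out, runner on 2B only
    e_runs_caught=0.0,      # third out, inning over
    e_runs_stay=0.2349,     # 2 out, runner on 1B only
)
print(round(p_star, 3))  # about 0.712
```

Under these particular inputs the threshold is below 0.8, which shows how sensitive the rule of thumb is to the game state fed into it.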

## How We Would Calculate $$\mathbb{E}\left[\textrm{Change in Wins}\right]$$

1. Assemble a group of replacement players who get promoted from and then demoted to the minors during a season, like openWAR does
2. Use Stan to estimate the posterior distribution of latent abilities of all players conditional on data under a generative model
3. Repeat many times: Simulate 162 games for the teams' ideal 25-man rosters
4. For each of the $$25 \times 30$$ players in (3), repeat many times:
• Replace with a randomly-drawn player from (1) who plays same position
• Resimulate that team's season once
5. Calculate the average difference in (a function of) wins between (3) and (4)
• Unlike *WAR, this is founded on probability and specific to the team's context
• Unlike *WAR, replacing $$A$$ affects performance of player $$B$$, $$C$$, etc.
• Unlike *WAR, this isn't distorted by injuries, bereavement, suspensions, etc.
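
A toy version of steps 3 to 5 might look like the sketch below. Everything in it is an assumption made for illustration: team strength is just the sum of fixed player abilities, every game is played against a league-average opponent, and the win probability is a logistic function of strength. The procedure described above would instead draw abilities from the Stan posterior in step 2 and simulate games in enough detail that replacing one player changes what his teammates do, which this sketch does not capture.

```python
import math
import random


def simulate_wins(team_strength, rng, games=162):
    """Simulate one season against a league-average opponent (strength 0)."""
    p_win = 1.0 / (1.0 + math.exp(-team_strength))
    return sum(rng.random() < p_win for _ in range(games))


def expected_change_in_wins(roster, player, replacements, rng, n_sims=2000):
    """Average (baseline wins - wins with `player` replaced) over simulations."""
    baseline_strength = sum(roster.values())
    total = 0
    for _ in range(n_sims):
        baseline = simulate_wins(baseline_strength, rng)           # step 3
        substitute = rng.choice(replacements)                      # step 4, draw a replacement
        swapped = baseline_strength - roster[player] + substitute
        total += baseline - simulate_wins(swapped, rng)            # step 4, resimulate once
    return total / n_sims                                          # step 5


rng = random.Random(2018)
roster = {"A": 0.10, "B": 0.02, "C": -0.02}   # hypothetical latent abilities
replacements = [-0.05, -0.04, -0.03]          # hypothetical replacement pool
print(expected_change_in_wins(roster, "A", replacements, rng))
```

Swapping the final average for the average of some function of wins (for example, an indicator of reaching the playoffs) gives the "(a function of) wins" variant mentioned in step 5.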

## Generative Models, Bayesian Estimates use Stan

• Stan is a high-level computer language for utilizing probability distributions. Overlaps with R, Python, etc. but Stan is more focused. (http://mc-stan.org)
• Large community of developers that is trusted and in demand:
• Academics and industry analysts across subfields who are serious about modeling phenomena tend to use Stan because that is what Stan is intended to do and has the most advanced algorithms for doing so. E.g.:

## Should Cain Steal Second with Two Outs vs. NYY?

Given 2017 data, we believe he should steal off Pineda but not Sabathia

## Should Managers Steal Second More with 2 Out?

With caveats, managers seem a bit too conservative with two outs

## Advancing the Stolen Base Question with Stan

• "Rule": Only steal 2B if the probability of success is at least $$0.8$$ but

• Conclusion based on maximizing expected runs that inning, not utility
• Expected runs for each game state is assumed to be the same regardless of runner, pitcher, catcher, batter, on deck, etc.
• Runners who are fast have a higher probability of scoring from second (or first) than the average runner
• For the sake of presentation simplicity, we do not tackle any of those issues, and base-stealing opportunities are limited to runner on 1B only with 2 out (but not a full count)

• How do you know what the probability of a successful steal is in this situation? Selection effects make this and similar problems difficult, unless you estimate generative models using a tool like Stan.
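
The selection effect can be made concrete with a small simulation. The distribution of true success probabilities (`betavariate(6, 4)`) and the rule that runners attempt more often when their chance is higher (`p_success ** 4`) are both invented for illustration. The point is only that the success rate among observed attempts overstates the success probability in a typical opportunity.

```python
import random


def naive_vs_true_success_rate(n_opportunities=20000, seed=0):
    """Return (average true success probability, success rate among attempts)."""
    rng = random.Random(seed)
    true_total = 0.0
    attempts = 0
    successes = 0
    for _ in range(n_opportunities):
        p_success = rng.betavariate(6, 4)       # hypothetical true probability
        true_total += p_success
        if rng.random() < p_success ** 4:       # hypothetical selection rule
            attempts += 1
            successes += rng.random() < p_success
    return true_total / n_opportunities, successes / attempts


true_rate, naive_rate = naive_vs_true_success_rate()
print(true_rate, naive_rate)  # the naive rate is noticeably higher
```

A generative model addresses this by modeling the decision to attempt and the outcome of the attempt jointly, which is what the Stan program on the next slide does.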

## Key Part of the model Block of a Stan Program


// intermediate variables (indexing via [] works the same as in R)

alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
utility = (E_runs_2B * (1 - Pr_out) - E_runs_1B) / scale;
Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

// conditional probability of the observables

attempts ~ binomial(opportunities, Pr_attempt);
caught ~ beta_binomial(attempts, alpha_defense, beta_runner[runners]);

// prior distributions for the primitive unknowns

rho[1] ~ exponential(1);
rho[2] ~ pareto(rho[1], 2);
alpha_pitcher ~ dirichlet(rho[{2,1}][p_throws]);
...
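
To see what the intermediate variables compute, here is a scalar Python transcription of the first four lines for a single observation. The constants are copied from the `transformed data` block of the full program later in this deck; the function name `steal_quantities` and the parameter values in the example call are hypothetical.

```python
import math

E_RUNS_2B = 0.3298  # expected runs | 2 out, runner on 2B only
E_RUNS_1B = 0.2349  # expected runs | 2 out, runner on 1B only
SCALE = abs(E_RUNS_2B - 2 * E_RUNS_1B)


def inv_logit(x):
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def steal_quantities(alpha_defense, beta_runner, omega_0, omega_1):
    """Return (Pr_out, utility, Pr_attempt) for one runner-defense pairing."""
    # Mean of a beta(alpha_defense, beta_runner) probability of being caught
    pr_out = alpha_defense / (alpha_defense + beta_runner)
    # Scaled expected-run gain from attempting; being caught is worth 0 runs
    utility = (E_RUNS_2B * (1 - pr_out) - E_RUNS_1B) / SCALE
    # More attractive steals are attempted more often when omega_1 > 0
    pr_attempt = inv_logit(omega_0 + omega_1 * utility)
    return pr_out, utility, pr_attempt


# Hypothetical parameter values, not estimates from the talk.
pr_out, utility, pr_attempt = steal_quantities(1.0, 3.0, omega_0=-2.0, omega_1=1.0)
```

Because `Pr_attempt` depends on `utility`, the attempt data carry information about the same abilities that drive the caught-stealing data, which is how the model handles selection.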



## Easing into Stan and Generative Modeling

A few R packages, such as rstanarm and brms, accept familiar R model-fitting syntax such as



y ~ x + (1 + x | g), data = dataset, family = binomial()


prefixed with rstanarm::stan_glmer or brms::brm instead of lme4::glmer. This lets new users take advantage of Stan without having to learn Stan's language and idioms, but it offers limited choices for priors, functional forms, and multivariate statistics.

## Recap

1. Heuristics are not an adequate substitute for maximizing expected utility
2. Expectations presuppose you are working with probability distributions
3. Sabermetrics, unlike pokermetrics, has not been doing this
4. But you can do it in Stan if you specify and justify your generative model
• Thanks to for inviting us to speak

## data Block

data {
// sizes
int<lower=1> obs;        // number of observations with 2 outs and runner on first only
int<lower=1> N_runners;  // number of runners
int<lower=1> N_pitchers; // number of pitchers
int<lower=1> N_catchers; // number of catchers
// ID variables (like factors in R but coded as consecutive integers)
int<lower=1,upper=N_runners> runners[obs];
int<lower=1,upper=N_pitchers> pitchers[obs];
int<lower=1,upper=N_catchers> catchers[obs];
// known inputs
vector<lower=0>[N_runners] inv_top_speed;  // reciprocal of top speed / 30 FPS
int<lower=1,upper=2> p_throws[N_pitchers]; // indicator of pitcher handedness
vector<lower=0>[N_catchers] time2B;        // time of ball to get to 2B
// counts
int<lower=0> attempts[obs];
int<lower=1> opportunities[obs];
int<lower=0> caught[obs];
}

## transformed data and parameters Blocks

transformed data {
real E_runs_2B = 0.3298; // expected runs | 2 out, runner on 2B only
real E_runs_1B = 0.2349; // expected runs | 2 out, runner on 1B only
real scale = fabs(E_runs_2B - 2 * E_runs_1B);
}
parameters {
vector<lower=0>[N_runners] beta_runner;     // base-stealing ability
simplex[N_pitchers] alpha_pitcher;          // holding runner ability and time to plate
vector<lower=0>[N_catchers] alpha_catcher;  // caught-stealing ability

positive_ordered[2] rho;   // sensitivity to pitcher
real<lower=0> gamma;       // sensitivity to catcher
real omega_0;              // intercept for strategy
real<lower=0> omega_1;     // sensitivity to strategy
}

## model Block

model {
vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .* alpha_catcher[catchers];
vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
vector[obs] utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
vector[obs] Pr_attempt = inv_logit(omega_0 + omega_1 * utility);

// likelihood
target += binomial_lpmf(attempts | opportunities, Pr_attempt); // selection
target += beta_binomial_lpmf(caught | attempts, alpha_defense, beta_runner[runners]);
// priors
target += exponential_lpdf(rho[1] | 1);
target += pareto_lpdf(rho[2] | rho[1], 2);
target += dirichlet_lpdf(alpha_pitcher   | rho[{2,1}][p_throws]);
target += exponential_lpdf(gamma | 1);
target += normal_lpdf(omega_0 | 0, 1);
target += exponential_lpdf(omega_1 | 1);
target += exponential_lpdf(beta_runner   | inv_top_speed);
target += exponential_lpdf(alpha_catcher | gamma * time2B);
}
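
One way to get a feel for the pitcher priors is to draw from them outside Stan. The sketch below uses only Python's standard library and mirrors three statements of the model block: `rho[1] ~ exponential(1)`, `rho[2] ~ pareto(rho[1], 2)`, and the Dirichlet prior whose concentration is picked by handedness. The function name `draw_pitcher_prior` and the example handedness codes are hypothetical, and this is a prior draw only, not a substitute for fitting the model.

```python
import random


def draw_pitcher_prior(p_throws, rng):
    """One joint prior draw of (rho, alpha_pitcher).

    `p_throws` holds a handedness code (1 or 2) for each pitcher.
    """
    rho_1 = rng.expovariate(1.0)             # rho[1] ~ exponential(1)
    # pareto(rho[1], 2) has lower bound rho[1], so rho[2] >= rho[1]
    rho_2 = rho_1 * rng.paretovariate(2.0)
    # rho[{2,1}][p_throws]: code 1 gets rho[2], code 2 gets rho[1]
    concentration = [rho_2 if code == 1 else rho_1 for code in p_throws]
    # A Dirichlet draw is a set of independent gamma draws, normalized to sum to 1
    gammas = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(gammas)
    return (rho_1, rho_2), [g / total for g in gammas]


rng = random.Random(7)
rho, alpha_pitcher = draw_pitcher_prior([1, 2, 2, 1], rng)
```

The ordering `rho_2 >= rho_1` is what the `positive_ordered[2] rho` declaration requires, and the normalization matches the `simplex` declaration for `alpha_pitcher`.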

## generated quantities Block

generated quantities {
vector[obs] utility;
{
vector[obs] alpha_defense = N_pitchers * alpha_pitcher[pitchers] .*
alpha_catcher[catchers];
vector[obs] Pr_out = alpha_defense ./ (alpha_defense + beta_runner[runners]);
utility = (E_runs_2B * (1 - Pr_out) /* + 0 * Pr_out */ - E_runs_1B) / scale;
}
}
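
Once `utility` (or `Pr_out`) has been saved for every posterior draw, the decision rule reduces to averaging over draws and checking the sign. The sketch below assumes the draws have already been extracted into plain Python lists. The draws shown are invented for illustration and are not results from the talk, and the function name `expected_utility_of_stealing` is hypothetical.

```python
def expected_utility_of_stealing(pr_out_draws, e_runs_2b=0.3298, e_runs_1b=0.2349):
    """Average the scaled utility of attempting a steal over posterior draws of Pr_out."""
    scale = abs(e_runs_2b - 2 * e_runs_1b)
    utilities = [(e_runs_2b * (1 - p) - e_runs_1b) / scale for p in pr_out_draws]
    return sum(utilities) / len(utilities)


# Hypothetical posterior draws of Pr_out for one runner against two batteries.
draws_vs_battery_1 = [0.18, 0.22, 0.25, 0.20, 0.24]
draws_vs_battery_2 = [0.30, 0.35, 0.33, 0.38, 0.31]

# Attempt the steal only when the posterior expected utility is positive.
steal_vs_1 = expected_utility_of_stealing(draws_vs_battery_1) > 0
steal_vs_2 = expected_utility_of_stealing(draws_vs_battery_2) > 0
```

With these made-up draws the rule says to run against the first battery and stay put against the second, the same kind of matchup-specific answer the Cain slide reports.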