Proposal for exploring game decisions informed by
expectations of joint probability distributions
To: Scott Powers, Director of Quantitative Analysis, Los Angeles Dodgers
From: Scott Spencer, Faculty and Lecturer, Columbia University
14 February 2019
Our game decisions based on current modeling do not minimize spend per win. We
witnessed the mid-market Astros use analytics to overtake us in the 2017 World Series
(Luhnow 2018a, 2018b). Our efforts also do not maximize expected wins. But we can. To do
so, we need to jointly model probabilities of all game events and base decisions on expec-
tations of those distributions. With adequate computing emerging, we can be first using
the probabilistic programming language Stan and parallel processing. To demonstrate
the concept, consider a probability model for decisions to steal second base, below, which
suggests teams are too conservative, leaving wins unclaimed. This model allows us to
ask, for example: should Cain steal against Sabathia? Or against Pineda? Having
modeled stealing second, we next need to hire and jointly model the rest of the game.
1 Our current analyses do not optimize expected wins
Seven terabytes of uncompressed data generated per game overshadow the lack of situa-
tional data needed for decision-making that maximizes expected utility. Consider that
pitchers, on average, only face 10 percent of major league batters regardless of game state;
the reverse is true, too. Or when deciding whether a base runner should attempt to steal
against a specific pitcher and catcher in a state of play, say, we are lucky to have any data.
Common analyses and heuristics for these situations are inadequate: they not only over-
fit the data (if any exist), but also offer no manner of estimating changes in probabilities
for maximizing expected utility (winning the game).
Accurately quantifying probabilities, and changes thereof, in a given context enables us to
answer counterfactuals, from which we can build strategies that maximize our objectives
(Parmigiani 2002). This approach is possible at scale using Stan (Carpenter et al. 2017).
It’s time to jointly model probabilities of all events.
2 Modeling probabilities for steal success illustrates a broader benefit
To see the potential of implementing probability models, let’s consider, again, the deci-
sion to steal bases, given a specific counterfactual:
In a game against the New York Yankees, should the Milwaukee Brewers’ Lorenzo
Cain attempt to steal second base with no one else on base and two outs be-
fore the seventh inning, against Gary Sanchez as catcher and Michael Pineda
as pitcher? What if against Sanchez and CC Sabathia as pitcher?
More specifically, how can we estimate whether Cain’s attempt in each situation
increases expected runs that inning, and by how much? Using Stan, I’ve
coded a generative model that along with play outcomes considers various information
(runner foot-speed, catcher pop-time) and player characteristics, like pitcher handed-
ness. With the model, we have an answer that also shows the uncertainty. Given 2017
data, this model suggests Cain should steal against Pineda, not Sabathia (Figure 1).
Notably, we get these expectations without multiple trials of either scenario. More gen-
erally, this model suggests that on average team managers are too conservative, leaving
runs unrealized (Figure 2).
The above is but one example of a more general approach that weighs probabilities of all
possible outcomes to maximize expected utility. With broad implementation—jointly
modeling the conditional probabilities of all relevant events—we can optimize decisions.
[Figure 1 image. Panels: "vs. Pineda / Sanchez" and "vs. Sabathia / Sanchez". Horizontal axis: expected change in runs in each scenario.]
Figure 1. Of the two scenarios, Cain should only attempt to steal against the
Sanchez–Pineda duo.
[Figure 2 image. Horizontal axis: expected change in runs in an inning. Vertical axis: probability of steal attempt. Annotations: "Indifferent", "Opportunity?".]
Figure 2. When the change in expected runs is zero, managers should be indifferent
to attempted steals, saying go half the time. The black band represents the range of
variation across managers’ decisions. At the intersection of indifference, managers
tend to say steal only 10 percent of the time, leaving opportunity.
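The decision rule behind Figures 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the Stan model itself: the run-expectancy values are hypothetical placeholders, the Beta draws merely stand in for posterior draws of steal-success probability, and "scenario A" and "scenario B" are invented labels rather than results for any real pitcher or catcher.

```python
import random

# Hypothetical run expectancies for the rest of the inning (placeholders,
# NOT estimates from the memo's 2017 data).
RE_STAY = 0.22  # runner stays on first, two outs
RE_SAFE = 0.32  # steal succeeds: runner on second, two outs
RE_OUT = 0.0    # caught stealing makes the third out; inning over


def delta_runs(p_success, re_stay=RE_STAY, re_safe=RE_SAFE, re_out=RE_OUT):
    """Expected change in runs from attempting the steal versus staying."""
    return p_success * re_safe + (1.0 - p_success) * re_out - re_stay


def break_even(re_stay=RE_STAY, re_safe=RE_SAFE, re_out=RE_OUT):
    """Success probability at which a manager should be indifferent."""
    return (re_stay - re_out) / (re_safe - re_out)


def summarize(draws):
    """Mean expected change in runs, and share of draws favoring an attempt."""
    deltas = [delta_runs(p) for p in draws]
    mean_delta = sum(deltas) / len(deltas)
    share_positive = sum(d > 0 for d in deltas) / len(deltas)
    return mean_delta, share_positive


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so the sketch is repeatable
    # Stand-ins for posterior draws of P(steal succeeds) in two scenarios.
    scenario_a = [rng.betavariate(30, 10) for _ in range(4000)]
    scenario_b = [rng.betavariate(20, 20) for _ in range(4000)]
    print("break-even success probability:", round(break_even(), 3))
    for name, draws in (("scenario A", scenario_a), ("scenario B", scenario_b)):
        mean_delta, share = summarize(draws)
        print(name, "mean change in runs:", round(mean_delta, 3),
              "share of draws favoring a steal:", round(share, 3))
```

Working with draws rather than a single point estimate is what lets the answer carry its uncertainty: the share of draws with a positive change in runs is a direct statement of how confident we are that going is the better call.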
3 For value, compare an investment to free-agent costs
A fully-realized model will require significant effort from a team with deep experience in
baseball, generative modeling, and Stan. To get the talent, we should compare cost to ac-
quiring expected wins from free-agents. Each win above a replacement-level player costs
about $10 million per year (Swartz 2017). As with free-agent value over replacement
player, game-time decisions informed from more accurate probabilities should add wins
over a season. The scope of what we can answer, moreover, goes beyond in-game strategy
(player acquisitions, salary arbitration). More immediately, however, we can begin to im-
plement this approach for specific events, with a scope closer to the example above, being
mindful that information learned is conditional upon unmodeled context.
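The comparison to free-agent costs reduces to simple arithmetic, sketched below. The cost per win is the figure cited above; the annual cost of a modeling team is a hypothetical input chosen only to show the calculation, not a budget estimate.

```python
COST_PER_WIN = 10_000_000  # about 10 million per win per year (Swartz 2017)


def break_even_wins(annual_cost, cost_per_win=COST_PER_WIN):
    """Wins per season the modeling effort must add to match free-agent pricing."""
    return annual_cost / cost_per_win


def surplus_value(added_wins, annual_cost, cost_per_win=COST_PER_WIN):
    """Value of the added wins at free-agent prices, net of the effort's cost."""
    return added_wins * cost_per_win - annual_cost


if __name__ == "__main__":
    hypothetical_cost = 2_000_000  # assumed annual cost, for illustration only
    print("break-even wins per season:", break_even_wins(hypothetical_cost))
    print("surplus if one win is added:", surplus_value(1, hypothetical_cost))
```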
4 For accuracy, compare model results to betting market odds
Measuring performance of a fully-realized model may seem tricky: we only see the out-
come of our decisions. But we can, say, compare the accuracy of our estimates against the
betting market where interested investors are trying to forecast game outcomes.
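One way to make that comparison concrete is sketched below. It assumes the market is quoted as American moneyline odds and uses the Brier score as the accuracy measure; both are choices made here for illustration, and every number in the usage example is invented.

```python
def implied_probability(american_odds):
    """Convert American moneyline odds to the raw implied win probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def remove_vig(p_home, p_away):
    """Rescale the two raw implied probabilities so they sum to one."""
    total = p_home + p_away
    return p_home / total, p_away / total


def brier_score(probabilities, outcomes):
    """Mean squared error between forecasts and 0/1 outcomes; lower is better."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented example: (home odds, away odds), our model's home-win
    # probability, and whether the home team won (1) or lost (0).
    games = [((-150, 130), 0.64, 1), ((110, -130), 0.41, 0), ((-120, 100), 0.58, 0)]
    market = [remove_vig(implied_probability(h), implied_probability(a))[0]
              for (h, a), _, _ in games]
    model = [p for _, p, _ in games]
    results = [y for _, _, y in games]
    print("market Brier score:", round(brier_score(market, results), 4))
    print("model Brier score:", round(brier_score(model, results), 4))
```

A model that consistently scores at or below the market over many games would be evidence that its probabilities are worth acting on.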
5 Next steps, hire and jointly model
The mid-market Astros show teams can do more with information. Millions in additional
revenue, and more wins, await discovery through a joint probability model of
all events from which we can maximize conditional expectations. Let’s discuss how to
draw the talent for a title worth our spend.
6 References
Carpenter, Bob, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical
Software 76 (1): 1–32.
Luhnow, Jeff. 2018a. “How the Houston Astros are winning through advanced analytics.”
McKinsey Quarterly 13 June 2018: 1–9.
———. 2018b. “A view from the front lines of baseball’s data-analytics revolution.” McKinsey
Quarterly 5 July 2018: 1–8.
Parmigiani, G. 2002. “Decision Theory: Bayesian.” In International Encyclopedia of the Social &
Behavioral Sciences, 3327–34.
Swartz, Matt. 2017. “The Recent History of Free-Agent Pricing.”
https://www.fangraphs.com/blogs/the-recent-history-of-free-agent-pricing/.