Proposal for exploring game decisions informed by expectations of joint probability distributions

To: Scott Powers, Director of Quantitative Analysis, Los Angeles Dodgers

From: Scott Spencer, Faculty and Lecturer, Columbia University

14 February 2019

Our game decisions based on current modeling do not minimize spend per win. We witnessed the mid-market Astros use analytics to overtake us in the 2017 World Series (Luhnow 2018a, 2018b). Our efforts also do not maximize expected wins. But we can. To do so, we need to jointly model probabilities of all game events and base decisions on expectations of those distributions. With adequate computing emerging, we can be first, using the probabilistic programming language Stan and parallel processing. To demonstrate the concept, consider a probability model for decisions to steal second base, below, which suggests teams are too conservative, leaving wins unclaimed. This model allows us to ask, for example: should Lorenzo Cain steal against CC Sabathia? Or against Michael Pineda? Having modeled stealing second, we next need to hire and jointly model the rest of the game.

1 Our current analyses do not optimize expected wins

Seven terabytes of uncompressed data generated per game overshadow the lack of situational data needed for decision-making that maximizes expected utility. Consider that pitchers, on average, face only 10 percent of major league batters regardless of game state; the reverse is true, too. Or when deciding whether a base runner should attempt to steal against a specific pitcher and catcher in a given state of play, say, we are lucky to have any data. Common analyses and heuristics for these situations are inadequate: they not only overfit the data (if any exist), but also offer no way to estimate changes in probabilities for maximizing expected utility (winning the game).
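One common remedy for overfitting sparse matchup data is partial pooling: pull each small-sample estimate toward a broader average in proportion to how little data supports it. The sketch below is a minimal illustration, assuming a Beta prior on a steal success rate. The function name `shrunk_rate`, the 0.72 league-wide rate, and the prior strength of 20 pseudo-attempts are illustrative assumptions, not values taken from our data or from the Stan model described later.

```python
def shrunk_rate(successes: int, attempts: int,
                prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of a success rate under a Beta prior.

    prior_mean and prior_strength are assumptions: the prior behaves like
    prior_strength pseudo-attempts that succeeded at rate prior_mean.
    """
    alpha = prior_mean * prior_strength + successes
    beta = (1.0 - prior_mean) * prior_strength + (attempts - successes)
    return alpha / (alpha + beta)


# Hypothetical matchup: a runner is 2-for-2 against one pitcher-catcher pair.
raw = 2 / 2  # the naive estimate claims the runner always succeeds
pooled = shrunk_rate(successes=2, attempts=2,
                     prior_mean=0.72, prior_strength=20.0)

print(f"raw estimate: {raw:.2f}")
print(f"partially pooled estimate: {pooled:.3f}")
```

With only two attempts, the pooled estimate stays close to the assumed league rate; with hundreds of attempts, it would move toward the observed rate. A full hierarchical model learns the prior from the data rather than fixing it by hand.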

Accurately quantifying probabilities, and changes thereof, in a given context enables us to answer counterfactuals, from which we can build strategies that maximize our objectives (Parmigiani 2002). This approach is possible at scale using Stan (Carpenter et al. 2017). It’s time to jointly model probabilities of all events.

2 Modeling probabilities for steal success illustrates a broader benefit

To see the potential of implementing probability models, let’s consider, again, the decision to steal bases, given a specific counterfactual:


In a game against the New York Yankees, should the Milwaukee Brewers’ Lorenzo Cain attempt to steal second base with no one else on base and two outs before the seventh inning, against Gary Sanchez as catcher and Michael Pineda as pitcher? What if against Sanchez and CC Sabathia as pitcher?

More specifically, how can we estimate whether Cain’s attempt in each situation increases expected runs that inning, and by how much? Using Stan, I’ve coded a generative model that, along with play outcomes, considers various information (runner foot-speed, catcher pop-time) and player characteristics, like pitcher handedness. With the model, we have an answer that also shows the uncertainty. Given 2017 data, this model suggests Cain should steal against Pineda, not Sabathia (Figure 1).

Notably, we get these expectations without multiple trials of either scenario. More generally, this model suggests that on average team managers are too conservative, leaving runs unrealized (Figure 2).

The above is but one example of a more general approach that weighs probabilities of all possible outcomes to maximize expected utility. With broad implementation, jointly modeling the conditional probabilities of all relevant events, we can optimize decisions.
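The decision rule behind the steal example can be sketched as a small expected-value calculation. The Python below is an illustration under stated assumptions, not the Stan model itself. The run-expectancy constants are placeholder values, the Beta draws stand in for posterior draws of steal success probability that a fitted model would supply, and scenarios "A" and "B" are hypothetical rather than estimates for any real pitcher or catcher.

```python
import random
import statistics

# Assumed run expectancies for the rest of the inning (placeholders only).
RE_STAY = 0.22     # runner on first, two outs, no attempt
RE_SUCCESS = 0.32  # runner on second, two outs, after a successful steal
RE_FAIL = 0.0      # caught stealing is the third out, so the inning ends


def delta_runs(p_success: float) -> float:
    """Expected change in runs from attempting versus staying put."""
    return p_success * RE_SUCCESS + (1.0 - p_success) * RE_FAIL - RE_STAY


def break_even() -> float:
    """Success probability at which a manager should be indifferent."""
    return (RE_STAY - RE_FAIL) / (RE_SUCCESS - RE_FAIL)


def summarize(draws):
    """Mean expected change in runs, and the share of draws favoring a steal."""
    deltas = [delta_runs(p) for p in draws]
    return statistics.mean(deltas), sum(d > 0 for d in deltas) / len(deltas)


rng = random.Random(2017)
# Hypothetical stand-ins for posterior draws of success probability.
draws_a = [rng.betavariate(40, 10) for _ in range(4000)]  # centered near 0.80
draws_b = [rng.betavariate(25, 25) for _ in range(4000)]  # centered near 0.50

mean_a, share_a = summarize(draws_a)
mean_b, share_b = summarize(draws_b)
print(f"break-even success probability: {break_even():.4f}")
print(f"scenario A: mean change {mean_a:+.3f} runs, P(change > 0) = {share_a:.2f}")
print(f"scenario B: mean change {mean_b:+.3f} runs, P(change > 0) = {share_b:.2f}")
```

Working with draws rather than a single point estimate carries the uncertainty through to the decision: the output is a distribution of expected change in runs, from which we can read both its center and the probability that an attempt helps.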

[Figure 1: two panels, "vs. Pineda / Sanchez" and "vs. Sabathia / Sanchez". Horizontal axis: expected change in runs in each scenario, from about −0.3 to 0.1.]

Figure 1. Of the two scenarios, Cain should only attempt to steal against the Sanchez–Pineda duo.

[Figure 2: probability of steal attempt (vertical axis, 0.00 to 1.00) against expected change in runs in an inning (horizontal axis, about −0.4 to 0.2), with annotations "Indifferent" and "Opportunity?".]

Figure 2. When the change in expected runs is zero, managers should be indifferent to attempted steals, saying go half the time. The black band represents the range of variation across managers’ decisions. At the intersection of indifference, managers tend to say steal only 10 percent of the time, leaving opportunity.


3 For value, compare an investment to free-agent costs

A fully realized model will require significant effort from a team with deep experience in baseball, generative modeling, and Stan. To get the talent, we should compare its cost to the cost of acquiring expected wins from free agents. Each win above a replacement-level player costs about $10 million per year (Swartz 2017). As with free-agent value over replacement player, game-time decisions informed by more accurate probabilities should add wins over a season. The scope of what we can answer, moreover, goes beyond in-game strategy (player acquisitions, salary arbitration). More immediately, however, we can begin to implement this approach for specific events, with a scope closer to the example above, being mindful that information learned is conditional upon unmodeled context.
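The comparison to free-agent cost reduces to simple arithmetic, sketched below in Python. The cost per win of about 10 million per year comes from the text above (Swartz 2017). The annual team cost and the number of wins added are hypothetical inputs chosen only to show the calculation.

```python
COST_PER_WIN = 10_000_000  # dollars per win per year, per the memo (Swartz 2017)


def break_even_wins(annual_cost: float,
                    cost_per_win: float = COST_PER_WIN) -> float:
    """Wins per season the modeling effort must add to match free-agent pricing."""
    return annual_cost / cost_per_win


def net_value(annual_cost: float, wins_added: float,
              cost_per_win: float = COST_PER_WIN) -> float:
    """Free-agent-equivalent value of the added wins, minus what we spent."""
    return wins_added * cost_per_win - annual_cost


team_cost = 2_000_000  # hypothetical annual cost of a modeling team
print(f"break-even wins per season: {break_even_wins(team_cost):.2f}")
print(f"net value if one win is added: ${net_value(team_cost, 1.0):,.0f}")
```

Under these assumed inputs, the effort pays for itself if it adds even a fraction of a win per season; the real inputs would come from our own budgeting and from the model's measured effect on decisions.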

4 For accuracy, compare model results to betting market odds

Measuring performance of a fully realized model may seem tricky: we only see the outcome of our decisions. But we can, say, compare the accuracy of our estimates against the betting market, where interested investors are trying to forecast game outcomes.
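One way to make this comparison concrete is to convert market odds to implied win probabilities and score both the market and our model with the same accuracy measure. The Python sketch below assumes American moneyline odds, removes the bookmaker's margin by simple normalization, and uses log loss as the score. Those are choices made for illustration; other odds formats, margin adjustments, and scoring rules are possible. The games listed are hypothetical, not real lines or results.

```python
import math


def implied_prob(moneyline: int) -> float:
    """Win probability implied by an American moneyline, before margin removal."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def no_vig_prob(home_line: int, away_line: int) -> float:
    """Home win probability after normalizing away the bookmaker's margin."""
    home, away = implied_prob(home_line), implied_prob(away_line)
    return home / (home + away)


def log_loss(probs, outcomes) -> float:
    """Mean negative log-likelihood of the outcomes; lower is better."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


# Hypothetical games: (home line, away line, model P(home win), home won?)
games = [(-150, 130, 0.62, 1), (110, -130, 0.41, 0), (-120, 100, 0.50, 0)]
market = [no_vig_prob(home, away) for home, away, _, _ in games]
model = [p for _, _, p, _ in games]
outcomes = [won for _, _, _, won in games]

print(f"market log loss: {log_loss(market, outcomes):.4f}")
print(f"model log loss:  {log_loss(model, outcomes):.4f}")
```

A handful of games says little either way; the comparison becomes informative only over a large sample, such as a full season of games.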

5 Next steps: hire and jointly model

The mid-market Astros show teams can do more with information. Millions in additional revenue, and more wins, await discovery through a joint probability model of all events from which we can maximize conditional expectations. Let’s discuss how to draw the talent for a title worth our spend.

6 References

Carpenter, Bob, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32.

Luhnow, Jeff. 2018a. “How the Houston Astros are winning through advanced analytics.” McKinsey Quarterly 13 June 2018: 1–9.

Luhnow, Jeff. 2018b. “A view from the front lines of baseball’s data-analytics revolution.” McKinsey Quarterly 5 July 2018: 1–8.

Parmigiani, G. 2002. “Decision Theory: Bayesian.” In International Encyclopedia of the Social & Behavioral Sciences, 3327–34.

Swartz, Matt. 2017. “The Recent History of Free-Agent Pricing.” https://www.fangraphs.com/blogs/the-recent-history-of-free-agent-pricing/.