Let’s practice expressing a probabilistic model informed by physics. Physics has, of course, been used to describe mathematically the forces involved in running short distances. To estimate sprint speed, we can inform our models with basic anatomical information and physics. The physics of running speed depends on body mass, stride length, and stride rate. Let’s begin with those.

Let’s start with physics equations to describe speed and acceleration to maximum speed. We can express this by considering acceleration per stride, as explained in Herman (2016). Imagine stepping into the starting blocks for a 100 m sprint. On the first push with our legs, we accelerate to a speed $v_1$ at the end of the stride by applying a force $F$ over a distance $L$. To accelerate, we must produce the required work $W$, force $F$ times stride length $L$, or $W = FL$. That work becomes kinetic energy in our body, $\frac{1}{2}mv^2$: more specifically, $\frac{1}{2}(m_{\text{b}} - 2m_{\text{leg}})v_1^2$ for the body without its legs, plus $\frac{1}{2}m_{\text{leg}}v_1^2$ for the one leg in the air, while the planted leg stays at rest. After the first stride, then, we can express our speed as,

$v_1^2 = \frac{2FL}{m_{\text{b}}- m_{\text{leg}}}$

In the next stride, we’ve switched feet doing the work, again $FL$, but accelerate the body from $v_1$ to $v_2$ and the first leg from $0$ to $v_2$, thus,

$v_2^2 = \frac{2FL}{m_{\text{b}} - m_{\text{leg}}} \left( 1 + \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}} \right)$
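
The step from $v_1$ to $v_2$ can be sketched as an energy balance. This is a reading of the description above rather than a quotation from Herman (2016): the work $FL$ speeds up the body without its legs from $v_1$ to $v_2$ and the previously planted leg from rest to $v_2$, while the kinetic energy of the leg that is now planted is treated as lost.

$FL = \frac{1}{2}(m_{\text{b}} - 2m_{\text{leg}})(v_2^2 - v_1^2) + \frac{1}{2}m_{\text{leg}}v_2^2$

Collecting the $v_2^2$ terms,

$v_2^2 = \frac{2FL}{m_{\text{b}} - m_{\text{leg}}} + \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}} v_1^2$

and substituting $v_1^2 = \frac{2FL}{m_{\text{b}} - m_{\text{leg}}}$ gives the expression for $v_2^2$ above. Each later stride repeats the same step, which is where the growing powers of the mass ratio come from.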

For $n$ strides,

$v_n^2 = \frac{2FL}{m_{\text{b}} - m_{\text{leg}}} \left( 1 + \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}} + \left( \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}} \right)^2 +\dots+\left( \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}} \right)^{n-1}\right)$

Notice that the bracketed terms form a geometric series, $1 + x + x^2 + \dots + x^{n-1} = (1 - x^n) / (1 - x)$, with $x = \frac{m_{\text{b}} - 2m_{\text{leg}}}{m_{\text{b}} - m_{\text{leg}}}$ and $0 < x < 1$. As $n$ grows, $x^n \rightarrow 0$ and the sum approaches $1 / (1 - x) = (m_{\text{b}} - m_{\text{leg}}) / m_{\text{leg}}$. Thus, the series approaches a final running velocity of,

$v_{n \rightarrow \infty} = \sqrt{\frac{2FL}{m_{\text{leg}}}}$
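
To see how quickly the stride-by-stride speeds settle toward this limit, here is a minimal Python sketch of the recursion. The runner's mass, height, force, and the two ratios used for leg mass and stride length are illustrative assumptions, and `stride_speeds` is a hypothetical helper name.

```python
import math


def stride_speeds(force, stride_length, body_mass, leg_mass, n_strides):
    """Speed after each stride, from v_k^2 = a + x * v_{k-1}^2 with v_0 = 0."""
    a = 2 * force * stride_length / (body_mass - leg_mass)
    x = (body_mass - 2 * leg_mass) / (body_mass - leg_mass)
    v_sq = 0.0
    speeds = []
    for _ in range(n_strides):
        v_sq = a + x * v_sq
        speeds.append(math.sqrt(v_sq))
    return speeds


# Illustrative values only (assumptions, not measurements)
body_mass = 75.0                  # kg
height = 1.80                     # m
leg_mass = 0.161 * body_mass      # kg, assumed ratio
stride_length = 1.35 * height     # m, assumed ratio
force = 250.0                     # N

speeds = stride_speeds(force, stride_length, body_mass, leg_mass, 40)
v_limit = math.sqrt(2 * force * stride_length / leg_mass)

print(f"after  1 stride : {speeds[0]:.2f} m/s")
print(f"after 10 strides: {speeds[9]:.2f} m/s")
print(f"after 40 strides: {speeds[-1]:.2f} m/s")
print(f"limiting speed  : {v_limit:.2f} m/s")
```

The speeds rise on every stride and stay below the limit, as the geometric series suggests.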

Empirical studies put the expected leg mass $m_{\text{leg}}$ at around $0.161m_{\text{b}}$; see Herman (2016). As with leg mass, empirical studies have estimated stride length as a function of height, though stride length will also depend on running technique and generated forces. Those experiments measured stride lengths over short distances of roughly 1.14 to 1.35 times runner height; see Rompottie (1972) and Hoffman (1972). To estimate a new run speed, we can estimate our force through modeling, given observations of previous sprints.
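
Before any modeling, we can get a rough feel for the size of the force by inverting the limiting-speed formula, $F = v^2 m_{\text{leg}} / (2L)$. The Python sketch below does this for a hypothetical runner. It treats average race speed as the limiting speed, which is a strong simplification, and `plug_in_force` is an illustrative name.

```python
def plug_in_force(distance_m, time_s, height_m, mass_kg,
                  stride_ratio=1.35, leg_ratio=0.161):
    """Invert v = sqrt(2 F L / m_leg) for F, using average speed as v."""
    speed = distance_m / time_s              # m/s
    stride_length = stride_ratio * height_m  # m
    leg_mass = leg_ratio * mass_kg           # kg
    return speed ** 2 * leg_mass / (2 * stride_length)


# Hypothetical runner: 1.80 m, 75 kg, 100 m in 10.0 s
force = plug_in_force(100.0, 10.0, 1.80, 75.0)
print(f"plug-in force: {force:.1f} N")
```

A single number like this carries no uncertainty, which is what the probabilistic model adds.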

Let’s code the beginning of a simplified model that reflects these measures and assumptions using Stan, a probabilistic programming language. Take some time to compare the mathematical description above with the code below:

data {
  // runner and event information
  int<lower=0> N;          // count of runs
  array[N] int<lower=1> r; // runner index, 1-based
  vector<lower=0>[N] t;    // time, s
  vector<lower=0>[N] d;    // distance, m
  vector<lower=0>[N] m;    // body mass, kg
  vector<lower=0>[N] h;    // body height, m
}

transformed data {
  int n_r = max(r);                     // count of unique runners
  vector<lower=0>[N] L = 1.35 * h;      // stride length, m
  vector<lower=0>[N] m_leg = 0.161 * m; // lifted leg mass, kg
}

parameters {
  vector<lower=0>[n_r] F; // force of each runner, newtons (kg m / s^2)
  real<lower=0> F_pop;    // population force, newtons (kg m / s^2)
  real<lower=0> sigma;    // overall variation, m / s
}

model {
  // priors
  F_pop ~ normal(250, 20);
  F ~ normal(F_pop, 50);
  sigma ~ exponential(5);

  // likelihood: average speed d / t varies around the limiting speed
  vector[N] mu = sqrt(2 * F[r] .* L ./ m_leg);
  target += normal_lpdf(d ./ t | mu, sigma);
}

generated quantities {
  // posterior predictive draws of average speed d / t, in m / s
  array[N] real t_hat = normal_rng(sqrt(2 * F[r] .* L ./ m_leg), sigma);
}
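
It can help to check what speeds these priors imply before seeing any data. Here is a rough prior predictive sketch in plain Python for one hypothetical runner (1.80 m, 75 kg). The rejection step only approximates the effect of the `<lower=0>` constraints, and the function names are illustrative.

```python
import math
import random
import statistics

random.seed(1)


def draw_positive_normal(mean, sd):
    """Rejection-sample a normal draw restricted to positive values."""
    while True:
        x = random.gauss(mean, sd)
        if x > 0:
            return x


def prior_predictive_speed(height_m, mass_kg):
    """One average-speed draw (m/s) implied by the priors for one runner."""
    f_pop = draw_positive_normal(250, 20)
    force = draw_positive_normal(f_pop, 50)
    sigma = random.expovariate(5)  # rate 5, so mean 0.2 m/s
    mu = math.sqrt(2 * force * 1.35 * height_m / (0.161 * mass_kg))
    return random.gauss(mu, sigma)


draws = sorted(prior_predictive_speed(1.80, 75.0) for _ in range(4000))
median = statistics.median(draws)
low, high = draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))]
print(f"prior predictive speed: median {median:.1f} m/s, "
      f"90% interval {low:.1f} to {high:.1f} m/s")
```

If the implied speeds looked implausible for a sprinter, that would be a cue to revisit the priors on `F_pop` and `F`.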


Let’s fit this simplistic model to, say, the winners of the men’s Olympic 100 m. We can pull their race information from Wikipedia, along with each winner’s height and weight from their biographies, with the caveat that those anthropometric values are general and do not necessarily reflect their height and weight at the time of their winning race. Thus, this model fit is merely illustrative. Here are the data:

| Year | Winner | Time (s) | Height (m) | Weight (kg) |
|------|--------|----------|------------|-------------|
| 1896 | Tom Burke (USA) | 12 | 1.83 | 66 |
| 1900 | Frank Jarvis (USA) | 11 | 1.67 | 58 |
| 1904 | Archie Hahn (USA) | 11 | 1.67 | 64 |
| 1908 | Reggie Walker (SAF) | 10.8 | 1.70 | 61 |
| 1912 | Ralph Craig (USA) | 10.8 | 1.82 | 73 |
| 1920 | Charles Paddock (USA) | 10.8 | 1.71 | 75 |
| 1924 | Harold Abrahams (GBR) | 10.6 | 1.83 | 75 |
| 1928 | Percy Williams (CAN) | 10.8 | 1.70 | 56 |
| 1932 | Eddie Tolan (USA) | 10.38 | 1.70 | 65 |
| 1936 | Jesse Owens (USA) | 10.3 | 1.80 | 75 |
| 1948 | Harrison Dillard (USA) | 10.3 | 1.78 | 69 |
| 1952 | Lindy Remigino (USA) | 10.4 | 1.68 | 63 |
| 1956 | Bobby Morrow (USA) | 10.5 | 1.86 | 75 |
| 1960 | Armin Hary (GER) | 10.2 | 1.82 | 71 |
| 1964 | Bob Hayes (USA) | 10 | 1.80 | 84 |
| 1968 | Jim Hines (USA) | 9.95 | 1.83 | 81 |
| 1972 | Valeriy Borzov (URS) | 10.14 | 1.83 | 80 |
| 1976 | Hasely Crawford (TRI) | 10.06 | 1.87 | 90 |
| 1980 | Allan Wells (GBR) | 10.25 | 1.83 | 86 |
| 1984 | Carl Lewis (USA) | 9.99 | 1.88 | 80 |
| 1988 | Carl Lewis (USA) | 9.92 | 1.88 | 80 |
| 1992 | Linford Christie (GBR) | 9.96 | 1.88 | 92 |
| 1996 | Donovan Bailey (CAN) | 9.84 | 1.85 | 91 |
| 2000 | Maurice Greene (USA) | 9.87 | 1.76 | 75 |
| 2004 | Justin Gatlin (USA) | 9.85 | 1.85 | 83 |
| 2008 | Usain Bolt (JAM) | 9.69 | 1.95 | 94 |
| 2012 | Usain Bolt (JAM) | 9.63 | 1.95 | 94 |
| 2016 | Usain Bolt (JAM) | 9.81 | 1.95 | 94 |
| 2021 | Lamont Marcell Jacobs (ITA) | 9.8 | 1.86 | 84 |

Let’s fit the model with these data.
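
One way to hand these data to the Stan program is a dictionary whose keys match the `data` block (`N`, `r`, `t`, `d`, `m`, `h`). The Python sketch below uses only a few rows of the table and assumes every race is 100 m. Repeat winners share a single runner index, so their runs inform the same force parameter.

```python
import json

# A subset of rows from the table: (winner, time s, height m, mass kg)
rows = [
    ("Jesse Owens", 10.3, 1.80, 75),
    ("Carl Lewis", 9.99, 1.88, 80),
    ("Carl Lewis", 9.92, 1.88, 80),
    ("Usain Bolt", 9.69, 1.95, 94),
    ("Usain Bolt", 9.63, 1.95, 94),
    ("Usain Bolt", 9.81, 1.95, 94),
]

# Map each distinct winner to a 1-based index, in order of first appearance
runner_id = {}
for name, *_ in rows:
    runner_id.setdefault(name, len(runner_id) + 1)

stan_data = {
    "N": len(rows),
    "r": [runner_id[name] for name, *_ in rows],
    "t": [t for _, t, _, _ in rows],
    "d": [100.0] * len(rows),  # assumption: every race is the 100 m
    "m": [m for *_, m in rows],
    "h": [h for _, _, h, _ in rows],
}

print(json.dumps(stan_data))
```

The printed JSON can be saved to a file and passed to whichever Stan interface you prefer.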

Running a posterior predictive check, we find that our estimates include the data:

[Figure: posterior predictive check]

Here are our estimates of the forces each winner produced:

[Figure: estimated force for each winner]

Now, in this simplified example, I fixed leg mass $m_{\text{leg}}$ and stride length $L$ at prior experimental values, but these are not precise for our particular sprinters. We can propagate variation and uncertainty in them by estimating them as random variables, informed by past observations. Further, the priors assigned were illustrative rather than informative. With more care in those priors, we can more accurately model these forces. Of note, we estimated each winner’s force as a separate parameter, informed by a hyper-prior on the average force generated by the population.
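
To get a feel for how much the fixed stride ratio matters, here is a small Monte Carlo sketch in Python. It takes the 2008 row of the table (9.69 s, 1.95 m, 94 kg) and draws the stride ratio uniformly between 1.14 and 1.35. The uniform distribution is purely an assumption for illustration, and inside the Stan model the ratio would instead become a parameter with its own prior.

```python
import random
import statistics

random.seed(2)


def force_from_speed(speed, height_m, mass_kg, stride_ratio, leg_ratio=0.161):
    """Invert v = sqrt(2 F L / m_leg) for F at a given stride ratio."""
    stride_length = stride_ratio * height_m
    leg_mass = leg_ratio * mass_kg
    return speed ** 2 * leg_mass / (2 * stride_length)


speed = 100.0 / 9.69  # average speed, m/s

# Assumed uniform uncertainty over the reported range of stride ratios
forces = [
    force_from_speed(speed, 1.95, 94.0, random.uniform(1.14, 1.35))
    for _ in range(5000)
]

print(f"force at ratio 1.35: {force_from_speed(speed, 1.95, 94.0, 1.35):.0f} N")
print(f"mean over ratios   : {statistics.mean(forces):.0f} N")
print(f"range over ratios  : {min(forces):.0f} to {max(forces):.0f} N")
```

Even this crude version shows the implied force shifting by tens of newtons across the reported range of stride ratios.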

What can we use these parameters for? Well, we might draw from a particular winner’s estimates to see how a new race might have turned out. Or we may use the population information to estimate the force a new winning sprinter might generate. But, certainly, the model as is has limitations. If we were to extend it, better uses might include modeling, say, the maximum forces we could expect in tomorrow’s Olympic races. To get started, consider some ideas in Noubary (2010). Or we may consider modeling how runner forces change with age.

Stay curious.