Coding custom distribution functions in Stan

The probabilistic programming language called Stan is a powerful tools for modeling. It includes numerous distribution functions we can use to model. While it cannot include every conceivable distribution in its available functions, it does enable the next best thing: we can code our own!

The Stan library does not, for example, currently include the three-parameter Weibull distribution. In this tutorial, I’ll code its PDF and CDF to demonstrate coding custom distributions in Stan.

The probability density function of the three-parameter Weibull includes a shape parameter \(\eta\) > 0, a scale parameter \(\lambda\) > 0, and a location parameter \(\theta\) in the following form:

\[ f(x; \eta, \lambda, \theta) = \eta \lambda (x - \theta) ^{\lambda - 1} \cdot \exp(-\eta(x - \theta) ^ \lambda) \]

And the cumulative distribution function fits the following form:

\[ F(x; \eta, \lambda, \theta) = 1 - \exp(-\eta(x - \theta)^\lambda) \]

We can code these in Stan directly, but our most common use-case will be to use the log PDF or log CDF. The log of the above PDF is,

\[ f_{\log}(x; \eta, \lambda, \theta) = \log(\eta) + \log(\lambda) + (\lambda - 1) \log(x - \theta) - \eta (x - \theta) ^ \lambda \]

And the log of the above CDF is,

\[ F_{\log}(x; \eta, \lambda, \theta) = \eta (x - \theta) ^ \lambda \]

The Stan code will read very similarly to the math. We place this code into the beginning of a Stan program in a “block” called functions {}, shown below with the regular and log form of the PDF and CDF:

functions {

  real weibull_pdf(data real x, real eta, real lambda, real theta) {
    return eta * lambda * (x - theta) ^ (lambda - 1) * exp(-eta * (x - theta) ^ lambda);
  }
  
  real weibull_cdf(data real x, real eta, real lambda, real theta) {
    return 1 - exp(-eta * (x - theta) ^ lambda);
  }
  
  real weibull_lpdf(data real x, real eta, real lambda, real theta) {
    return log(eta) + log(lambda) - eta * (x - theta) ^ lambda + (lambda - 1) * log(x - theta);
  } 

  real weibull_lcdf(data real x, real eta, real lambda, real theta) {
    return 1 - exp(-eta * (x - theta) ^ lambda);
  }
  
}

That’s the bare minimum to get these working, though I also recommend adding code within the functions to test that x > \(\theta\), \(\eta\) > 0, and \(\lambda\) > 0. We can accomplish that by include an if statement where, if true, we stop the sampling with a reject function.

We may, of course, also vectorize these statements for convenience and efficiency¹. Once we have them coded, we can use them just like any other distribution statement in Stan. Happy coding.

Stay curious!

If coding them this way, be sure to sum the vector before returning for use when adding it to the log likelihood.↩︎