Bayesian Autoregressions

Abstract. We present the basics of Bayesian estimation and inference for autoregressive models. The range of topics includes the natural conjugate analysis using normal-inverted-gamma 2 prior distribution and its extensions focusing on hierarchical modelling, conditional heteroskedasticity, and Student-t error terms. We focus on forecasting and sampling from the predictive density.

Keywords. Autoregressions, Bayesian Inference, Forecasting, Heteroskedasticity, Hierarchical Modelling, Natural Conjugacy, Shrinkage Prior

Autoregressions

Autoregressions are a popular class of linear models that are most useful for analysing the persistence of a time series and forecasting a random variable’s unknown future values. The simplicity of their formulation and estimation, and the range of applications in which they prove useful, explain their continued popularity.

The AR(\(p\)) model

The model is set for a univariate time series whose observation at time \(t\) is denoted by \(y_t\). It includes a \(d\)-vector \(d_t\) of deterministic terms and \(p\) lags of the dependent variable on the right-hand side of the model equation. It is complemented by error term \(u_t\) that, in this note, is zero-mean normally distributed with variance \(\sigma^2\). Then the model equations are: \[\begin{align} y_t &= \alpha_d' d_t + \alpha_1 y_{t-1} + \dots + \alpha_p y_{t-p} + u_t\\ u_t\mid d_t, y_{t-1}, \dots, y_{t-p} &\sim\mathcal{N}\left(0, \sigma^2\right) \end{align}\] where \(\alpha_d\) is a \(d\)-vector of coefficients on deterministic terms, and parameters \(\alpha_1,\dots,\alpha_p\) are autoregressive slopes.
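For illustration, the following minimal sketch simulates data from this model with \(p=2\) and a constant term; the parameter values are hypothetical and chosen only for demonstration.

set.seed(1)
T.sim   = 200
alpha   = c(0.1, 0.5, 0.3)          # constant, alpha_1, alpha_2 (hypothetical values)
sigma   = 1
y.sim   = rep(0, T.sim)
for (t in 3:T.sim){
  y.sim[t] = alpha[1] + alpha[2]*y.sim[t-1] + alpha[3]*y.sim[t-2] + rnorm(1, sd = sigma)
}
plot(y.sim, type = "l")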

Matrix notation for the model

To simplify the notation and the derivations, introduce matrix notation for the model. Let \(T\) be the available sample size for the variable \(y\). Define a \(T\)-vector of zeros, \(\mathbf{0}_T\), the identity matrix of order \(T\), \(\mathbf{I}_T\), and the \(T\times1\) vectors: \[\begin{align} \mathbf{y} = \begin{bmatrix} y_1\\ \vdots \\ y_T\end{bmatrix}, \quad \text{ and }\quad \mathbf{u} = \begin{bmatrix} u_1\\ \vdots \\ u_T\end{bmatrix}, \end{align}\] a \(k\times1\) vector \(\mathbf{x}_t = \begin{bmatrix}d_t' & y_{t-1}&\dots& y_{t-p} \end{bmatrix}'\), where \(k=d+p\), and a \(T\times k\) matrix collecting the explanatory variables: \[\begin{align} \mathbf{X} = \begin{bmatrix} \mathbf{x}_1'\\ \vdots \\ \mathbf{x}_T'\end{bmatrix}. \end{align}\] Collect the parameters of the conditional mean equation in a \(k\)-vector: \[\begin{align} \boldsymbol\alpha = \begin{bmatrix} \alpha_d'& \alpha_1 & \dots & \alpha_p\end{bmatrix}'. \end{align}\]

Then the model can be written in a concise notation as: \[\begin{align} \mathbf{y} &= \mathbf{X} \boldsymbol\alpha + \mathbf{u}\\ \mathbf{u}\mid \mathbf{X} &\sim\mathcal{N}_T\left(\mathbf{0}_T, \sigma^2\mathbf{I}_T\right). \end{align}\]

Likelihood function

The model equations imply the sampling distribution of the data vector \(\mathbf{y}\). To see this, consider the model equation as a linear transformation of the normal vector \(\mathbf{u}\). Therefore, the data vector follows a multivariate normal distribution given by: \[\begin{align} \mathbf{y}\mid \mathbf{X}, \boldsymbol\alpha, \sigma^2 &\sim\mathcal{N}_T\left(\mathbf{X} \boldsymbol\alpha, \sigma^2\mathbf{I}_T\right). \end{align}\]

This distribution determines the shape of the likelihood function that is defined as the sampling data density: \[\begin{align} L(\boldsymbol\alpha,\sigma^2|\mathbf{y}, \mathbf{X})\equiv p\left(\mathbf{y}\mid \mathbf{X}, \boldsymbol\alpha, \sigma^2 \right). \end{align}\]

The likelihood function, which for the purpose of estimating the parameters is treated as a function of \(\boldsymbol\alpha\) and \(\sigma^2\) once the data are plugged in for \(\mathbf{y}\) and \(\mathbf{X}\), is given by: \[\begin{align} L(\boldsymbol\alpha,\sigma^2|\mathbf{y}, \mathbf{X}) = (2\pi)^{-\frac{T}{2}}\left(\sigma^2\right)^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol\alpha)'(\mathbf{y} - \mathbf{X}\boldsymbol\alpha)\right\}. \end{align}\]
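As a simple illustration, the log of this likelihood function can be evaluated in R for given values of \(\boldsymbol\alpha\) and \(\sigma^2\); the function log.likelihood below is ours and serves only as a sketch.

log.likelihood = function(alpha, sigma2, Y, X){
  T  = nrow(X)
  u  = Y - X %*% alpha                         # residuals for the given alpha
  -T/2*log(2*pi) - T/2*log(sigma2) - 0.5*sum(u^2)/sigma2
}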

Natural-Conjugate Analysis

Authors: Thomas Kronholm Møller & Jonas Loopers Davidsen

Likelihood as normal-inverted gamma 2

In order to facilitate deriving the posterior distribution, the likelihood function can be rewritten in the form of a \(\mathcal{NIG}2\) distribution kernel given by:

\[\begin{align} L(\boldsymbol\alpha,\sigma^2|\mathbf{y}, \mathbf{X}) &\propto \left(\sigma^2\right)^{-\frac{T-(k+2)+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}\left(\boldsymbol\alpha-\boldsymbol{\hat{\alpha}}\right)'\mathbf{X}'\mathbf{X}\left(\boldsymbol\alpha-\boldsymbol{\hat{\alpha}}\right) \right\}\\ &\qquad\times\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right)'\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right) \right\}, \end{align}\]

where \(\boldsymbol{\hat{\alpha}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\) is the maximum likelihood estimator of \(\boldsymbol\alpha\). It is now quite straightforward to identify the \(\mathcal{NIG}2\)-kernel. By remembering that the \(\mathcal{NIG}2\)-distribution is characterized by its four parameters, we get the following outcome:

\[\begin{align} L(\boldsymbol\alpha,\sigma^2|\mathbf{y}, \mathbf{X}) \propto \mathcal{NIG}2\left(\mu=\boldsymbol{\hat{\alpha}},\Sigma=\left(\mathbf{X}'\mathbf{X}\right)^{-1},s=\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right)'\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right),\nu=T-k-2\right) \end{align}\]

Normal-inverted gamma 2 prior

The natural-conjugate prior distribution is determined by the form of the likelihood function for the parameters discussed above. The normal-inverted gamma 2 prior can thus be written as:

\[\begin{align} p(\boldsymbol\alpha,\sigma^2) &= p(\boldsymbol\alpha|\sigma^2)p(\sigma^2)\\ p(\boldsymbol\alpha|\sigma^2) &= \mathcal{N}(\underline{\boldsymbol\alpha},\sigma^2\underline{\mathbf{V}}_{\boldsymbol\alpha})\\ p(\sigma^2) & = \mathcal{IG}2(\underline{s},\underline{\nu}) \end{align}\]

Using the distributions above, we can write the kernel of the \(\mathcal{NIG}2\) prior as:

\[\begin{align} \mathcal{NIG}2_{k}(\underline{\boldsymbol\alpha},\underline{\mathbf{V}}_{\boldsymbol\alpha}, \underline{s}, \underline{\nu}) \propto (\sigma^2)^{-\frac{\underline{\nu}+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}(\boldsymbol\alpha-\underline{\boldsymbol\alpha})'\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}(\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\}\exp\left\{-\frac{1}{2}\frac{\underline{s}}{\sigma^2}\right\} \end{align}\]

Normal-inverted gamma 2 posterior

The product of the prior distribution and the likelihood function as introduced above gives the posterior distribution given by:

\[\begin{align} p(\boldsymbol\alpha,\sigma^2|\mathbf{y},\mathbf{X}) \propto L(\mathbf{y}|\mathbf{X}, \boldsymbol\alpha, \sigma^2)p(\boldsymbol\alpha,\sigma^2) = L(\mathbf{y}|\mathbf{X}, \boldsymbol\alpha, \sigma^2)p( \boldsymbol\alpha| \sigma^2)p(\sigma^2) \end{align}\]

Now plugging in the expressions for the likelihood and the prior distribution:

\[\begin{align} p(\boldsymbol\alpha,\sigma^2|\mathbf{y},\mathbf{X}) &\propto \left(\sigma^2\right)^{-\frac{T-(k+2)+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}\left(\boldsymbol\alpha-\boldsymbol{\hat{\alpha}}\right)'\mathbf{X}'\mathbf{X}\left(\boldsymbol\alpha-\boldsymbol{\hat{\alpha}}\right) \right\}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right)'\left(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\alpha}}\right) \right\} \\ &\times (\sigma^2)^{-\frac{\underline{\nu}+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}(\boldsymbol\alpha-\underline{\boldsymbol\alpha})'\underline{\mathbf{V}}_{\boldsymbol\alpha}^{-1}(\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\}\exp\left\{-\frac{1}{2}\frac{\underline{s}}{\sigma^2}\right\} \end{align}\]

By collecting all terms and simplifying the expressions, the joint posterior distribution can be derived to be:

\[\begin{align} p(\boldsymbol\alpha,\sigma^2|\mathbf{y},\mathbf{X}) &\propto (\sigma^2)^{-\frac{T+\underline{\nu}+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2} (\boldsymbol\alpha-\overline{\boldsymbol\alpha})'\overline{\mathbf{V}}_{\boldsymbol\alpha}^{-1}(\boldsymbol\alpha-\overline{\boldsymbol\alpha})\right\} \times \exp\left\{-\frac{1}{2}\frac{1}{\sigma^2}\left(\underline{s}-\overline{\boldsymbol\alpha}'\overline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\overline{\boldsymbol\alpha}+\underline{\boldsymbol\alpha}'\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\underline{\boldsymbol\alpha}+\mathbf{y}'\mathbf{y}\right)\right\} \\&= (\sigma^2)^{-\frac{\overline{\nu}+k+2}{2}}\exp\left\{-\frac{1}{2}\frac{1}{\sigma^2} (\boldsymbol\alpha-\overline{\boldsymbol\alpha})'\overline{\mathbf{V}}_{\boldsymbol\alpha}^{-1}(\boldsymbol\alpha-\overline{\boldsymbol\alpha})\right\} \times \exp\left\{-\frac{1}{2}\frac{\overline{s}}{\sigma^2}\right\} \end{align}\]

This now fully defines the joint posterior distribution, as it is of the normal-inverted gamma 2 form with its corresponding four parameters:

\[\begin{align} p(\boldsymbol\alpha,\sigma^2|\mathbf{y},\mathbf{X}) &= \mathcal{NIG}2_{k}\left(\overline{\boldsymbol\alpha},\overline{\mathbf{V}}_{\boldsymbol\alpha},\overline{s},\overline{\nu}\right)\\ \overline{\boldsymbol\alpha} &= \overline{\mathbf{V}}_{\boldsymbol\alpha}( \underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\underline{\boldsymbol\alpha}+\mathbf{X}'\mathbf{y})\\ \overline{\mathbf{V}}_{\boldsymbol\alpha} &=\left(\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}+\mathbf{X}'\mathbf{X}\right)^{-1} \\ \overline{s} &= \underline{s}-\overline{\boldsymbol\alpha}'\overline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\overline{\boldsymbol\alpha}+\underline{\boldsymbol\alpha}'\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\underline{\boldsymbol\alpha}+\mathbf{y}'\mathbf{y}\\ \overline{\nu} &= \underline{\nu}+T \end{align}\]

Sampling draws from the posterior

We start by generating a random walk, which can be used to validate that our estimation is indeed correct.

T       = 100                                    # number of observations to simulate
N       = 1                                      # number of variables

# simulate a Gaussian random walk
y       = apply(matrix(rnorm(T * N), ncol = N), 2, cumsum)
y       = as.matrix(y)
p       = 1                                      # number of autoregressive lags
d       = 1                                      # number of deterministic terms (a constant)

k    = p + d                                     # number of regressors

T     = nrow(y)
Y     = as.matrix(y[(p+1):T,])                   # dependent variable
X     = cbind(rep(1, T - p), y[1:(T - p),])      # constant and first lag

Now, defining our priors for \(\underline{\boldsymbol\alpha}\), \(\underline{\mathbf{V}}_{\boldsymbol\alpha}\), \(\underline{s}\) and \(\underline{\nu}\):

priors = list(
  alpha   = as.matrix(rep(0, k)),   # prior mean of alpha
  Sigma   = diag(k),                # prior covariance matrix V of alpha
  S       = 1,                      # prior scale of sigma^2
  nu      = 3                       # prior degrees of freedom of sigma^2
)

and computing the function for the posterior parameters:

posterior = function(Y, X, priors){
  # posterior parameters of the NIG2 distribution
  Sigma.inv = t(X) %*% X + solve(priors$Sigma)
  Sigma     = solve(Sigma.inv)
  alpha     = Sigma %*% (t(X) %*% Y + solve(priors$Sigma) %*% priors$alpha)
  S         = as.numeric(t(Y) %*% Y + priors$S + t(priors$alpha) %*% solve(priors$Sigma) %*% priors$alpha 
                     - t(alpha) %*% Sigma.inv %*% alpha)
  nu        = nrow(Y) + priors$nu 
  
  return(list(
    Sigma  = Sigma,
    alpha  = alpha,
    S      = S,
    nu     = nu
  ))
}

post = posterior(Y = Y, X = X, priors = priors)

We are then able to sample draws of the parameters directly from the \(\mathcal{NIG}2\) posterior distribution using the function provided below.

posterior.draws = function(S, posterior){
  # sample sigma^2 from its marginal IG2 posterior
  Sigma2.posterior      = as.matrix(posterior$S / rchisq(S, posterior$nu))
  # sample alpha from its conditional normal posterior given each sigma^2 draw
  alpha.posterior       = simplify2array(
    lapply(1:S, function(i){
      mvtnorm::rmvnorm(1, mean = posterior$alpha, sigma = Sigma2.posterior[i,] * posterior$Sigma)
    })
  )
  output = cbind(t(alpha.posterior[1,,]), Sigma2.posterior)   # columns 1:k hold alpha, column k+1 holds sigma^2
  return(output)
}

draws = posterior.draws(S = 1000, posterior = post)

Hierarchical Prior Analysis

Estimating autoregressive prior shrinkage

Inverted gamma 2 scale mixture of normal

Suppose that the scalar scale \(\kappa_{\boldsymbol\alpha}\) follows an inverted gamma 2 distribution: \[\begin{align} \kappa_{\boldsymbol\alpha} \sim \mathcal{IG}2 (\underline{s}_{\boldsymbol\alpha},\underline{\nu}_{\boldsymbol\alpha}) \end{align}\]

We have priors as: \[\begin{align} \boldsymbol\alpha | \sigma^2, \kappa_{\boldsymbol\alpha} &\sim \mathcal{N}_{k}(\underline{\boldsymbol\alpha}, \sigma^2 \kappa_{\boldsymbol\alpha} \underline{\mathbf{V}} _{\boldsymbol\alpha}) \\ \\ \sigma^2 &\sim \mathcal{IG}2(\underline{s},\underline{\nu}) \\ \\ p(\boldsymbol\alpha, \sigma^2, \kappa_{\boldsymbol\alpha}) &= p(\boldsymbol\alpha | \sigma^2, \kappa_{\boldsymbol\alpha}) p(\sigma^2) p(\kappa_{\boldsymbol\alpha}) \end{align}\]

Then, the joint distribution of \((\boldsymbol\alpha,\sigma^2)\) given \(\kappa_{\boldsymbol\alpha}\) follows a normal-inverted gamma 2 distribution: \[\begin{align*} p\left(\boldsymbol\alpha,\sigma^2 \mid \kappa_{\boldsymbol\alpha}\right) = \mathcal{NIG}2_k(\underline{\boldsymbol\alpha},\kappa_{\boldsymbol\alpha} \underline{\mathbf{V}} _{\boldsymbol\alpha}, \underline{s},\underline{\nu}) \end{align*}\]

A Gibbs Sampler can be applied using the following steps:

Initialize \(\kappa_{\boldsymbol\alpha}\) at \(\kappa_{\boldsymbol\alpha}^{(0)}\).

At each iteration s:

    Step 1. Draw \((\boldsymbol\alpha,\sigma^2)^{(s)} \sim p(\boldsymbol\alpha,\sigma^2|\mathbf{X},\mathbf{y},\kappa_{\boldsymbol\alpha}^{(s-1)}) = \mathcal{NIG}2(\overline{\boldsymbol\alpha},\overline{\mathbf{V}},\overline{s},\overline{\nu})\).

    Step 2. Draw \(\kappa_{\boldsymbol\alpha}^{(s)} \sim p(\kappa_{\boldsymbol\alpha}|\mathbf{X},\mathbf{y},\boldsymbol\alpha,\sigma^2) = \mathcal{IG}2(\overline{s}_{\boldsymbol\alpha},\overline{\nu}_{\boldsymbol\alpha})\).

Repeat steps 1 and 2 \((S_1+S_2)\) times and discard the first \(S_1\) draws.

The full conditional posterior of \(\kappa_{\boldsymbol\alpha}\) is: \[\begin{gather*} p(\kappa_{\boldsymbol\alpha}|\mathbf{X},\mathbf{y},\boldsymbol\alpha,\sigma^2) \propto L(\mathbf{y}|\mathbf{X},\boldsymbol\alpha,\sigma^2)\, p(\kappa_{\boldsymbol\alpha})\, p(\boldsymbol\alpha|\sigma^2,\kappa_{\boldsymbol\alpha})\, p(\sigma^2) \\ \\ \propto p(\kappa_{\boldsymbol\alpha})\, p(\boldsymbol\alpha|\sigma^2,\kappa_{\boldsymbol\alpha}) \\ \\ \propto (\kappa_{\boldsymbol\alpha})^{-\frac{\underline{\nu}_{\boldsymbol\alpha}+2}{2}} \exp \left\{ -\frac{1}{2}\frac{\underline{s}_{\boldsymbol\alpha}}{\kappa_{\boldsymbol\alpha}} \right\} \times \det(\sigma^2 \kappa_{\boldsymbol\alpha} \underline{\mathbf{V}} _{\boldsymbol\alpha})^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})'(\sigma^2 \kappa_{\boldsymbol\alpha} \underline{\mathbf{V}} _{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\} \\ \\ \propto (\kappa_{\boldsymbol\alpha})^{-\frac{\underline{\nu}_{\boldsymbol\alpha}+2+k}{2}} \exp \left\{ -\frac{1}{2}\frac{\underline{s}_{\boldsymbol\alpha}}{\kappa_{\boldsymbol\alpha}} \right\} \exp \left\{ -\frac{1}{2}\frac{1}{\kappa_{\boldsymbol\alpha}} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})' (\sigma^2 \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\} \end{gather*}\]

in which we recognize a kernel of inverted gamma 2 distribution with parameters:

\[\begin{align} \overline{s}_{\boldsymbol\alpha} &= \underline{s}_{\boldsymbol\alpha} + (\boldsymbol\alpha-\underline{\boldsymbol\alpha})' \sigma^{-2} \underline{\mathbf{V}}_{\boldsymbol\alpha}^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha}) \\ \overline{\nu}_{\boldsymbol\alpha} &= \underline{\nu}_{\boldsymbol\alpha} + k \end{align}\]

To sample from the \(\mathcal{IG}2(s,\nu)\) distribution, the following code can be used:

# Set parameters of IG2 distribution 
s <- 1 
nu <- 1  
# Draw one sample from IG2 distribution 
sample <- s / rchisq(1, df = nu)
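Putting these formulas together, a hedged sketch of Step 2 of the Gibbs sampler, one draw of \(\kappa_{\boldsymbol\alpha}\) from its \(\mathcal{IG}2\) full conditional, could look as follows; priors$alpha and priors$Sigma play the roles of \(\underline{\boldsymbol\alpha}\) and \(\underline{\mathbf{V}}_{\boldsymbol\alpha}\), and the function name kappa.draw is ours.

kappa.draw = function(alpha, sigma2, priors, s.kappa, nu.kappa){
  dev    = alpha - priors$alpha
  s.bar  = s.kappa + as.numeric(t(dev) %*% solve(sigma2 * priors$Sigma) %*% dev)
  nu.bar = nu.kappa + length(alpha)
  s.bar / rchisq(1, df = nu.bar)      # a draw from IG2(s.bar, nu.bar)
}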

Gamma scale mixture of normal

Contributor: Yobin Timilsena

Alternatively, the scalar scale \(\kappa_{\boldsymbol\alpha}\) that premultiplies the covariance matrix of the normal prior for vector \(\boldsymbol\alpha\) can be assumed to have a hierarchical prior with a gamma distribution: \[ \kappa_{\boldsymbol\alpha}|\underline s_\kappa, \underline a_\kappa \sim \mathcal{G}(\underline s_\kappa, \underline a_\kappa) \] The following code creates a list kappa.priors to store priors on \(\kappa_\boldsymbol\alpha\).

kappa.priors = list(
  a = 1,
  s = .1
)

Given the likelihood and priors above, we can obtain the full conditional posterior of \(\kappa_{\boldsymbol\alpha}\) as \[ \begin{align*} p(\kappa_{\boldsymbol\alpha} | \mathbf{y}, \mathbf{X}, \boldsymbol\alpha, \sigma^2) & \propto p(\kappa_{\boldsymbol\alpha} | \underline s_{\kappa}, \underline a_{\kappa}) \cdot p(\boldsymbol\alpha | \kappa_{\boldsymbol\alpha}, \sigma^2)\\ & \propto (\kappa_{\boldsymbol\alpha})^{\underline a_{\kappa} - 1} \exp \left\{ - \frac{\kappa_{\boldsymbol\alpha}}{\underline s_{\kappa}} \right\} \cdot \det(\sigma^2 \kappa_{\boldsymbol\alpha} \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})'(\sigma^2 \kappa_{\boldsymbol\alpha} \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\} \\ & \propto (\kappa_{\boldsymbol\alpha})^{\underline a_{\kappa} - 1} \exp \left\{ - \frac{\kappa_{\boldsymbol\alpha}}{\underline s_{\kappa}} \right\} \cdot (\sigma^2)^{-\frac{k}{2}} (\kappa_{\boldsymbol\alpha})^{-\frac{k}{2}} \det(\underline{\mathbf{V}}_{\boldsymbol\alpha})^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2 \kappa_{\boldsymbol\alpha}} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})' (\sigma^2 \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha})\right\} \\ & \propto (\kappa_{\boldsymbol\alpha})^{\underline a_{\kappa} - \frac{k}{2} - 1} \exp \left\{ -\frac{1}{2} \left( (\boldsymbol\alpha-\underline{\boldsymbol\alpha})' (\sigma^2 \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha}) \cdot \frac{1}{\kappa_{\boldsymbol\alpha}} + \frac{2}{\underline s_{\kappa}} \kappa_{\boldsymbol\alpha} \right) \right\} \end{align*} \] which is the kernel of a Generalised Inverse Gaussian distribution with parameters: \[ \begin{align*} & \overline\lambda = \underline a_{\kappa} - \frac{k}{2}\\ & \overline\chi = (\boldsymbol\alpha-\underline{\boldsymbol\alpha})' (\sigma^2 \underline{\mathbf{V}}_{\boldsymbol\alpha})^{-1} (\boldsymbol\alpha-\underline{\boldsymbol\alpha}) \\ & \overline\Psi = \frac{2}{\underline s_{\kappa}} \end{align*} \] Hence, the full-conditional posterior distribution of \(\kappa_{\boldsymbol\alpha}\) follows a Generalised Inverse Gaussian distribution: \[ \kappa_{\boldsymbol\alpha} | \mathbf{y}, \mathbf{X}, {\boldsymbol\alpha}, \sigma^2 \sim \mathcal{GIG}(\overline\lambda, \overline\chi, \overline\Psi) \] Given the full conditional posteriors of both \(\kappa_{\boldsymbol\alpha}\) and \((\boldsymbol\alpha, \sigma^2)\), we can implement the Gibbs Sampler with the following steps:

  • Initialise \(\kappa_\alpha^{(0)}\).
  • At each iteration \(s\):
    • Step 1: Draw \((\boldsymbol\alpha,\sigma^2)^{(s)} \sim p(\boldsymbol\alpha,\sigma^2|\mathbf{X},\mathbf{y},\kappa_{\boldsymbol\alpha}^{(s-1)}) = \mathcal{NIG}2(\overline{\boldsymbol\alpha},\overline{\mathbf{V}},\overline{s},\overline{\nu})\).
    • Step 2: Draw \(\kappa_{\boldsymbol\alpha}^{(s)} \sim p(\kappa_{\boldsymbol\alpha}|\mathbf{X},\mathbf{y},(\boldsymbol\alpha,\sigma^2)^{(s)}) = \mathcal{GIG}(\overline\lambda, \overline\chi, \overline\Psi)\).

The Gamma.sampler function below implements the Gibbs Sampler using the aforementioned algorithm.

Gamma.sampler = function(S, posterior, priors, kappa.priors){
  set.seed(12345)
  S.burnin = 100
  S.total = S + S.burnin
  
  #create matrices to store posterior draws
  alpha.posterior  = array(NA,c(k,1,S.total))
  kappa.posterior  = array(NA,c(1,S.total))
  #initial value for kappa
  kappa.posterior[1] = 10
  
  # Draw sigma^2 from IG2 outside the loop as it does not update iteratively 
  Sigma2.posterior = posterior$S / rchisq(n=S.total,df=posterior$nu)
  
  for (i in 1:(S.total)) {
    # Plug in current kappa to update Sigma (or V in slides)
    prior.Sigma.inv = solve(kappa.posterior[i] * priors$Sigma)
    Sigma.inv = t(X) %*% X + prior.Sigma.inv
    Sigma = solve(Sigma.inv)
    
    # Draw alpha from normal dist using sigma^2 and updated var-covar matrix Sigma
    
    alpha.posterior[,,i]  = t(mvtnorm::rmvnorm(n=1, mean = posterior$alpha, sigma = Sigma2.posterior[i] * Sigma))
    
    # Update parameters for kappa posterior
    lambda = kappa.priors$a - k/2
    chi    = t(alpha.posterior[,,i] - priors$alpha) %*% solve(Sigma2.posterior[i] * priors$Sigma) %*% (alpha.posterior[,,i] - priors$alpha)
    Psi    = 2 / kappa.priors$s
    
    # Draw next period value for kappa from GIG distribution
    if (i != S.total){
      kappa.posterior[i+1] = GIGrvg::rgig(n=1, lambda = lambda, chi = chi, psi = Psi)
    }
  }
  
  # save output as a list 
  list(
    alpha.posterior = alpha.posterior[,,((S.burnin+1):S.total)],
    Sigma2.posterior = Sigma2.posterior[((S.burnin+1):S.total)],
    kappa.posterior = kappa.posterior[((S.burnin+1):S.total)]
  )
}
GammaScale.draws = Gamma.sampler(S = 1000, post, priors, kappa.priors)

I compute and display the posterior means for \(({\boldsymbol\alpha, \sigma^2, \kappa_{\boldsymbol\alpha}})\).

alpha.posterior.mean = rowMeans(GammaScale.draws$alpha.posterior)
alpha.posterior.mean
[1] -0.05634373  0.99357623
Sigma2.posterior.mean = mean(GammaScale.draws$Sigma2.posterior)
Sigma2.posterior.mean
[1] 1.080411
kappa.posterior.mean = mean(GammaScale.draws$kappa.posterior)
kappa.posterior.mean
[1] 0.2408404

Estimating error term variance prior scale

In this section, we estimate the error term variance prior scale \(\underline{s}\), which follows a Gamma distribution \(\mathcal{G}(\underline{s}_s,\underline{a}_s)\) with scale \(\underline{s}_s\) and shape \(\underline{a}_s\), and whose probability density function is equal to: \[p(\underline{s})=\Gamma(\underline{a}_s)^{-1}\underline{s}_s^{-\underline{a}_s}\underline{s}^{\underline{a}_s-1}\exp\left\{-\frac{\underline{s}}{\underline{s}_s}\right\}\] In order to find the full conditional posterior of \(\underline{s}\), write out its kernel as: \[\begin{align} p(\underline{s}|\mathbf{y},\mathbf{X},\boldsymbol\alpha,\sigma^2) &\propto L(\mathbf{y}|\mathbf{X},\boldsymbol\alpha,\sigma^2) \times p(\boldsymbol\alpha|\sigma^2) \times p(\sigma^2|\underline{s}) \times p(\underline{s})\\ &\propto p(\sigma^2|\underline{s}) \times p(\underline{s})\\ &\propto \underline{s}^{\frac{\underline{\nu}}{2}} \exp\left\{-\frac{1}{2}\frac{\underline{s}}{\sigma^2}\right\} \underline{s}^{\underline{a}_s-1}\exp\left\{-\frac{\underline{s}}{\underline{s}_s}\right\}\\ &\propto \underline{s}^{\frac{\underline{\nu}}{2}+\underline{a}_s-1}\exp\left\{-\frac{\underline{s}}{ \left[\left(2\sigma^2\right)^{-1} + \underline{s}_s^{-1}\right]^{-1} } \right\} \end{align}\] from which we recognise the kernel of a Gamma distribution with parameters: \[\begin{align} \overline{a}_s&=\frac{\underline{\nu}}{2}+\underline{a}_s\\ \overline{s}_s&=\left[\left(2\sigma^2\right)^{-1} + \underline{s}_s^{-1}\right]^{-1} \end{align}\]

In order to obtain a sample from the posterior distribution we use a Gibbs sampler. We generate random draws from the joint posterior distribution and we update them at each iteration. In this case we exploit the following procedure:

Initialize \(\underline{s}\) at \(\underline{s}^{(0)}\)

At each iteration s:

  1. Draw \((\boldsymbol\alpha,\sigma^2)^{(s)} \sim p\left(\boldsymbol\alpha,\sigma^2 \mid \mathbf{y},\mathbf{X}, \underline{s}^{(s-1)}\right)\)
  2. Draw \(\underline{s}^{(s)} \sim p\left(\underline{s}\mid\mathbf{y},\mathbf{X},\boldsymbol\alpha^{(s)},\sigma^{2(s)}\right)\)

Repeat steps 1 and 2 for \((S_1 + S_2)\) times. Discard the first \(S_1\) repetitions. Return the output as \(\left \{ \boldsymbol\alpha^{(s)}, \sigma^{2(s)}, \underline{s}^{(s)}\right \}^{S_1+S_2}_{s=S_1+1}\).

The following script illustrates sampling from the full conditional posterior distribution of \(\underline{s}\).

nu = 1
sigma2 = 10

# define the prior hyper-parameters
s.prior.s  <- 0.01
a.prior.s  <- 1

#define the posteriors
a.bar.s <- (nu / 2) + a.prior.s
s.bar.s <- 1 / (1/(2 * sigma2) + (1 / s.prior.s))

#sample from the gamma distribution using rgamma function
s.sigma <- rgamma(1, shape = a.bar.s, scale = s.bar.s)

Dummy observation prior

In this extended model, the system is augmented using the sum of coefficients and dummy initial observation prior. The idea for this way of setting a prior is to generate artificial new rows that augment data matrices \(\bf y\) and \(\bf X\). It is equivalent to assuming a normal-inverted gamma 2 prior whose parameters are determined by dummy observations augmenting the system.

The sum of coefficients prior takes the average of the lagged values and adds them to the basic equation. This is because these values are assumed to be a good forecast of future observations. The dummy initial observation prior adds a single dummy observation such that all values are set equal to the averages of initial conditions, up to a scaling factor.

In order to practically generate the additional rows the following steps should be taken.

Firstly, the average of the lagged values, y.bar, needs to be calculated. This is done by finding the mean of the first \(p\) values in the data. Next, values of the scaling factors need to be selected, which are typically set equal to 1.

Once this has been done, the rows can be created. In the univariate case this is simple: two rows are added to vector \(\bf y\), the first equal to y.bar divided by the first scaling factor, \(\lambda_3\), and the second equal to the same value divided by the second scaling factor, \(\lambda_4\).

Adding rows to \(\bf X\) is slightly more complicated. The additional rows are as follows. The values in the first row, corresponding to the sum of coefficients, are all equal to y.bar divided by \(\lambda_3\), except the last column, which is equal to zero. The values in the second row, corresponding to the dummy initial observation, are all equal to y.bar, except the last column, which is equal to one, all of which are divided by \(\lambda_4\).

The following R function prior.ex() generates vectors and matrices by which \(\bf y\) and \(\bf X\) should be extended.

prior.ex <- function(data, p = 1, lambda_3 = 1, lambda_4 = 1){
  
  N = 1                                       # univariate case
  
  # average of the first p observations
  y.bar     = mean(data[1:p])
    
  # rows appended to y: the sum-of-coefficients row and the dummy-initial-observation row
  Y_star    = rbind(y.bar/lambda_3, y.bar/lambda_4)
  # last column of the appended X rows: the constant term (0 and 1/lambda_4)
  X_star    = as.matrix(c(rep(0, N), 1/lambda_4))
  # prepend one column per lag, each equal to Y_star
  for (i in 1:p) {
    X_star  = cbind(Y_star, X_star)
  }
  
  ext.data <- list(YN = Y_star, XN = X_star)
  
  return(ext.data)
  
}
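A hypothetical usage with the artificial random-walk data generated earlier is sketched below. Note that prior.ex() places the lag columns first and the constant last, so the columns of the appended rows are reordered here to match the ordering of \(\mathbf{X}\) (constant first) before re-estimating the posterior.

dummy     = prior.ex(y, p = 1, lambda_3 = 1, lambda_4 = 1)
Y.star    = rbind(Y, dummy$YN)
X.star    = rbind(X, dummy$XN[, c(2, 1)])     # reorder: constant first, then the lag
post.star = posterior(Y = Y.star, X = X.star, priors = priors)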

Model Extensions

Student-\(t\) error term

A possible extension to the model is to relax the assumption of normally distributed errors. By assuming a t-distribution for the errors we can better account for excess kurtosis in the data, also referred to as ‘fat tails’. Such a specification lends itself well to estimating models on data commonly found to be leptokurtic, such as financial time series.

\[ u_t \sim \mathcal{t}(0,\sigma^2,\nu) \]

Scale Mixture of Normals Representation of the \(t\) Distribution

Implementation of t-distributed errors can be achieved by representing the t-distribution as a ‘scale mixture of normal distributions’.

If \[ \begin{align*} x|\lambda &\sim \mathcal{N}(0, \lambda\sigma^2) \\ \lambda &\sim \mathcal{IG2}(s, \nu) \end{align*} \] then \[ x \sim t(0, s, \sigma^2, \nu) \] This result can be derived by expressing the joint distribution of \(x\) and \(\lambda\) as \[ p(x,\lambda) = p(x|\lambda) p(\lambda) \] and then integrating with respect to \(\lambda\):

\[ \int_0^\infty p(x|\lambda) p(\lambda) d\lambda \]

This integration produces a density function that is proportional to the density of a Student-\(t\) distribution with \(\nu\) degrees of freedom. Hence, \(x\sim t_\nu\).
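A quick numerical check of this representation, with hypothetical values of \(s\), \(\nu\), and \(\sigma^2\): draws generated from the scale mixture should match draws from an appropriately scaled Student-\(t\) distribution.

set.seed(123)
S.sim   = 1e5
s       = 3; nu = 5; sigma2 = 1
lambda  = s / rchisq(S.sim, df = nu)                   # lambda ~ IG2(s, nu)
x       = rnorm(S.sim, sd = sqrt(lambda * sigma2))     # x | lambda ~ N(0, lambda * sigma2)
qqplot(x, sqrt(sigma2 * s / nu) * rt(S.sim, df = nu))  # compare with scaled t_nu draws
abline(0, 1)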

Likelihood and Posteriors under Student-t errors

Implementing the scale mixture specification, the conditional distribution of \(y\) now takes the following form

\[ \boldsymbol{y}|\boldsymbol{X},\boldsymbol{\alpha},\sigma^2,\lambda \sim \mathcal{N_T}(\boldsymbol{X}\boldsymbol{\alpha}, \sigma^2\lambda I_T) \]

The likelihood under this specification now becomes

\[ L(\boldsymbol{\alpha}, \sigma^2, \lambda | \boldsymbol{y}, \boldsymbol{X}) = (2\pi)^{-\frac{T}{2}} (\sigma^2)^{-\frac{T}{2}} \det(\lambda I_T)^{-\frac{1}{2}} exp(-\frac{1}{2}\frac{1}{\sigma^2}(\boldsymbol{y - X\alpha})'(\lambda I_T)^{-1}(\boldsymbol{y - X\alpha}))\]

As before, we can derive the full conditional posteriors of the parameters by multiplying the likelihood and the priors. The natural conjugacy of \(\boldsymbol\alpha\) and \(\sigma^2\) is preserved under this specification and therefore the joint posterior is given by

\[ p(\boldsymbol{\alpha}, \sigma^2|\boldsymbol{y, X}, \lambda) \propto L(\boldsymbol{\alpha}, \sigma^2, \lambda | \boldsymbol{y}, \boldsymbol{X}) p(\boldsymbol{\alpha}, \sigma^2) \]

Expanding and rearranging the resulting expression allows the posterior to be expressed in the form of a Normal Inverse Gamma 2 with the following moments

\[\begin{align*} p(\boldsymbol{\alpha}, \sigma^2|\mathbf{y}, \mathbf{X}, \lambda) &= \mathcal{NIG2}(\boldsymbol{\overline\alpha}, \overline{\mathbf{V}}_{\boldsymbol\alpha}, \overline s, \overline \nu) \\ \overline{\boldsymbol\alpha} &= \overline{\mathbf{V}}_{\boldsymbol\alpha}( \underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\underline{\boldsymbol\alpha}+\mathbf{X}'(\lambda I_T)^{-1}\mathbf{y})\\ \overline{\mathbf{V}}_{\boldsymbol\alpha} &=\left(\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}+\mathbf{X}'(\lambda I_T)^{-1}\mathbf{X}\right)^{-1} \\ \overline{s} &= \underline{s}-\overline{\boldsymbol\alpha}'\overline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\overline{\boldsymbol\alpha}+\underline{\boldsymbol\alpha}'\underline{\mathbf{V}}^{-1}_{\boldsymbol\alpha}\underline{\boldsymbol\alpha}+\mathbf{y}'(\lambda I_T)^{-1}\mathbf{y}\\ \overline{\nu} &= \underline{\nu}+T \end{align*}\]

The full conditional for \(\lambda\) is also derived in a similar manner, with the posterior distribution taking the form of an Inverse Gamma 2

\[\begin{align*} p(\lambda|\mathbf{y,X},\boldsymbol{\alpha},\sigma^2) &\propto L(\boldsymbol{\alpha}, \sigma^2, \lambda | \mathbf{y, X})p(\lambda) \\ &\propto (\sigma^2)^{-\frac{T}{2}}\det(\lambda I_T)^{-\frac{1}{2}}\exp(-\frac{1}{2}\frac{1}{\sigma^2}(\mathbf{y - X\alpha})'(\lambda I_T)^{-1}(\mathbf{y - X\alpha})) \\ &\times \lambda^{-\frac{\underline \nu_\lambda +2}{2}}\exp(-\frac{1}{2}\frac{\underline s_\lambda}{\lambda}) \end{align*}\]

Rearranging the above to express in the form of \(\mathcal{IG2}\)

\[\begin{align*} p(\lambda|\mathbf{y,X},\boldsymbol{\alpha},\sigma^2) &= \mathcal{IG2}(\overline s_\lambda, \overline \nu_\lambda) \\ \overline s_\lambda &= \frac{1}{\sigma^2}(\mathbf{y - X\boldsymbol\alpha})'(\mathbf{y - X\boldsymbol\alpha}) + \underline s_\lambda \\ \overline \nu_\lambda &= T + \underline \nu_\lambda \end{align*}\]

Gibbs Sampler Code for t-distributed errors

Now that we have the full conditionals for all of the parameters we can incorporate them into the Gibbs Sampler routine in order to draw a sample from the posterior densities.

The routine proceeds as follows:

Initialize lambda based on priors for \(\underline s_\lambda\) and \(\underline \nu_\lambda\).

At each iteration s:

  1. Draw \(\sigma^{2(s)}\) from the \(\mathcal{IG2}(\overline S, \overline\nu)\) distribution
  2. Draw \(\mathbf{\alpha^{(s)}}\) from the \(\mathcal{N}(\boldsymbol{\overline \alpha}, \sigma^{2(s)} \overline V)\) distribution
  3. Draw \(\lambda^{(s)}\) from \(\mathcal{IG2}(\overline s_\lambda, \overline\nu_\lambda)\)

The sampling routine is implemented via the following function in R, and the results of applying the sampler to the artificial random walk data are displayed. For the purposes of this simulation, a relatively large value of the prior degrees-of-freedom hyperparameter \(\underline\nu_\lambda\) is chosen, which communicates our prior belief that the error term is close to normally distributed.

TDist.Gibbs.sampler = function(S, Y, X, priors){
  # A function to sample from the full conditionals of alpha, sigma and lambda
  #
  # priors: a list containing the usual priors for alpha, Sigma, S and nu
  # as well as additional priors for lambda hyper-parameters lambda_s and lambda_nu.
  
  A.prior = priors$alpha
  A_V.gprior = priors$Sigma
  Sigma_s.prior = priors$S
  Sigma_v.prior = priors$nu
  lambda_s.prior = priors$lambda_s
  lambda_v.prior = priors$lambda_nu
  
  lambda.draw = lambda_s.prior/rchisq(1, lambda_v.prior)
  
  Sigma.posterior.draws = array(NA, c(1,1,S))
  A.posterior.draws = array(NA, c(k,1,S))
  lambda.posterior.draws = rep(NA,S)
  for (s in 1:S){
    
    lambda.gprior.diag = diag(lambda.draw, nrow(Y))
    
    A_V.posterior     = solve(t(X)%*%diag(1/diag(lambda.gprior.diag))%*%X + solve(A_V.gprior))
    A.posterior       = A_V.posterior%*%(t(X)%*%diag(1/diag(lambda.gprior.diag))%*%Y + solve(A_V.gprior)%*%A.prior)
    Sigma_s.posterior = t(Y)%*%diag(1/diag(lambda.gprior.diag))%*%Y + t(A.prior)%*%solve(A_V.gprior)%*%A.prior + Sigma_s.prior - t(A.posterior)%*%solve(A_V.posterior)%*%A.posterior
    Sigma_v.posterior = nrow(Y) + Sigma_v.prior
    
    #Sigma.inv.draw      = rWishart(1, Sigma_v.posterior, solve(Sigma_s.posterior))[,,1]
    #Sigma.posterior.draws[,,s] = solve(Sigma.inv.draw)
    
    Sigma.posterior.draws[,,s] = Sigma_s.posterior/ rchisq(1, df=Sigma_v.posterior)
    Sigma.inv.draw = solve(Sigma.posterior.draws[,,s])
    
    A.posterior.draws[,,s] = matrix(mvtnorm::rmvnorm(1, mean=as.vector(A.posterior), sigma=Sigma.posterior.draws[,,s]%x%A_V.posterior), ncol=2)
    
    lambda_s.posterior = sum(diag(Sigma.inv.draw%*%t(Y - X%*%A.posterior.draws[,,s])%*%(Y - X%*%A.posterior.draws[,,s]))) + lambda_s.prior
    lambda_v.posterior = nrow(Y)*2 + lambda_v.prior
    
    lambda.draw = lambda_s.posterior / rchisq(1, lambda_v.posterior)
    lambda.posterior.draws[s] = lambda.draw
  }
  return(list(A.posterior.draws, Sigma.posterior.draws, lambda.posterior.draws))
}

tdist.res = TDist.Gibbs.sampler(1000, Y, X, c(priors, list(lambda_s=30, lambda_nu=30)))

#Alpha posterior mean:
round(apply(tdist.res[[1]], 1:2, mean),2)
      [,1]
[1,] -0.06
[2,]  1.00
#Sigma posterior mean:
round(mean(tdist.res[[2]]),2)
[1] 4.43
#Lambda posterior mean:
round(mean(tdist.res[[3]]),2)
[1] 0.25

Estimating autoregressions after 2020

The 2020 COVID-19 pandemic significantly altered the global economic landscape. It may be argued that the pre- and post-COVID periods are not easily comparable, or that the most severe COVID observations are best discounted because of their distorting effect on the overall data. However, Lenza and Primiceri (2022) disagree. In their paper How to estimate a vector autoregression after March 2020, rather than discounting data, they suggest that the effects of COVID be modeled as a temporary spike in volatility. They find that volatility is most unpredictable during the first three periods of the pandemic, and that this finding holds regardless of whether monthly or quarterly data are used. For higher-frequency data the exact effect is unknown, though it should cover at least the first three months of the pandemic. Thus, they propose the following change to the standard formula for autoregressions:

\[y_t = \mathbf{x}_t'\boldsymbol\alpha + c_tu_t,\]

where \(c_t\) is a standard deviation multiplier. That is, for every period before COVID it takes a value of 1, for the first 3 periods of COVID it takes the values \(\bar{c}_0\), \(\bar{c}_1\), \(\bar{c}_2\), and in all subsequent periods a value of \[c_{t*+j} = 1 + (\bar{c}_2 - 1)\rho^{j-2}, \] where \(t^*\) denotes the period of the pandemic's onset, \(j\geq 3\), and \(\rho\in(0,1)\) captures the exponential decay of the conditional standard deviation towards the value one. This creates a vector that leaves the error term unchanged before COVID, has a surge in volatility during COVID, and then decays geometrically after COVID. This structure approximates the observed shocks to volatility and facilitates straightforward estimation.

By dividing both sides by \(c_t\) the model equation can be rewritten as \[\tilde{y}_t = \tilde{\mathbf{x}}_t'\boldsymbol\alpha + u_t\] where \(\tilde{y}_t = y_t/c_t\) and \(\tilde{\mathbf{x}}_t = \mathbf{x}_t/c_t\). These simple transformations then lend themselves to analysis using whatever estimation method is preferred.

The main difficulty arises in the estimation of \(c_t\) as it is an unknown parameter in many cases.

Methodology

To estimate the values of \(c_0\), \(c_1\), \(c_2\), and \(\rho\), define a \(T\)-vector \(\mathbf{c}\) of COVID volatility variables:

\[\mathbf{c}=\begin{bmatrix}1&\dots& c_0 & c_1 & c_2 & 1+(c_2-1)\rho & 1+(c_2-1)\rho^2&\dots\end{bmatrix}'\]

Intuitively, the volatility variable for the period before COVID is set to unity. The heightened volatility during the first three quarters of COVID is parameterised, and it is then assumed to decay at a geometric rate from the fourth quarter after the pandemic’s onset.
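A minimal sketch of this construction in R is given below, assuming t.star indexes the first pandemic period and theta = c(c0, c1, c2, rho); the function name covid.c is ours and is used only for illustration.

covid.c = function(T, t.star, theta){
  c.vec = rep(1, T)
  c.vec[t.star:(t.star + 2)] = theta[1:3]                   # c0, c1, c2
  j     = seq_len(T - t.star - 2)                           # periods after the first three
  c.vec[(t.star + 3):T] = 1 + (theta[3] - 1) * theta[4]^j   # geometric decay towards 1
  c.vec
}
# the rescaled data are then y.tilde = y / c.vec and X.tilde = X / c.vec (row-wise division)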

These COVID volatility parameters are collected in a vector \(\boldsymbol\theta=(c_0\quad c_1\quad c_2\quad\rho)\) and are estimated from their own marginal posterior as proposed by Lenza and Primiceri (2022):

\[p(\boldsymbol\theta\mid\mathbf{y},\mathbf{X})\propto P(\mathbf{y}\mid\mathbf{X},\boldsymbol\theta)p(\boldsymbol\theta)\]

where the likelihood function is given as:

\[\begin{align}p(\mathbf{y}\mid\mathbf{X},\boldsymbol\theta) &\propto \Bigg(\prod^T_{t=1}c_t^{-N}\Bigg)\underline{s}^{\frac{\underline{\nu}}{2}}\det\left(\underline{\mathbf{V}}\right)^{-\frac{1}{2}}\det\left(\tilde{\mathbf{X}}'\tilde{\mathbf{X}}+\underline{\mathbf{V}}^{-1}\right)^{-\frac{1}{2}}\\ &\qquad\times\det\left(\underline{s}+\hat{\tilde{\mathbf{u}}}'\hat{\tilde{\mathbf{u}}}+(\hat{\tilde{\boldsymbol\alpha}}-\underline{\boldsymbol\alpha})'\underline{\mathbf{V}}^{-1}(\hat{\tilde{\boldsymbol\alpha}}-\underline{\boldsymbol\alpha})\right)^{-\frac{T-p+\underline{\nu}}{2}},\end{align}\] where \(\tilde{\mathbf{X}} = \begin{bmatrix}\tilde{\mathbf{x}}_1' & \dots & \tilde{\mathbf{x}}_T' \end{bmatrix}'\), \(\tilde{\mathbf{y}} = \begin{bmatrix}\tilde{y}_1 & \dots & \tilde{y}_T \end{bmatrix}'\), \(\hat{\tilde{\mathbf{u}}} = \tilde{\mathbf{y}} - \tilde{\mathbf{X}}\hat{\tilde{\boldsymbol\alpha}}\), and \(\hat{\tilde{\boldsymbol\alpha}}=(\tilde{\mathbf{X}}'\tilde{\mathbf{X}}+\underline{\mathbf{V}}^{-1})^{-1}(\tilde{\mathbf{X}}'\tilde{\mathbf{y}}+\underline{\mathbf{V}}^{-1}\underline{\boldsymbol\alpha})\).

and the priors are assumed to be: \[\begin{align} c_0,c_1,c_2 &\sim \mathcal{P}areto(1,1)\\ \rho&\sim \mathcal{B}eta(3,1.5) \end{align}\]

Algorithm with code

The first step is to sample draws of the vector of COVID volatility variables \(\textbf{c}\), which is done by sampling the parameters \(\boldsymbol\theta=(\bar{c}_0,\bar{c}_1,\bar{c}_2,\rho)\) from their marginal posterior via a Metropolis MCMC algorithm:

  1. Initialize \(\boldsymbol\theta\) at the posterior mode, which can be located via numerical optimization. We executed this in R through the following function, c.posterior.mode. This function takes in data representing matrix \(\textbf{y}\) and returns the values of \(\boldsymbol\theta\) that minimize the negative log-posterior, as well as the corresponding Hessian.
c.posterior.mode <- function(data, p=4, k1=1, k2=100, start_date=c(1991,1)){

  v.neglogPost <- function(theta){
    N = ncol(data)
    K = 1 + p*N
    Y       = ts(data[(p+1):nrow(data),], start=start_date, frequency=4)
    T = nrow(Y)
    X       = matrix(1,T,1)
    # nrow(X)
    for (i in 1:p){
      X     = cbind(X,data[(p+1):nrow(data)-i,])
    }
    
    # Calculate MLE for prior 
    ############################################################
    A.hat       = solve(t(X)%*%X)%*%t(X)%*%Y
    Sigma.hat   = t(Y-X%*%A.hat)%*%(Y-X%*%A.hat)/nrow(Y)
    
    # Specify prior distribution
    ############################################################
    kappa.1     = k1
    kappa.2     = k2
    kappa.3     = 1
    A.prior     = matrix(0,nrow(A.hat),ncol(A.hat))
    A.prior[2:(N+1),] = kappa.3*diag(N)
    V.prior     = diag(c(kappa.2,kappa.1*((1:p)^(-2))%x%rep(1,N)))
    S.prior     = diag(diag(Sigma.hat))
    nu.prior    = N+1
    
    vec <- theta[1:3]
    for (i in 4:12){
      vec <- c(vec, 1 + (theta[3]-1)*theta[4]^(i-3))
    }  
    
    V <- c(ts(rep(1, nrow(Y)-12), c(1991,1), frequency = 4) , vec)    
    
    Y.tilde <- diag(1/V)%*%Y
    X.tilde <- diag(1/V)%*%X
    A.tilde.hat <- solve((t(X.tilde)%*%X.tilde+solve(V.prior)))%*%(t(X.tilde)%*%Y.tilde+solve(V.prior)%*%A.prior)
    epsilon.tilde <-Y.tilde - X.tilde%*%A.tilde.hat
    
    # Log-likelihood      
    logL <- log(prod(V^(-N)))+(-N/2)*log(det(t(X.tilde)%*%X.tilde+solve(V.prior)))+
            (-(T-p+nu.prior)/2)*log(det(S.prior +t(epsilon.tilde)%*%epsilon.tilde + 
            t(A.tilde.hat-A.prior)%*%solve(V.prior)%*%(A.tilde.hat-A.prior)))
    
    # Pareto(1,1) and Beta(3,1.5) priors 
    pareto.a=1
    pareto.b=1
    beta.a=3
    beta.b=1.5
    beta.cons <- 1/beta(beta.a,beta.b)
    
    # Log-prior
    logP <- log((pareto.a*pareto.b^pareto.a)/(theta[1]^(pareto.a+1))*
    (pareto.a*pareto.b^pareto.a)/(theta[2]^(pareto.a+1))*
    (pareto.a*pareto.b^pareto.a)/(theta[3]^(pareto.a+1))*
    beta.cons*theta[4]^(beta.a-1)*(1-theta[4])^(beta.b-1))
    
    # negative log-posterior
    neglogPost <- -(logL+logP)
    
    return(neglogPost)
  }
   
  # numerically minimize the negative log-likelihood
  post.maximizer <- optim(par=c(50, 50, 50, 0.5), fn=v.neglogPost, method="L-BFGS-B", 
                          lower=c(1, 1, 1, 0.0001),
                          upper=c(100,100,100,0.99999), hessian = TRUE)
  
  return(list(maximizer=post.maximizer$par, hessian=post.maximizer$hessian))

}
  2. Draw candidate \(\boldsymbol\theta^{*}\) from \(N_4(\boldsymbol\theta^{(s-1)},cW)\), where \(W\) is the inverse Hessian of the negative log posterior of \(\boldsymbol\theta\) at the mode, which is also calculated computationally, and \(c\) is a scaling factor.

  3. Set:

\[\boldsymbol\theta^{(s)}= \begin{cases} \boldsymbol\theta^* & \text{with pr.} \quad \alpha^{(s)} \\ \\ \boldsymbol\theta^{(s-1)} & \text{with pr.} \quad 1-\alpha^{(s)} \end{cases}\]

\[\alpha^{(s)} =\text{min}\Big[1,\frac{p(\boldsymbol\theta^*|Y,X,\underline{\gamma})}{p(\boldsymbol\theta^{(s-1)}|Y,X,\underline{\gamma})}\Big]\]

  4. Define the vector \(\textbf{c}^{(s)}\) using \(\boldsymbol\theta^{(s)}\):

\[\textbf{c}^{(s)}=[1\quad...\quad1 \quad \bar{c}_0^{(s)}\quad \bar{c}_1^{(s)}\quad \bar{c}_2^{(s)}\quad 1+(\bar{c}_2^{(s)}-1)\rho^{(s)}\quad 1+(\bar{c}_2^{(s)}-1)\rho^{(s)2}\quad...]'\]

Steps 2 to 4 are implemented via the function, mh.mcmc which takes in data, the posterior mode of \(\boldsymbol\theta\), and the inverse Hessian from c.posterior.mode and returns the draws of \(\boldsymbol\theta\).

library(MASS)
library(coda)
mh.mcmc <- function(data, p=1, S.mh = 1000, c, W = diag(4), theta.init,
                    k1 = 1, k2 = 100, start_date = c(1991,1)){
 # N = no. of variables
  N = ncol(data)
  # p = no. of lags
  K = 1 + p*N
  # forecast horizon
  # h       = 8
  Y       = ts(data[(p+1):nrow(data),], start=start_date, frequency=4)
  T = nrow(Y)
  X       = matrix(1,T,1)
  # nrow(X)
  for (i in 1:p){
    X     = cbind(X,data[(p+1):nrow(data)-i,])
  }
  

  # Calculate MLE for prior 
  ############################################################
  A.hat       = solve(t(X)%*%X)%*%t(X)%*%Y
  Sigma.hat   = t(Y-X%*%A.hat)%*%(Y-X%*%A.hat)/nrow(Y)
  
  # Specify prior distribution
  ############################################################
  kappa.1     = k1
  kappa.2     = k2
  kappa.3     = 1
  A.prior     = matrix(0,nrow(A.hat),ncol(A.hat))
  A.prior[2:(N+1),] = kappa.3*diag(N)
  V.prior     = diag(c(kappa.2,kappa.1*((1:p)^(-2))%x%rep(1,N)))
  S.prior     = diag(diag(Sigma.hat))
  nu.prior    = N+1
  
  # Metropolis-Hastings 
  ###########################################################
  # v0, v1, v2, rho
  Theta <- matrix(NA,S.mh,4)
  theta_old <- theta.init
  
  set.seed(1)
  for (s in 1:S.mh){

    covid.vec <- function(theta){
      vec <- theta[1:3]
      for (i in 4:12){
        vec <- c(vec, 1 + (theta[3]-1)*theta[4]^(i-3))
      }
      
      return(vec)
    }

    # Covid volatility likelihood kernel
    v.logL <- function(V){
      Y.tilde <- diag(1/V)%*%Y
      X.tilde <- diag(1/V)%*%X
      A.tilde.hat <- solve((t(X.tilde)%*%X.tilde+solve(V.prior)))%*%(t(X.tilde)%*%Y.tilde+solve(V.prior)%*%A.prior)
      epsilon.tilde <-Y.tilde - X.tilde%*%A.tilde.hat

      logL <- log(prod(V^(-N)))+(-N/2)*log(det(t(X.tilde)%*%X.tilde+solve(V.prior)))+
              (-(T-p+nu.prior)/2)*log(det(S.prior +t(epsilon.tilde)%*%epsilon.tilde + 
              t(A.tilde.hat-A.prior)%*%solve(V.prior)%*%(A.tilde.hat-A.prior)))

      return(logL)
    }
  
    # Covid volatility prior
    v.logP <- function(theta, pareto.a=1, pareto.b=1, beta.a=3, beta.b=1.5){
      beta.cons <- 1/beta(beta.a,beta.b)
  
      logP <- log((pareto.a*pareto.b^pareto.a)/(theta[1]^(pareto.a+1))*
      (pareto.a*pareto.b^pareto.a)/(theta[2]^(pareto.a+1))*
      (pareto.a*pareto.b^pareto.a)/(theta[3]^(pareto.a+1))*
       beta.cons*theta[4]^(beta.a-1)*(1-theta[4])^(beta.b-1))
      
      return(logP)
    }

    v_ones <- ts(rep(1, nrow(Y)-12), c(1991,1), frequency = 4) 
    V.old <- c(v_ones, covid.vec(theta_old))    
      
    # New candidate parameters values
    theta_new <- mvrnorm(1, theta_old, c*W)
    V.new <- c(v_ones, covid.vec(theta_new))
    
    # Calculate posteriors 
    v.logpost_old <- v.logL(V.old)+v.logP(theta_old)
    v.logpost_new <- v.logL(V.new)+v.logP(theta_new)
    
    # Posterior ratio
    post.ratio <- exp(v.logpost_new-v.logpost_old)
    
    # Acceptance/rejection alpha
    alpha <- min(1, post.ratio)
    
    u_star <- runif(1)
    
    if (alpha > u_star){
      Theta[s,] <- theta_new
    } else {Theta[s,] <- theta_old}
    
    theta_old <- Theta[s,]  
  }
  
  colnames(Theta) <- c("c0", "c1" , "c2", "rho")

  re <- list(Theta=Theta, 
             AcceptRate = 1 - rejectionRate(as.mcmc(Theta[,1])))
  return(re)
}
  5. These \(\boldsymbol\theta\) draws are then used to draw \(\boldsymbol{\alpha}\) and \(\sigma^2\) from the following posterior distributions:

\[p(\boldsymbol{\alpha}\mid\mathbf{y},\mathbf{X},\sigma^2,\textbf{c})=\mathcal{N}_{k}(\bar{\boldsymbol{\alpha}},\sigma^2\bar{\mathbf{V}})\] \[p(\sigma^2|\mathbf{y},\mathbf{X},\textbf{c})=\mathcal{IG}2(\bar{s},\bar{\nu})\]

\[\bar{\mathbf{V}}=(\mathbf{X}'\text{diag}(\textbf{c}^2)^{-1}\mathbf{X}+\underline{\mathbf{V}}^{-1})^{-1}\] \[\bar{\boldsymbol{\alpha}}=\bar{\mathbf{V}}(\mathbf{X}'\text{diag}(\textbf{c}^2)^{-1}\mathbf{y}+\underline{\mathbf{V}}^{-1}\underline{\boldsymbol{\alpha}})\] \[\bar{\nu}=T+\underline{\nu}\] \[\bar{s}=\underline{s}+\mathbf{y}'\text{diag}(\textbf{c}^2)^{-1}\mathbf{y}+\underline{\boldsymbol{\alpha}}'\underline{\mathbf{V}}^{-1}\underline{\boldsymbol{\alpha}}-\bar{\boldsymbol{\alpha}}'\bar{\mathbf{V}}^{-1}\bar{\boldsymbol{\alpha}}\]

This sampling is implemented via the function covid.est below:

covid.est <- function(data, p=4, S=100, k1=1, k2=100, start_date = c(1991,1), Theta.mh){

  N = ncol(data)
  K = 1 + p*N

  Y       = ts(data[(p+1):nrow(data),], start=start_date, frequency=4)
  T = nrow(Y)
  X       = matrix(1,T,1)

  for (i in 1:p){
    X     = cbind(X,data[(p+1):nrow(data)-i,])
  }
  
  covid.vec <- function(theta){
    vec <- theta[1:3]
    for (i in 4:12){
      vec <- c(vec, 1 + (theta[3]-1)*theta[4]^(i-3))
    }
      
    return(vec)
  }
  
  diagV.sqinv <- array(NA, c(nrow(Y),nrow(Y),S))
  
  for (s in 1:S){
    v_ones <- ts(rep(1, nrow(Y)-12), c(1991,1), frequency = 4) 
    diagV.sqinv[,,s] <- diag(c(v_ones, covid.vec(Theta.mh[s,]))^(-2))
  }
  
  # Calculate MLE for prior 
  ############################################################
  A.hat       = solve(t(X)%*%X)%*%t(X)%*%Y
  Sigma.hat   = t(Y-X%*%A.hat)%*%(Y-X%*%A.hat)/nrow(Y)

  # Specify prior distribution
  ############################################################
  kappa.1     = k1
  kappa.2     = k2
  kappa.3     = 1
  A.prior     = matrix(0,nrow(A.hat),ncol(A.hat))
  A.prior[2:(N+1),] = kappa.3*diag(N)
  V.prior     = diag(c(kappa.2,kappa.1*((1:p)^(-2))%x%rep(1,N)))
  S.prior     = diag(diag(Sigma.hat))
  nu.prior    = N+1
  
  # Posterior draws 
  ############################################################
  Sigma.posterior   = array(NA,c(N,N,S))
  A.posterior       = array (NA,c(K,N,S))
  
  for (s in 1:S){
    V.bar.inv   = t(X)%*%diagV.sqinv[,,s]%*%X + diag(1/diag(V.prior))
    V.bar       = solve(V.bar.inv)
    A.bar       = V.bar%*%(t(X)%*%diagV.sqinv[,,s]%*%Y + diag(1/diag(V.prior))%*%A.prior)
    nu.bar      = nrow(Y) + nu.prior
    S.bar       = S.prior + t(Y)%*%diagV.sqinv[,,s]%*%Y + t(A.prior)%*%diag(1/diag(V.prior))%*%
                  A.prior - t(A.bar)%*%V.bar.inv%*%A.bar
    S.bar.inv   = solve(S.bar)
    L                 = t(chol(V.bar))
    
    # RF posterior draws
    Sigma.posterior[,,s] <- solve(rWishart(1, df=nu.bar, Sigma=S.bar.inv)[,,1])
    cholSigma.s     = chol(Sigma.posterior[,,s])
    # draw A from its matrix-variate normal posterior via standard normal draws
    A.posterior[,,s]       = matrix(rnorm(K*N), K, N)
    A.posterior[,,s]       = A.bar + L%*%A.posterior[,,s]%*%cholSigma.s
  }

  re <- list("A.posterior"=A.posterior, "Sigma.posterior"=Sigma.posterior, "Theta"= Theta.mh)
  return(re)
}

Stochastic volatility heteroskedasticity

Model equations

Introduction of Stochastic Volatility requires changing the assumptions regarding the error term: \[ \mathbf{u}\mid\mathbf{X} \sim N_T(\mathbf{0}_{T},\text{diag}(\boldsymbol\sigma^2)), \] where \(\boldsymbol\sigma^2\) is a \(T\)-vector: \[ \boldsymbol\sigma^2 = (\exp\{h_1\}, \dots , \exp\{h_T\}), \] and \(h_t\) follows a Stochastic Volatility process: \[ \begin{align*} h_t &= h_{t-1}+\sigma_vv_t \\ v_t &\sim N(0,1). \end{align*} \]

The likelihood function is then \[ L(\boldsymbol\alpha|\mathbf{y},\mathbf{X},\boldsymbol\sigma^2) \propto \det(\text{diag}(\boldsymbol\sigma^2))^{-1/2}\times \exp\left\{ -\frac{1}{2} (\mathbf{y}-\mathbf{X}\boldsymbol\alpha)' \text{diag}(\boldsymbol\sigma^2)^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\alpha) \right\} \] The full conditional posterior distribution for \(\boldsymbol\alpha\) is given by \[ p(\boldsymbol\alpha|\mathbf{y},\mathbf{X},\boldsymbol\sigma^2) = \mathcal{N}_K (\bar{\boldsymbol\alpha},\bar{\mathbf{V}}) \] where \[\begin{align} \bar{\mathbf{V}}&=(\mathbf{X}'\text{diag}(\boldsymbol\sigma^2)^{-1}\mathbf{X}+\underline{\mathbf{V}}^{-1})^{-1}\\ \bar{\boldsymbol\alpha}&=\bar{\mathbf{V}}(\mathbf{X}'\text{diag}(\boldsymbol\sigma^2)^{-1}\mathbf{y}+\underline{\mathbf{V}}^{-1}\underline{\boldsymbol\alpha}) \end{align}\]
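A hedged sketch of this \(\boldsymbol\alpha\) draw is given below; sigma2.vec is assumed to hold the current \(T\)-vector \(\boldsymbol\sigma^2\), and alpha.prior and V.prior play the roles of \(\underline{\boldsymbol\alpha}\) and \(\underline{\mathbf{V}}_{\boldsymbol\alpha}\).

draw.alpha.sv = function(Y, X, sigma2.vec, alpha.prior, V.prior){
  V.bar.inv = t(X) %*% diag(1 / sigma2.vec) %*% X + solve(V.prior)
  V.bar     = solve(V.bar.inv)
  a.bar     = V.bar %*% (t(X) %*% diag(1 / sigma2.vec) %*% Y + solve(V.prior) %*% alpha.prior)
  as.vector(mvtnorm::rmvnorm(1, mean = as.vector(a.bar), sigma = V.bar))
}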

Essential techniques

Log-linearisation

In order to estimate the SV model, rewrite the autoregressive equation as \[\begin{align} y_t - \mathbf{x}_t'\boldsymbol\alpha &= \exp\left\{\frac{1}{2}h_t\right\}z_t\\ z_t\mid\mathbf{x}_t &\sim\mathcal{N}(0,1) \end{align}\] Square both sides of the equation and take the log to obtain: \[\begin{align} \tilde{y}_t &= h_t + \tilde{z}_t\\ \tilde{z}_t\mid\mathbf{x}_t &\sim\log\chi^2_1 \end{align}\] where \(\tilde{y}_t = \log\left(y_t - \mathbf{x}_t'\boldsymbol\alpha\right)^2\) and \(\tilde{z}_t = \log z_t^2\) follows the \(\log\chi^2_1\) distribution. This is a non-Gaussian linear model in \(h_t\).

Auxiliary mixture

To deal with the non-Gaussianity, Omori et al. (2007) propose to approximate the log chi-squared distribution with one degree of freedom by a mixture of ten normal distributions given by: \[ log\chi^2_1 \approx \sum^{10}_{m=1}Pr[s_t=m]N(\mu_m, \sigma^2_m) \] where \(s_t\in\{1,...,10\}\) is a discrete-valued random indicator of the mixture component and \(\mu_m, \sigma^2_m, Pr(s_t=m)\) are predetermined.

The conditional distribution of \(\tilde{\epsilon}_t\equiv\tilde{z}_t\) is given by \[ \tilde{\epsilon}_t|s_t = m \sim N(\mu_m, \sigma^2_m) \] A simple stochastic volatility model is then given by \[ \begin{align*} \tilde{y} &= h + \tilde{\epsilon} \\ Hh &= h_0e_{1.T}+\sigma_vv \\ \tilde{\epsilon}|s &\sim N_T(\mu_s, \text{diag}(\sigma^2_s)) \\ v &\sim N_T(0_T, I_T) \end{align*} \] where \(H\) is the \(T\times T\) lower-triangular first-difference matrix and \(e_{1.T}\) is the first column of \(I_T\).

Precision sampler

The full-conditional posterior distribution of the log-volatilities \(h\) is a \(T\)-variate normal distribution with the precision matrix that is a tridiagonal matrix. Therefore, a computationally fast method of sampling from this full-conditional posterior distribution is by the precision sampler introduced by Chan and Jeliazkov (2009). The implementation in R requires using computer routines for tridiagonal matrices from package mgcv by Wood (2020).

Gibbs sampler

Initialize the algorithm by proposing starting values for \(h^{(0)},s^{(0)},\mu_0^{(0)},\boldsymbol\alpha^{(0)},\sigma_v^{2(0)}\)

For each iteration \(i\) of the Gibbs sampling algorithm, draw \[\begin{align} h_{0}^{(i)} &\sim \mathcal{N}(\bar{h_0},\bar{\sigma}^2_h) \\ \sigma_v^{2(i)} &\sim \mathcal{IG}2(\bar{s},\bar{v})\\ \mu_{0}^{(i)} &\sim \mathcal{N}(\bar{\mu_0},\bar{\sigma}^2_\mu) \end{align}\] \[ s_t^{(i)} \sim \mathcal{M}ultinomial (\{m\}_{m=1}^{10}, \{Pr[s_t = m|\tilde{y}_t,h_t^{(i)}]\}_{m=1}^{10}) \] using the inverse transform method, and: \[ \mathbf{h}^{(i)} \sim \mathcal{N}_T(\bar{\mathbf{h}},\bar{\mathbf{V}}_h) \] using the precision sampler and compute vector \(\boldsymbol\sigma^{2(i)}\).

Then, sample the autoregressive parameters: \[ \boldsymbol\alpha^{(i)} \sim \mathcal{N}_K(\bar{\boldsymbol\alpha},\bar{\mathbf{V}}_\boldsymbol\alpha), \] and return a sample of \(S\) draws from the posterior distribution: \[ \{\mathbf{h}^{(i)},\mathbf{s}^{(i)},\mu_{0}^{(i)},h_0^{(i)}, \sigma_v^{2(i)},\boldsymbol\sigma^{2(i)},\boldsymbol\alpha^{(i)}\}_{i=1}^{S}. \]

Algorithm

A function implementing the Gibbs sampler combines the steps listed above. The two steps that are specific to the stochastic volatility part, drawing the mixture indicators by the inverse transform method and drawing the log-volatilities \(\mathbf{h}\) with the precision sampler, are sketched below.
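The sketches below are not a full implementation: mix.p, mix.mu, and mix.sigma2 are assumed to hold the ten component probabilities, means, and variances of Omori et al. (2007), and a dense Cholesky factorisation is used here in place of the dedicated band-matrix routines from mgcv.

# draw the mixture indicators s_t by the inverse transform method
draw.s = function(y.tilde, h, mix.p, mix.mu, mix.sigma2){
  sapply(seq_along(y.tilde), function(t){
    w  = mix.p * dnorm(y.tilde[t], mean = h[t] + mix.mu, sd = sqrt(mix.sigma2))
    findInterval(runif(1), cumsum(w / sum(w))) + 1
  })
}

# draw the log-volatilities h from their T-variate normal full conditional (precision sampler)
draw.h = function(y.tilde, s.t, mix.mu, mix.sigma2, h0, sigma2.v){
  T.h   = length(y.tilde)
  H     = diag(T.h)                                             # first-difference matrix
  H[cbind(2:T.h, 1:(T.h - 1))] = -1
  prec  = crossprod(H) / sigma2.v + diag(1 / mix.sigma2[s.t])   # tridiagonal precision matrix
  b     = c(h0 / sigma2.v, rep(0, T.h - 1)) + (y.tilde - mix.mu[s.t]) / mix.sigma2[s.t]
  U     = chol(prec)                                            # prec = U'U
  h.bar = backsolve(U, forwardsolve(t(U), b))                   # full-conditional mean
  h.bar + backsolve(U, rnorm(T.h))                              # draw from N(h.bar, prec^{-1})
}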

Forecasting

Analytical results for one-period-ahead forecast

In the frequentist approach, the predictive density is conditioned on the estimated parameters \(\boldsymbol\alpha\) and \(\sigma^2\), which are treated as non-random. Therefore, it can be stated that this approach to forecasting ignores estimation uncertainty.

To address this issue, the Bayesian approach treats the parameters \(\boldsymbol\alpha\) and \(\sigma^2\) as random variables and thus, it incorporates estimation uncertainty into the predictive density. Additionally, by imposing appropriate priors Bayesian forecasting often leads to more precise forecasts as it narrows the predictive intervals.

Moreover, Bayesian forecasting uses the conditional predictive density as a starting point. In the following formula for the conditional one-period-ahead forecast, the first expression under the integral is the conditional predictive density and the second expression is the joint posterior distribution that we have already derived.

\[\begin{align} p\left(y _{t+1} \mid \mathbf{y}, \mathbf{X}\right) &=\iint p\left(y _{t+1} \mid\mathbf{y}, \mathbf{X}, \boldsymbol\alpha, \sigma^2\right) p\left(\boldsymbol\alpha, \sigma^2 \mid \mathbf{y}, \mathbf{X}\right) d \boldsymbol\alpha d \sigma^2\\ p\left(y _{t+1} \mid\mathbf{y}, \mathbf{X}, \boldsymbol\alpha, \sigma^2 \right) &=\mathcal{N} \left(x_{t+1}^{\prime} \boldsymbol\alpha, \sigma^2\right)\\ p\left(\boldsymbol\alpha, \sigma^2 \mid \mathbf{y}, \mathbf{X}\right) &=p\left(\boldsymbol\alpha \mid \mathbf{y}, \mathbf{X}, \sigma^2\right) p\left(\sigma^2 \mid \mathbf{y}, \mathbf{X}\right)\\ p\left(\boldsymbol\alpha \mid \mathbf{y}, \mathbf{X}, \sigma^2 \right) &= \mathcal{N}_{K}\left(\overline{\boldsymbol\alpha}, \sigma^2\overline{\mathbf{V}}\right)\\ p\left(\sigma^2 \mid \mathbf{y}, \mathbf{X}\right) &=\mathcal{IG}2(\overline{s}, \overline{\nu}) \end{align} \] To derive the solution of this expression analytically, we integrate the parameters out in this order:

  1. We integrate out \(\boldsymbol\alpha\) and obtain:

\[ p\left(y_{t+1} \mid \mathbf{y}, \mathbf{X}, \sigma^2\right)=\mathcal{N}\left(\mathbf{x}_{t+1}^{\prime} \overline{\boldsymbol\alpha}, \sigma^2\left(1+\mathbf{x}_{t+1}^{\prime} \overline{\mathbf{V}} \mathbf{x}_{t+1}\right)\right) \]

  2. We integrate out \(\sigma^2\) and obtain the one-period-ahead predictive density given by a Student-\(t\) distribution as defined by Bauwens, Lubrano, and Richard (1999): \[ p\left(y_{t+1} \mid \mathbf{y}, \mathbf{X}\right)=t\left(\mathbf{x}_{t+1}^{\prime} \overline{\boldsymbol\alpha}, 1+\mathbf{x}_{t+1}^{\prime} \overline{\mathbf{V}} \mathbf{x}_{t+1}, \overline{s}, \overline{\nu}\right). \]
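A hedged sketch of simulating this predictive density in R, using the \(\mathcal{NIG}2\) posterior parameters stored in post (computed by the posterior() function above); the object names are ours and the forecast conditions on the last observation of the artificial random walk.

x.T1     = c(1, y[nrow(y)])                        # next-period regressors: constant and last observation
S.sim    = 10000
sigma2.s = post$S / rchisq(S.sim, df = post$nu)    # sigma^2 | y, X ~ IG2(s.bar, nu.bar)
y.T1     = rnorm(S.sim,
                 mean = as.numeric(t(x.T1) %*% post$alpha),
                 sd   = sqrt(sigma2.s * (1 + as.numeric(t(x.T1) %*% post$Sigma %*% x.T1))))
hist(y.T1, breaks = 50)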

Numerical approach to one-period-ahead forecast

The Bayesian predictive density combines the conditional predictive density with the posterior distribution of the unknown parameters \(\boldsymbol\alpha\) and \(\sigma^{2}\). The one-period-ahead conditional predictive density can be expressed as below.

\[ p(y_{t+1} |x_{t+1},\mathbf{y}, \mathbf{X}) = \int\int p(y_{t+1} |x_{t+1}, \mathbf{y}, \mathbf{X}, \boldsymbol\alpha, \sigma^{2})p(\boldsymbol\alpha, \sigma^{2}|\mathbf{y},\mathbf{X}) d\boldsymbol\alpha d\sigma^{2}\]

\[p(y_{t+1} |x_{t+1}, \mathbf{y},\mathbf{X}, \boldsymbol\alpha, \sigma^{2}) = \mathcal{N}( \alpha'_{d}d_{t+1|t} + \alpha_{1}y_{t}+\dots+\alpha_{p}y_{t-p+1}, \sigma^{2})\]

\[p(\boldsymbol\alpha, \sigma^{2}|\mathbf{y},\mathbf{X}) = \mathcal{NIG}2(\overline{\boldsymbol\alpha}, \overline{\mathbf{V}}, \overline{s}, \overline{\nu}) \]

To sample from the predictive density, we integrate out \(\boldsymbol\alpha\) and \(\sigma^{2}\) numerically. The steps below show how the one-period-ahead predictive density can be obtained.

  1. Sample draws from the posterior distribution \(p(\boldsymbol\alpha, \sigma^{2}|\mathbf{y},\mathbf{X})\) and obtain \(\left\{ \boldsymbol\alpha^{(s)}, \sigma^{2(s)} \right\}^{S}_{s=1}\).

  2. For each draw of \(\boldsymbol\alpha\) and \(\sigma^{2}\), sample from the conditional predictive density \(y_{t+1} \sim \mathcal{N} (\mathbf{x}_{t+1}'\boldsymbol\alpha^{(s)}, \sigma^{2(s)})\) and obtain \(\left\{ y_{t+1|t}^{(s)} \right\}^{S}_{s=1}\).

  3. Characterise the predictive density using \(\left\{ y_{t+1|t}^{(s)} \right\}^{S}_{s=1}\).

Forecasting many periods ahead

Sampler implementation in R

## apply the earlier functions to draw alpha and sigma^2 from the NIG2 posterior
posterior.sample.draws = posterior.draws(S = 50000, posterior = post)   # columns 1:k hold alpha, column k+1 holds sigma^2
A.posterior.simu       = t(posterior.sample.draws[, 1:k])               # k x S matrix of alpha draws
Sigma2.posterior.simu  = posterior.sample.draws[, k + 1]                # S-vector of sigma^2 draws

## forecasting setup
h                      = 12                                         # the desired number of steps ahead
S                      = ncol(A.posterior.simu)                     # the number of posterior draws
Y.h                    = matrix(NA, h, S)                           # empty matrix to store the h-step-ahead draws of y

## sampling from the predictive density
for (s in 1:S){
  A.posterior.draw      = A.posterior.simu[, s]
  Sigma2.posterior.draw = Sigma2.posterior.simu[s]
    x.Ti                = y[(nrow(y) - p + 1):nrow(y)]              # the last p observations of y
    x.Ti                = x.Ti[p:1]                                 # reorder so the most recent observation comes first
  for (i in 1:h){
    x.T                 = c(rep(1, d), x.Ti)                        # constant (deterministic term) and lags
    Y.f                 = rnorm(1, mean = sum(x.T * A.posterior.draw), sd = sqrt(Sigma2.posterior.draw))
      x.Ti              = c(Y.f, x.Ti)[1:p]                         # update the lag vector with the new draw
    Y.h[i, s]           = Y.f
  }
}

References

Bauwens, Luc, Michel Lubrano, and J. F. Richard. 1999. Bayesian inference in dynamic econometric models. Oxford University Press, USA.
Chan, Joshua, and Ivan Jeliazkov. 2009. “Efficient Simulation and Integrated Likelihood Estimation in State Space Models.” International Journal of Mathematical Modelling and Numerical Optimisation 1: 101–20.
Lenza, Michele, and Giorgio E Primiceri. 2022. “How to Estimate a Vector Autoregression After March 2020.” Journal of Applied Econometrics 37 (4): 688–99.
Omori, Yasuhiro, Siddhartha Chib, Neil Shephard, and Jouchi Nakajima. 2007. “Stochastic Volatility with Leverage: Fast and Efficient Likelihood Inference.” Journal of Econometrics 140 (2): 425–49.
Wood, S. 2020. “Mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation (1.8-33) [Computer Software].”

Citation

BibTeX citation:
@online{woźniak2023,
  author = {Woźniak, Tomasz and Loopers Davidsen, Jonas and Dell’Andrea,
    Filippo and Getty, Corwin and Gomez, Ray and Guo, Xiaoman and Huang,
    Manting and Huynh, Nathan and Lee, Eungwon and Kronholm Møller,
    Thomas and Sonnemans, Victoria and Timilsena, Yobin and
    Trueman-Greinke, Django and Zhang, Hanwen},
  title = {Bayesian {Autoregressions}},
  date = {2023-05-25},
  url = {https://donotdespair.github.io/Bayesian-Autoregressions/},
  doi = {10.26188/23255657},
  langid = {en}
}
For attribution, please cite this work as:
Woźniak, Tomasz, Jonas Loopers Davidsen, Filippo Dell’Andrea, Corwin Getty, Ray Gomez, Xiaoman Guo, Manting Huang, et al. 2023. “Bayesian Autoregressions.” May 25, 2023. https://doi.org/10.26188/23255657.