Sunday, March 24, 2013

How to modify a JAGS+rjags (or BUGS) program to include new parameters

A central goal of providing you with lots of examples of programs, that do Bayesian parameter estimation in JAGS (BUGS) and rjags, is for you to be able to expand and modify the programs for your own needs. The book includes several cases in which programs are incrementally modified, but this blog post provides an explicit example of the systematic changes that are needed. We start with a program for linear regression, and expand it to include a parameter for quadratic trend. All the specific changes needed for the inclusion of a new parameter are explicitly pointed out, in every section of the program. This provides a guide for changing other programs to your own needs.

To motivate the present example, here (below) are plots of the data, fit by linear regression (the original program), and by regression that includes a quadratic component (the expanded program).
Notice above that there appears to be a systematic descrepancy between credible linear regression lines and the data. The data seem to have a curvature that is not captured by the linear trend. If we include a quadratic component in the model, the credible trend lines look like this:
(Sorry that the title of the graph still says "lines" instead of "curves".)

My purpose for this blog post is to show you how to expand the program that created the first result above into the program that created the second result above. I will do this simply by providing the two programs, and marking the changed lines in the expanded program with the comment, #CHANGED. The two programs can be found at the usual program repository: The original is here and the modified is here.

Although the programs are moderately long, primarily because of all the graphics commands that come at the end, the number of changes is small. Below, I highlight the changes, section by section.

The crucial conceptual change comes in model statement, which specifies the quadratic component for the trend. The model must be expanded to include the quadratic component. The parts highlighted in yellow were added to the original model specification:
model {
    for( i in 1 : Ndata ) {
        y[i] ~ dnorm( mu[i] , tau )
        mu[i] <- beta0 + beta1 * x[i] + beta2 * pow(x[i],2) # CHANGED
    }
    beta0 ~ dnorm( 0 , 1.0E-12 )
    beta1 ~ dnorm( 0 , 1.0E-12 )
    beta2 ~ dnorm( 0 , 1.0E-12 ) # CHANGED
    tau ~ dgamma( 0.001 , 0.001 )
}
Notice that the trend, specified by mu[i], was expanded to include a new parameter; namely, the beta2 coefficient on the squared data value. Importantly, with the introduction of a new parameter, that parameter must be provided with a prior, as shown in the second highlighted line above.

All that remains to be done after that is to make sure that the rest of the program takes into account the new parameter and model form.

The data section does not need to be changed, because the data are not changed.

The chain-initialization section, if there is one, must be changed to accommodate the new parameter:
initsList = list(
  beta0 = b0Init ,
  beta1 = bInit ,
  beta2 = 0 , # CHANGED
  tau = tauInit
)
I chose to be lazy and initialize the new parameter at zero, rather than doing it more intelligently.

In the run-the-chains section, we need to tell JAGS to keep track of the new parameter:
parameters = c("beta0" , "beta1" , "beta2" , "tau")  # CHANGED
Notice the only change in that whole section was specifying the new parameter to be tracked.

Hopefully you'll find that all those changes mentioned above are pretty straightforward! In general, when you're modifying a JAGS+rjags program, those steps above are the main ones to keep in mind. Here they are, summarized:
  • Carefully specify the model with its (new) parameters.
  • Be sure all the (new) parameters have sensible priors.
  • If you are defining your own initial values for the chains, be sure you've defined intial values for all the (new) parameters.
  • Tell JAGS to track the (new) parameters.
After that, the most effortful part is graphically displaying the results. Here is the change I made for the graphs presented at the beginning of this blog entry:
# Display data with believable regression curves and posterior predictions.
# Plot data values:
xRang = max(x)-min(x)
yRang = max(y)-min(y)
limMult = 0.25
xLim= c( min(x)-limMult*xRang , max(x)+limMult*xRang )
yLim= c( min(y)-limMult*yRang , max(y)+limMult*yRang )
plot( x , y , cex=1.5 , lwd=2 , col="black" , xlim=xLim , ylim=yLim ,
      xlab="X" , ylab="Y" , cex.lab=1.5 ,
      main="Data with credible regression lines" , cex.main=1.33  )
# Superimpose a smattering of believable regression lines:
xComb = seq(xLim[1],xLim[2],length=201)
for ( i in round(seq(from=1,to=chainLength,length=50)) ) {
  lines( xComb , 
         mcmcChain[i,"beta0"] + mcmcChain[i,"beta1"]*xComb 
         + mcmcChain[i,"beta2"]*xComb^2 , # CHANGED 
         col="skyblue" )
}
The set-up of the graph was unchanged; all I did was modify the blue curves that get plotted, so that they correspond to the model that was specified for JAGS. This is perhaps the most dangerous part of modifying JAGS programs: It is easy to modify the model specification for JAGS, but inadvertently not make the identical corresponding change to what gets plotted later! It would be nice if the model were specified only once, in a form that both JAGS and R simultaneously understand, but presently that is not how it's done in the JAGS/BUGS world.

Oh -- and what about interpreting the results in this case? Is there a credible non-zero quadratic trend, and how big is it? Answer: Just look at the posterior distribution on beta2. And, you might ask, how do we examine the results of the linear regression and decide to expand the model? That process is called a posterior predictive check, and my advice about it is provided in this article (n.b., your click on that link constitutes your request to me for a personal copy of the article, and my provision of a personal copy only).

No comments:

Post a Comment