Monday, April 27, 2015

Final days for early registration prices at Stats Camp (including Bayesian course)

Early registration prices end this week for Doing Bayesian Data Analysis, June 1 - 5, 2015, a five-day course offered through Stats Camp convened in Dallas, Texas.

The web page for the June 1-5 workshop is here.

Other workshops are listed here.

Friday, April 24, 2015

Bayesian comparison of groups using Python emcee

Prof. Brian Blais has implemented the BEST model of two groups in emcee, a Python system for MCMC sampling. See his post about it here.

Tuesday, April 14, 2015

Doing Bayesian data analysis again at Bernoulli's grave

The 2nd Edition of the book visited the grave of Jacob Bernoulli (1655-1705) in Basel, Switzerland. Jacob Bernoulli pre-dated Bayes (1701-1761), of course, but Bernoulli established foundational concepts and theorems of probability. The photos below were taken by Marc Sager, who is a student at the University of St. Gallen (where I give a workshop in the summers). Thanks, Marc!

The 1st Edition also visited Bernoulli, as was blogged here. The 1st Edition also visited Bayes' tomb and the remains of R. A. Fisher. The book is still waiting to visit Laplace!

If you pose the book with other famous Bayesians, or pre-Bayesians, or anti-Bayesians, either dead or not-quite-yet dead, please send me those photos too! (The goal is to be amusing and informative, not offensive.) Thanks, and have fun!


Sunday, April 12, 2015

Power is of two kinds (or: Gandhi, power, fear, love, and statistics)

In the chapter of DBDA2E on Goals, Power, and Sample Size (p. 384), I quoted the Mahatma Gandhi:
"Power is of two kinds. One is obtained by the fear of punishment and the other by arts of love. Power based on love is a thousand times more effective and permanent than the one derived from fear of punishment."
But I was not able to find an original source for that quote, and I said so in a footnote.

Now the original source has been revealed to me by reader Atul Sharma. (Thank you, Atul!) He even pointed me to an online archive of image scans of the original documents. Here is the relevant page; the passage starts at the bottom of the left column:
The image comes from the Gandhi Heritage Portal. The full reference is Gandhi, M. K. (1925, 08 January). Young India, p. 15.

What did that quote have to do with statistical goals and power? 

Well, I was being playful with the word "power," but there was also a deeper relationship. In classical statistics, "power" refers to the probability of achieving the goal of rejecting the null hypothesis. But that goal has problems, and a better goal is seeking precision (and accuracy) of parameter estimation. On p. 384 of DBDA2E I said "The goal of achieving precision thereby seems to be motivated by a desire to learn the true value, or, more poetically, by love of the truth, regardless of what it says about the null value. The goal of rejecting a null value, on the other hand, seems too often to be motivated by fear: fear of not being published or not being approved if the null fails to be rejected. The two goals for statistical power might be aligned with different core motivations, love or fear." Then came the quote from Gandhi.

Thursday, April 9, 2015

Bayes factors for tests of mean and effect size can be very different

In this post, we consider a Bayes factor null-hypothesis test of the mean in a normal distribution, and we find the unintuitive result that the Bayes factor for the mean can be very different from the Bayes factor for the effect size. The reason is that the prior distribution of the standard deviation affects the implied prior on the effect size. Different vague priors on the standard deviation can dramatically change the BF on the effect size.

By contrast, the posterior distributions of the mean and of the effect size are very stable despite changing the vague prior on the standard deviation.

Although I caution against using Bayes factors (BFs) to routinely test null hypotheses (for many reasons; see Ch. 12 of DBDA2E, or this article, or Appendix D of this article), there might be times when you want to give them a try. A nice way to approximate a BF for a null hypothesis test is with the Savage-Dickey method (again, see Ch. 12 of DBDA2E and references cited there, specifically pp. 352-354). Basically, to test the null hypothesis for a parameter, we consider a narrow region around the null value and see how much of the distribution is in that narrow region, for the prior and for the posterior. The ratio of the posterior to prior probabilities in that zone is the BF for the null hypothesis.
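A minimal sketch of that ROPE-based approximation in Python, assuming you already have MCMC draws of a parameter from both the prior and the posterior. The function name rope_bayes_factor and the toy normal "prior" and "posterior" draws are illustrative assumptions, not the code used to make the figures in this post:

```python
import random


def rope_bayes_factor(prior_samples, posterior_samples, rope_low, rope_high):
    """Approximate the Bayes factor for the null (BF01) as the posterior mass
    inside the ROPE divided by the prior mass inside the ROPE.
    Hypothetical helper for illustration; samples are plain lists of floats."""

    def mass_in_rope(samples):
        inside = sum(1 for x in samples if rope_low <= x <= rope_high)
        return inside / len(samples)

    prior_mass = mass_in_rope(prior_samples)
    if prior_mass == 0.0:
        raise ValueError("no prior samples in the ROPE; widen it or draw more samples")
    return mass_in_rope(posterior_samples) / prior_mass


# Toy example (assumed distributions, not the model in this post):
# prior draws from normal(0, 1), posterior draws from normal(0, 0.2).
rng = random.Random(1)
prior_draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
posterior_draws = [rng.gauss(0.0, 0.2) for _ in range(100_000)]
bf01 = rope_bayes_factor(prior_draws, posterior_draws, -0.1, 0.1)
print(f"BF for the null: {bf01:.2f}")
```

In this toy case the ROPE holds about 8% of the prior and about 38% of the posterior, so BF01 should land near 4.8. A BF01 above 1 favors the null and below 1 favors the alternative. The estimate is only as stable as the number of prior draws that fall inside the ROPE, which is why the helper refuses a prior mass of zero.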

Consider a batch of data randomly sampled from a normal distribution, with N=43. We standardize the data and shift them up by 0.5, so the data have a mean of 0.5 and an SD of 1.0. Figure 1, below, shows the posterior distribution on the parameters:
Figure 1. Posterior when using unif(0,1000) prior on sigma, shown in Figs. 2 and 3.
First, consider the mu (mean) parameter. From the relation of the 95% HDI and ROPE, we would decide that a value of 0 for mu is not very credible, with the entire HDI outside the ROPE and only 0.7% of the posterior distribution practically equivalent to the null value. For the effect size, a similar conclusion is reached, with the 95% HDI completely outside the ROPE, and only 0.8% of the posterior practically equivalent to the null value. Note that the ROPEs for mu and effect size have been chosen here to be commensurate.
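The data step described above (N=43, standardized, then shifted up by 0.5) can be sketched as follows. This is not the original data: the seed, the helper name standardized_shifted_sample, and the use of the n-1 sample SD for standardizing are assumptions for illustration.

```python
import random
import statistics


def standardized_shifted_sample(n, shift, seed=None):
    """Draw n values from a normal distribution, rescale them to sample mean 0
    and sample SD 1, then add `shift`. Hypothetical helper; the seed and the
    use of the n-1 sample SD are assumptions, not details from the post."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.mean(raw)
    s = statistics.stdev(raw)  # sample SD with n - 1 in the denominator
    return [(x - m) / s + shift for x in raw]


y = standardized_shifted_sample(43, 0.5, seed=2015)
print(len(y), round(statistics.mean(y), 6), round(statistics.stdev(y), 6))  # 43 0.5 1.0
```

Because the rescaling is exact, the particular random draws change only the shape of the sample, never its mean of 0.5 or its SD of 1.0.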

To determine Bayes factors (BFs) for mu and effect size, we need to consider the prior distribution in more detail. The model has a broad normal prior on mu with an SD of 100 and a broad uniform prior on sigma from near 0 to 1000, as shown in Figures 2 and 3:

Figure 2. Prior with unif(0,1000) on sigma.  Effect size is shown better in Figure 3.

The implied prior on the effect size, in the lower right above, is plotted badly because of a few outliers in the MCMC chain, so I replot it below in more detail:

Figure 3. Implied prior on effect size for unif(0,1000) prior on sigma.

The BF for a test of the null hypothesis on mu is the probability mass inside the ROPE for the posterior relative to the prior. In this case, the BF is 0.7% / 0.1% (rounded in the displays), which equals about 7. That is, the null hypothesis is 7 times more probable in the posterior than in the prior (or, more carefully stated, the data are 7 times more probable under the null hypothesis than under the alternative hypothesis). Thus, the BF for mu decides in favor of the null hypothesis.

The BF for a test of the null hypothesis on the effect size is the analogous ratio of probabilities in the ROPE for the effect size. The BF is 0.8% / 37.2% which indicates a strong preference against the null hypothesis. Thus, the BF for mu disagrees with the BF for the effect size.
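The arithmetic of these two Bayes factors can be checked directly from the ROPE percentages quoted above. This is only a sketch: bf_null is a hypothetical helper, and because its inputs are the rounded values from the displays, the results are approximate.

```python
def bf_null(posterior_pct, prior_pct):
    """Bayes factor for the null (BF01): posterior ROPE mass over prior ROPE mass.
    Hypothetical helper; inputs are percentages as read from the displays."""
    return posterior_pct / prior_pct


# Rounded ROPE percentages for the unif(0,1000) prior on sigma (Figures 1-3).
results = {
    "mu": bf_null(0.7, 0.1),
    "effect size": bf_null(0.8, 37.2),
}
for name, bf01 in results.items():
    leaning = "favors the null" if bf01 > 1 else "favors the alternative"
    print(f"{name}: BF01 = {bf01:.3f}, BF10 = {1 / bf01:.1f} ({leaning})")
```

For mu this gives about 7 in favor of the null. For the effect size it gives about 0.0215, which is the same as 37.2 / 0.8 = 46.5 to 1 in favor of the alternative.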

Now we use a different vague prior on sigma, namely unif(0,10), but keeping the same vague prior on mu:
Figure 4. Prior with unif(0,10) on sigma. Effect size is replotted in Figure 5. Compare with Figure 2.

Figure 5. Effect size replotted from Figure 4, using unif(0,10) on sigma.

The resulting posterior distribution looks like this:
Figure 6. Posterior when using unif(0,10) prior on sigma.
Compare the posterior in Figure 6 with the posterior in Figure 1. They are essentially identical: the 95% HDIs have barely changed, decisions based on HDI and ROPE are the same, and the probability statements are the same.
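Why are the posteriors so stable? Both uniform priors on sigma are flat over every value of sigma that the N=43 data make plausible, so the constant prior density cancels when the posterior is normalized. A minimal grid-approximation sketch shows this, assuming the same normal(0, 100) prior on mu. The grid limits, the helper names, and the simulated data (standardized and shifted to mean 0.5 and SD 1.0, as described above) are assumptions; the analysis in this post used MCMC, not a grid.

```python
import math
import random
import statistics


def make_data(n=43, shift=0.5, seed=2015):
    """Simulated stand-in for the post's data: normal draws rescaled to
    sample mean `shift` and sample SD 1 (seed and n - 1 SD are assumptions)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = statistics.mean(raw), statistics.stdev(raw)
    return [(x - m) / s + shift for x in raw]


def grid_posterior_means(y, sigma_max, mu_sd=100.0):
    """Posterior means of mu and of the effect size mu / sigma on a grid, with
    mu ~ normal(0, mu_sd) and sigma ~ unif(0, sigma_max). The grid limits are
    assumptions chosen to cover where this likelihood is non-negligible."""
    n, sy, syy = len(y), sum(y), sum(v * v for v in y)
    mus = [-0.5 + 0.01 * i for i in range(201)]     # mu from -0.5 to 1.5
    sigmas = [0.5 + 0.01 * j for j in range(151)]   # sigma from 0.5 to 2.0
    points = []
    for mu in mus:
        log_prior_mu = -0.5 * (mu / mu_sd) ** 2     # normal prior, up to a constant
        ss = syy - 2.0 * mu * sy + n * mu * mu      # sum of (y - mu)^2
        for sigma in sigmas:
            # Uniform prior density on sigma: 1 / sigma_max inside (0, sigma_max], else 0.
            log_prior_sigma = -math.log(sigma_max) if sigma <= sigma_max else -math.inf
            # Normal log-likelihood, up to an additive constant.
            log_lik = -n * math.log(sigma) - ss / (2.0 * sigma * sigma)
            points.append((log_prior_mu + log_prior_sigma + log_lik, mu, sigma))
    top = max(lp for lp, _, _ in points)  # subtract the max before exp to avoid underflow
    total = sum_mu = sum_es = 0.0
    for lp, mu, sigma in points:
        w = math.exp(lp - top)
        total += w
        sum_mu += w * mu
        sum_es += w * mu / sigma
    return sum_mu / total, sum_es / total


y = make_data()
for sigma_max in (1000.0, 10.0):
    mean_mu, mean_es = grid_posterior_means(y, sigma_max)
    print(f"unif(0,{sigma_max:g}): mean mu = {mean_mu:.4f}, mean effect size = {mean_es:.4f}")
```

The two printed lines should agree to every displayed digit, because the only difference between the runs is a constant factor of 1/sigma_max that normalization removes. The Savage-Dickey BF, by contrast, depends on how much prior mass falls in the ROPE, and for the effect size that mass depends on the full range of the sigma prior, including values of sigma far from anything the data support.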

But the BF for effect size is rather different from before. Now it is 0.8% / 0.4% = 2, which is to say that the probability of the null hypothesis has gone up, i.e., this is a BF that leans in favor of the null hypothesis. Thus, a less vague prior on sigma has changed the implied prior on the effect size, which, of course, strongly affected the BF on effect size.
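The shift in the implied prior can be reproduced by simulating from the priors alone. This sketch assumes ROPEs of ±0.1 on both mu and the effect size (mu - 0) / sigma; that width is my assumption rather than something stated in the text, although it is consistent with the rounded prior masses reported above. The helper name and the Monte Carlo settings are also illustrative.

```python
import random


def implied_prior_rope_mass(sigma_max, rope_half_width=0.1, mu_sd=100.0,
                            n_draws=200_000, seed=1):
    """Monte Carlo prior mass inside the ROPE for mu and for the effect size
    mu / sigma, with mu ~ normal(0, mu_sd) and sigma ~ unif(0, sigma_max).
    Hypothetical helper; the +/-0.1 ROPE half-width is an assumption."""
    rng = random.Random(seed)
    in_mu = in_es = 0
    for _ in range(n_draws):
        mu = rng.gauss(0.0, mu_sd)
        sigma = sigma_max * (1.0 - rng.random())  # uniform on (0, sigma_max], never 0
        if abs(mu) <= rope_half_width:
            in_mu += 1
        if abs(mu / sigma) <= rope_half_width:
            in_es += 1
    return in_mu / n_draws, in_es / n_draws


for sigma_max in (1000.0, 10.0):
    mu_mass, es_mass = implied_prior_rope_mass(sigma_max)
    print(f"unif(0,{sigma_max:g}): mu {100 * mu_mass:.2f}%, effect size {100 * es_mass:.2f}%")
```

Under these assumptions the effect-size mass should come out near 37% for unif(0,1000) and near 0.4% for unif(0,10) (analytically about 36.9% and 0.40%), while the mass for mu stays near 0.08% under both, because the prior on mu does not involve sigma. Dividing the unchanged posterior mass of 0.8% by these prior masses moves the effect-size BF from about 0.02 to 2.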

To summarize so far, a change in the breadth of the prior on sigma had essentially no effect on the HDIs of the posterior distribution, but had a big effect on the BF for the effect size while having no effect on the BF for mu.

Proponents of BFs will quickly point out that the priors used here are not well calibrated, i.e., they are too wide, too diluted. Instead, an appropriate use of BFs demands a well calibrated prior. (Proponents of BFs might even argue that an appropriate use of BFs would parameterize differently, focusing on effect size and sigma instead of mu and sigma.) I completely agree that the alternative prior must be meaningful and appropriate (again, see Ch. 12 of DBDA2E, or this article, or Appendix D of this article) and that the priors used here might not satisfy those requirements for a useful Bayes factor.

But there are still two take-away messages:
First, the BF for the mean (mu) need not lead to the same conclusion as the BF for the effect size unless the prior is set up just right.
Second, the posterior distributions of mu and effect size are barely affected at all by big changes in the vagueness of the prior, unlike the BF.