\documentclass[twocolumn,11pt]{article}
\usepackage{epsfig,graphics,latexsym,amsfonts,amssymb,amsmath,verbatim}
\makeatletter % make @ act like a letter
\@addtoreset{equation}{section}
\makeatother % make @ act like a non-letter
\def\theequation{\thesection.\arabic{equation}}
\hoffset=-0.675in
\advance\topmargin by -0.75truein
%\advance\topmargin by .05truein
\oddsidemargin=0.675truein
\evensidemargin=0.675truein
\advance\textheight by 1.25truein
\setlength\textwidth{6.5in}
\vsize=9.0in
\def\doublespace{\baselineskip=20pt minus 1pt}
\begin{document}
\title{Introduction to Power Prior Distributions}
\author{Ming-Hui Chen \\
Department of Statistics \\
University of Connecticut}
\maketitle
\begin{center}
\noindent{\large \bf Abstract}
\end{center}
In this article,
we review a class
of prior distributions called the
{\em Power Prior Distributions}. We discuss the theoretical
properties of these priors in detail.
We also discuss the normal, binomial, and Poisson regression models.
Elicitation of hyperparameters is discussed and several
applications and extensions of the priors
are given.
\medskip
\noindent{\bf Keywords and Phrases:}
Prior Elicitation, Posterior Distribution, Propriety,
Variable Selection.
\section{Introduction}
Prior elicitation is one of the most important
issues in Bayesian data analysis.
When no prior information is available,
a noninformative prior such as a uniform prior, Jeffreys prior,
or reference prior can be used;
see Yang and Berger (1997) for a list of such noninformative priors.
However, real prior information such as historical data or data
from previous similar studies is often available
in applied research settings where
the investigator has access to
previous studies
measuring the same response and covariates
as the current study.
For example, in many cancer and AIDS clinical trials,
current studies often use treatments that are very similar to,
or slight modifications of, treatments used in previous studies
(see Ibrahim, Chen, and MacEachern, 1996, and Ibrahim and Chen, 1997).
In carcinogenicity studies, large historical databases exist for the
control animals from previous experiments. In experiments conducted
over time, data from previous time periods can often be used
as prior information. We shall generically refer to the data
from a previous study (or studies) as historical data throughout.
In all of these situations, it is natural to incorporate the
historical data into the current study
by quantifying it
with a suitable prior distribution on the model parameters.
\section{Development of Power Prior Distributions}
Suppose that
$\{(x_i,y_i), i=1,2,\dots,n\}$ is a sample of independent
observations from the current study, where each $y_i$ is the response variable,
$x_i=(x_{i1},x_{i2},\dots,x_{ik})^\prime$ is a $k \times 1$ random vector of
covariates with $x_{i1}=1$ denoting an intercept.
We use the generic label $f(u_1 \mid u_2)$ to
denote the conditional density of $u_1$ given $u_2$ throughout, where $u_1$
may be discrete or continuous.
Suppose that given $x_i$, $y_i$
has a density in the exponential class
with the form
\begin{align}
& f(y_{i} \mid x_i,\theta_{i},\tau) \nonumber \\
= & \exp
\left\{
\alpha_i^{-1}({\tau})(y_{i} \theta_{i} - \psi(\theta_{i})) + \phi(y_{i}, \tau) \right\} \ ,
\label{GLM}
\end{align}
for $i = 1 , \ldots , n$,
indexed by the canonical parameter $\theta_{i}$ and the scale parameter
$\tau$. The functions $\psi$ and $\phi$ determine a particular family in
the class, such as the binomial, normal, Poisson, etc. The function
$\alpha_i(\tau)$ is commonly of the form $\alpha_i(\tau) = \tau^{-1}
s_{i}^{-1}$, where the $s_{i}$'s are known weights. Further suppose
the $\theta_{i}$'s satisfy the equations
\begin{equation}
\theta_{i} = h(\eta_{i}) \ , \ i = 1 , \ldots , n \ ,
\label{link}
\end{equation}
and
\begin{equation}
\eta_i = x_i^\prime\beta \ ,
\label{linear}
\end{equation}
where $h$ is a monotone
differentiable function, often referred
to as the link function,
and $\beta=(\beta_1,\beta_2,\dots,\beta_k)^\prime$
is a $k \times 1$ vector of regression coefficients.
We consider a few special cases of (\ref{GLM}).
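For concreteness, three standard members of the class (\ref{GLM}) can be
sketched as follows. The symbols $\mu_i$ (a mean), $\sigma^2$ (a variance),
and $p_i$ (a success probability) are introduced here for illustration only,
and taking $s_i=1$ in the normal case and $\alpha_i(\tau)=1$ in the
Bernoulli and Poisson cases are simplifying assumptions.
\begin{align*}
\text{Normal:} \quad & \theta_i=\mu_i, \ \ \alpha_i^{-1}(\tau)=\tau=\sigma^{-2}, \\
& \psi(\theta_i)=\theta_i^2/2, \\
& \phi(y_i,\tau)=-\tfrac{\tau y_i^2}{2}+\tfrac{1}{2}\log\tfrac{\tau}{2\pi}; \\
\text{Bernoulli:} \quad & \theta_i=\log\{p_i/(1-p_i)\}, \\
& \psi(\theta_i)=\log(1+e^{\theta_i}), \ \ \phi(y_i,\tau)=0; \\
\text{Poisson:} \quad & \theta_i=\log \mu_i, \\
& \psi(\theta_i)=e^{\theta_i}, \ \ \phi(y_i,\tau)=-\log(y_i!) .
\end{align*}
The Bernoulli case is the binomial model with a single trial.
If $h$ in (\ref{link}) is taken to be the identity, so that
$\theta_i=x_i^\prime\beta$, these three cases correspond to normal linear
regression, logistic regression, and log-linear Poisson regression,
respectively.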
\section{Properties of Power Priors}
\subsection{Roles of Prior Parameters}
When historical data are available,
the power prior
$\pi(\beta,a_0 \mid D_{0})$
is proportional
to the product of the likelihood function
of $\beta$
based on the historical data $D_0$
taken to
the power $a_0$ multiplied by the (beta) prior of $a_0$.
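Read literally, this description can be written as the following sketch,
where $L(\beta \mid D_0)$ denotes the likelihood function of $\beta$ based
on $D_0$. The hyperparameter symbols $(\delta_0,\lambda_0)$ for the beta
prior are an assumption, borrowed from the notation used in Section 4 for
multiple historical data sets.
\begin{align*}
& \pi(\beta, a_0 \mid D_0) \\
\propto & \left[L(\beta \mid D_{0})\right]^{a_{0}}
\ a_{0}^{\delta_0-1} \ (1-a_{0})^{\lambda_0-1} ,
\end{align*}
for $0 \le a_0 \le 1$. In this form, $a_0=0$ removes the historical
likelihood from the prior altogether, whereas $a_0=1$ lets the historical
likelihood enter at full weight; intermediate values discount the
historical data.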
\subsection{Propriety of Power Prior Distributions}
Any informative Bayesian analysis necessarily requires
a proper prior distribution. It is thus critical to examine
the conditions under which
the joint prior for $\beta$ and $a_0$ is proper.
This issue is crucial in
Bayesian variable selection (see, for example,
Chen, Ibrahim, and Yiannoutsos (1996),
Ibrahim, Chen, and MacEachern (1996), Ibrahim and Chen (1997),
and Ibrahim, Chen, and Ryan (1997)), as it is well known
that Bayesian variable selection requires a proper prior distribution.
Propriety is also an important issue in Bayesian hypothesis
testing problems, and in particular, in
the calculation of Bayes factors
and related quantities (see, for example, Berger, 1985, pp.~145--157).
\section{Extensions and Applications}
\subsection{Multiple Historical Data Sets}
Multiple historical data sets are often available
in clinical
trial settings, carcinogenicity studies, and studies
in which data are collected over time, such as meteorological studies.
The priors developed in Section 2 can be
easily extended to more than one historical
study. If there are $N$ historical studies,
we define $D_{0j} = (n_{0j}, X_{0j}, y_{0j})$ to be the
historical data based on the $j$th study, $j = 1, \ldots, N$,
and $D_0 = (D_{01}, \ldots, D_{0N})$.
In this case, it may be desirable to define a weight
parameter $a_{0j}$ for each historical study, and take
the $a_{0j}$'s to be i.i.d.\ beta random variables
with hyperparameters $(\delta_0, \lambda_0)$, $j = 1, \ldots, N$. Letting
$a_0 = (a_{01}, \ldots, a_{0N})$,
the prior can be generalized as
\begin{align}
& \pi(\beta, a_0 \mid D_0) \nonumber \\
\propto & \prod_{j=1}^{N}
\left[L(\beta \mid D_{0j})\right]^{a_{0j}}
\ a_{0j}^{\delta_0-1} \ (1-a_{0j})^{\lambda_0-1} .
\label{multiple}
\end{align}
\subsection{Other Models}
The ideas of the power prior can be extended to several
other types of models including proportional hazards models
(Ibrahim, Chen, and MacEachern, 1996), generalized linear
mixed models (Chen, Ibrahim, and Weiss, 1997), and
time series models (Ibrahim, Chen, and Ryan, 1997).
The basic idea in the construction
of the power prior remains the same in all of these settings.
The general notion of the power prior is that it is a likelihood function
raised to a power between 0 and 1. The power
prior
may thus be viewed as a weighted likelihood,
where the weight parameter
acts as a precision parameter.
The conditions for propriety, however, may change slightly
from model
to model, and thus we cannot present a unified theory for
general parametric regression models here.
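To illustrate how the weight parameter acts as a precision parameter,
consider the following sketch for the normal linear model with a single
historical data set $D_0=(n_0,X_0,y_0)$. It assumes, purely for
illustration, that the precision $\tau$ is known, that $X_0$ has full
column rank, and that $a_0 > 0$ is held fixed; the symbol
$\hat{\beta}_0$ is introduced here and is not part of the notation above.
Completing the square in $\beta$ gives
\begin{align*}
& \left[L(\beta \mid D_0)\right]^{a_0} \\
\propto & \exp\left\{-\frac{a_0\tau}{2}
(y_0-X_0\beta)^\prime (y_0-X_0\beta)\right\} \\
\propto & \exp\left\{-\frac{a_0\tau}{2}
(\beta-\hat{\beta}_0)^\prime X_0^\prime X_0
(\beta-\hat{\beta}_0)\right\} ,
\end{align*}
where $\hat{\beta}_0=(X_0^\prime X_0)^{-1}X_0^\prime y_0$ is the least
squares estimate from the historical data. Under these assumptions,
$\beta$ given $a_0$ is normal with mean $\hat{\beta}_0$ and covariance
matrix $(a_0 \tau X_0^\prime X_0)^{-1}$, so that $a_0$ leaves the prior
mean unchanged and scales the prior precision: halving $a_0$ doubles the
prior covariance of $\beta$.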
\section{Conclusions}
We have discussed a very useful class of informative
prior distributions for generalized linear
models, called the power prior distributions.
These priors provide a practical and coherent way to
do Bayesian data analysis
when historical data are available. There
are many applications of the proposed priors, including
model selection, hypothesis testing, carcinogenicity studies, and
clinical trials. The priors
are especially attractive since
few prior parameters have to be elicited.
Such a property is quite desirable, for example,
in variable subset selection.
\vspace*{0.3in}
\noindent{\Large \bf References}
\begin{description}
\item Abelson, P.H. (1995), ``Flaws in Risk Assessments,''
{\em Science, 270}, 215.
\item Bailer, J.A., and Portier, C.J. (1988),
``Effects of Treatment-induced Mortality and Tumor-induced Mortality
on Tests for Carcinogenicity in Small Samples,'' {\em Biometrics,
44}, 417--431.
\item Bedrick, E.J., Christensen, R., and Johnson, W. (1996),
``A New Perspective on Priors for
Generalized Linear Models,'' {\em Journal of the American Statistical Association,
91}, 1450--1460.
\item Berger, J.O. (1985), {\em Statistical Decision Theory and Bayesian Analysis},
Second Edition, New York: Springer-Verlag.
\item Berger, J.O., and Mallows, C.L. (1988), Discussion of
``Bayesian Variable Selection in Linear Regression,''
{\em Journal of the American Statistical Association, 83}, 1033--1034.
\end{description}
\end{document}