
Peaked probability density functions encode a high degree of belief in the truth of the parameter values lying within certain (narrow) ranges in the parameter space. Assigning the prior probability in this way implies injection of subjective belief into the data analysis. In that case, one already knows the parameters accurately, so there is no need for an experiment unless the experiment can yield data of sufficient quality and quantity that the likelihood function, and hence the posterior probability, become even more sharply peaked with respect to the parameter values.
The use of prior probability is actually a strength, not a weakness, of Bayesian analysis because, in the extreme case that we know the parameters exactly, the prior probability is all heaped at a single value so that it is zero elsewhere. Bayes’ theorem immediately shows that this feature carries through to the posterior, in accord with intuition but not with frequentist methods of data analysis. The assignment of prior probability should proceed from the prior information. If it is not known how to assign a prior distribution from the prior information (and this is often the case when only limited prior knowledge of the parameter values is available), then it is safe to assign a broad prior probability density, as shown in Figure 3b. Below, the objective Bayesian analysis proceeds by assigning such a noninformative prior probability.
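To make this argument concrete (a notational sketch of our own; the delta-function notation is not used in the article), write the fully concentrated prior as p(θ) = δ(θ − θ₀). Bayes’ theorem then gives

\[
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\,\delta(\theta - \theta_0)}{\int p(D \mid \theta')\,\delta(\theta' - \theta_0)\,d\theta'} \;=\; \frac{p(D \mid \theta_0)\,\delta(\theta - \theta_0)}{p(D \mid \theta_0)} \;=\; \delta(\theta - \theta_0),
\]

so the posterior stays concentrated at θ₀ no matter what data are observed.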
Maximum Entropy Prior Probabilities
Bayes’ theorem involves both sources of information: the prior information about the process (parameters) under investigation and the experimental data observed in the experiment through the likelihood function. The prior probability encodes the experimenter’s initial degree of belief in the possible values of parameters, and the likelihood function encodes the degree of belief in the possible fit (or misfit) of the hypothesized prediction generated by the parameters to the experimental data. Both probabilities must be assigned (yet not assumed) to apply Bayes’ theorem.
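In its standard form (which the article’s Equation 2 presumably resembles; the notation here is generic), Bayes’ theorem combines the two sources as

\[
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\,p(\theta)}{p(D)},
\]

where p(θ) is the prior probability, p(D | θ) is the likelihood function, and the evidence p(D) normalizes the posterior p(θ | D).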
Berger (2006) rebutted a common criticism of the Bayesian school arising from the supposedly subjective use of prior probabilities. Even the first Bayesians, including Bayes (1763) and Laplace (1812), performed probabilistic analysis using constant prior probability for unknown parameters. A well-known technique that is often used in objective Bayesian analysis relies on whatever is already known about a probability distribution to assign it; this is the maximum entropy method, and it generates so-called maximum entropy priors (Jaynes, 2003). Jaynes (1968) applied a continuum version of the Shannon (information theoretic) entropy, a measure of uncertainty, to encode the available information into a probability assignment. In the objective Bayesian literature (Jaynes, 1968; Gregory, 2010), this is termed the principle of maximum entropy, and it provides a consistent and rigorous way to encode testable information into a unique probability distribution. The principle of maximum entropy assigns a probability, p(.), that maximizes the entropy, S[p(.)], distributing the probability as noncommittally as possible while satisfying all known constraints on the distribution. The resulting distribution is also guaranteed to be nonnegative.
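In symbols (a standard formulation consistent with Jaynes, 1968; the notation is ours), the continuum entropy of a density p(x) relative to a measure m(x) is

\[
S[p] \;=\; -\int p(x)\,\ln\frac{p(x)}{m(x)}\,dx,
\]

and the principle of maximum entropy selects the p that maximizes S[p] subject to normalization and any testable constraints of the form ∫ f_k(x) p(x) dx = F_k. The Lagrange-multiplier solution has the exponential form

\[
p(x) \;\propto\; m(x)\,\exp\!\Big(-\sum_k \lambda_k f_k(x)\Big),
\]

with the multipliers λ_k fixed by the constraints; the exponential form also makes the guaranteed nonnegativity explicit.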
Prior Probability Assignment
To assign the prior probability (in Equation 2) objectively, no possible value of a parameter should be privileged over any other, except to the extent necessary to conform to any known constraints on the distribution. A universal constraint is normalization, such that the prior probability density integrates to unity. The principle of maximum entropy, as its name suggests, assigns the density, p(.), by maximizing its entropy, S[p(.)], subject to this constraint and any others. In the absence of further constraints, the result is a constant-value probability density bounded by a certain wide parameter range, the so-called uniform prior probability (Jaynes, 1968; Gregory, 2010).
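As a brief sketch of why (notation ours): on a bounded range a ≤ θ ≤ b with uniform measure and only the normalization constraint, one maximizes the Lagrangian

\[
-\int_a^b p(\theta)\,\ln p(\theta)\,d\theta \;+\; \lambda\Big(\int_a^b p(\theta)\,d\theta - 1\Big),
\]

whose stationarity condition −ln p(θ) − 1 + λ = 0 forces p(θ) to be constant; normalization then fixes the uniform prior p(θ) = 1/(b − a) on [a, b].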
Likelihood Function Assignment
The likelihood function represents the probability of the residual errors. In the example shown in Figure 1, these errors are essentially the differences between the hypothesized prediction (solid line) and the experimental data (black dots). When assigning the likelihood function, one should incorporate only what is known about the residual errors. In other words, objective Bayesian analysis should not implicitly commit itself to any information that is not known to the experimenter.
As ever, this probability distribution must integrate to unity. Also, in many data analysis tasks, the experimenters know in advance that the model (such as Equation 1 in the example mentioned above) is capable of representing the data well, so that the residual errors take finite values and their variance is formally finite. Taking the finite variance into account in a maximum entropy procedure on a continuous space of uniform measure, the result is the Gaussian or normal probability distribution for the residual errors.
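The corresponding maximum entropy sketch (notation ours): for a residual error e on (−∞, ∞) with uniform measure, zero mean, and fixed variance σ², the constrained solution takes the exponential form p(e) ∝ exp(−λe²), and fixing λ by the variance constraint yields the Gaussian

\[
p(e) \;=\; \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\Big(-\frac{e^2}{2\sigma^2}\Big).
\]

For independent residuals, this assignment leads directly to the familiar Gaussian log-likelihood. The following minimal Python sketch evaluates it for a hypothetical model prediction; all names and values are illustrative and are not taken from the article:

```python
import numpy as np

def gaussian_log_likelihood(data, prediction, sigma):
    """Log-likelihood of the residual errors under the maximum
    entropy (Gaussian) assignment, assuming independent errors
    with a common, known standard deviation sigma."""
    residuals = data - prediction          # misfit between data and model
    n = residuals.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(residuals**2) / sigma**2)

# Hypothetical example: an exponential decay standing in for the kind
# of model Equation 1 might describe; the data here are simulated.
t = np.linspace(0.0, 1.0, 50)
prediction = np.exp(-3.0 * t)
data = prediction + 0.05 * np.random.default_rng(0).normal(size=t.size)
print(gaussian_log_likelihood(data, prediction, sigma=0.05))
```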