Bayesian Inference is Based on Probability Models
Bayesian models provide full probability distributions over both observable data and unobservable model parameters. Bayesian statistical inference is carried out using standard probability theory.
What’s a Prior?
The full Bayesian probability model includes the unobserved parameters. The marginal distribution over parameters is known as the “prior” parameter distribution, as it may be computed without reference to observable data. The conditional distribution over parameters given observed data is known as the “posterior” parameter distribution.
Non-Bayesian Statistics
Non-Bayesian statisticians eschew probability models of unobservable model parameters. Without such models, non-Bayesians cannot perform probabilistic inferences available to Bayesians, such as defining the probability that a model parameter (such as the mean height of an adult male American) is in a defined range (say 5′6″ to 6′0″).
Instead of modeling the posterior probabilities of parameters, non-Bayesians perform hypothesis testing and compute confidence intervals, the subtleties of interpretation of which have confused introductory statistics students for decades.
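As a rough illustration of the interval probability described above, here is a minimal sketch that assumes a normal sampling distribution with known standard deviation and a normal prior on the mean height. The five measurements, the prior settings, and the helper name `normal_posterior` are all made up for illustration; none of them come from real data.

```python
from statistics import NormalDist, fmean

def normal_posterior(data, sigma, prior_mean, prior_sd):
    """Posterior over the mean of a normal with known sd `sigma`,
    under a normal prior (a standard conjugate update)."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(data) / sigma**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * fmean(data)) / post_precision
    return NormalDist(mu=post_mean, sigma=post_precision ** -0.5)

# Hypothetical heights in inches (illustrative only, not real measurements).
heights_in = [70.1, 68.4, 71.0, 69.3, 67.9]

# Assumed known sampling sd of 3 inches and a weak prior centered at 69 inches.
posterior = normal_posterior(heights_in, sigma=3.0, prior_mean=69.0, prior_sd=5.0)

# Posterior probability that the mean height lies between 5'6" (66 in) and 6'0" (72 in).
prob_in_range = posterior.cdf(72.0) - posterior.cdf(66.0)
print(f"P(66 <= mean <= 72 | data) = {prob_in_range:.3f}")
```

The result is a direct probability statement about the parameter, which is exactly what a confidence interval does not provide.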
Bayesian Technical Apparatus
The sampling distribution $p(y \mid \theta)$ models the probability of observable data $y$ given unobservable model parameters $\theta$.
The prior distribution $p(\theta)$ models the probability of the parameters $\theta$.
The full joint distribution over parameters $\theta$ and data $y$ is computed with the chain rule, $p(y, \theta) = p(y \mid \theta)\, p(\theta)$.
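The posterior distribution $p(\theta \mid y)$ of the parameters $\theta$ given the observed data $y$ is derived from the sampling and prior distributions via Bayes’s rule,
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}.$$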
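The posterior predictive distribution $p(\tilde{y} \mid y)$ for new data $\tilde{y}$ given observed data $y$ is the average of the sampling distribution over parameters in proportion to their posterior probability,
$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.$$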
The key feature is the incorporation into predictive inference of the uncertainty in the posterior parameter estimate. In particular, the posterior predictive distribution is an overdispersed variant of the sampling distribution. The extra dispersion arises by integrating over the posterior.
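The pieces of this apparatus can be traced numerically with a grid approximation. The sketch below assumes a binomial sampling distribution, a uniform prior over a discrete grid of parameter values, and made-up counts (7 successes in 10 trials); the function names are hypothetical and the grid stands in for the integral.

```python
from math import comb

def grid_posterior(successes, trials, grid_size=1001):
    """Approximate p(theta | y) on an evenly spaced grid over [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior p(theta): uniform over the grid points (an assumption).
    prior = [1.0 / grid_size] * grid_size
    # Joint p(y, theta) = p(y | theta) * p(theta), by the chain rule.
    joint = [
        comb(trials, successes) * t**successes * (1 - t)**(trials - successes) * p
        for t, p in zip(thetas, prior)
    ]
    # Evidence p(y): the joint summed over theta (a sum replaces the integral).
    evidence = sum(joint)
    # Bayes's rule: posterior = joint / evidence.
    posterior = [j / evidence for j in joint]
    return thetas, posterior

def predictive_success(thetas, posterior):
    """Posterior predictive probability that one new trial succeeds:
    the sampling probability theta averaged over the posterior."""
    return sum(t * w for t, w in zip(thetas, posterior))

thetas, posterior = grid_posterior(successes=7, trials=10)
p_next = predictive_success(thetas, posterior)
print(f"P(next trial succeeds | data) = {p_next:.4f}")
```

With a uniform prior the exact posterior here is Beta(8, 4), whose mean 8/12 the grid estimate should closely match. Note that this differs from the plug-in value 7/10.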
Conjugate Priors
Conjugate priors, where the prior and posterior are drawn from the same family of distributions, are convenient but not necessary. For instance, if the sampling distribution is binomial, a beta-distributed prior leads to a beta-distributed posterior. With a beta posterior and binomial sampling distribution, the posterior predictive distribution is beta-binomial, the overdispersed form of the binomial. If the sampling distribution is Poisson, a gamma-distributed prior leads to a gamma-distributed posterior; the posterior predictive distribution is negative-binomial, the overdispersed form of the Poisson.
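A minimal sketch of the beta-binomial case, assuming a Beta(1, 1) prior and made-up counts (7 successes in 10 trials), may make the overdispersion concrete. The helper names are hypothetical; the formulas are the standard conjugate update and the beta-binomial probability mass function.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + trials - successes

def beta_binomial_pmf(k, m, alpha, beta):
    """Posterior predictive probability of k successes in m new trials."""
    return comb(m, k) * exp(log_beta(k + alpha, m - k + beta) - log_beta(alpha, beta))

def binomial_pmf(k, m, theta):
    """Binomial sampling probability of k successes in m trials."""
    return comb(m, k) * theta**k * (1 - theta)**(m - k)

def variance(pmf_values):
    """Variance of a distribution given as a list indexed by outcome 0..m."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))

# Assumed prior Beta(1, 1) and illustrative data: 7 successes in 10 trials.
a_post, b_post = beta_posterior(1.0, 1.0, successes=7, trials=10)

m = 20  # number of new trials to predict
bb = [beta_binomial_pmf(k, m, a_post, b_post) for k in range(m + 1)]
theta_mean = a_post / (a_post + b_post)
bn = [binomial_pmf(k, m, theta_mean) for k in range(m + 1)]

print(f"beta-binomial variance: {variance(bb):.2f}")
print(f"binomial variance:      {variance(bn):.2f}")
```

Both distributions have the same mean, but the beta-binomial spreads its mass more widely because it also accounts for uncertainty about the success probability.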
Point Estimate Approximations
An approximate alternative to full Bayesian inference uses $p(\tilde{y} \mid \hat{\theta})$ for prediction, where $\hat{\theta}$ is a point estimate.
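The maximum of the posterior distribution provides the so-called maximum a posteriori (MAP) estimate,
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid y) = \arg\max_{\theta}\, p(y \mid \theta)\, p(\theta).$$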
If the prior is uniform, the MAP estimate coincides with the maximum likelihood estimate (MLE), so called because it maximizes the likelihood $p(y \mid \theta)$ of the data. The MLE is popular among non-Bayesian statisticians because a uniform prior contributes only a constant factor and can therefore be dropped from the optimization.
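The relationship between the two estimates can be checked in a small case. The sketch below assumes a binomial sampling distribution with a beta prior and made-up counts; `mle` and `map_estimate` are hypothetical helper names, and the MAP formula is the mode of a beta distribution, valid only when both posterior parameters exceed 1.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a binomial success probability."""
    return successes / trials

def map_estimate(alpha, beta, successes, trials):
    """Mode of the Beta(alpha + successes, beta + failures) posterior."""
    a = alpha + successes
    b = beta + trials - successes
    if a <= 1 or b <= 1:
        raise ValueError("interior mode requires both posterior parameters > 1")
    return (a - 1) / (a + b - 2)

# Illustrative data: 7 successes in 10 trials.
print(mle(7, 10))                     # likelihood maximum
print(map_estimate(1.0, 1.0, 7, 10))  # uniform Beta(1, 1) prior: same as the MLE
print(map_estimate(2.0, 2.0, 7, 10))  # Beta(2, 2) prior pulls the estimate toward 1/2
```

With the uniform prior the two estimates agree, while an informative prior shifts the MAP estimate away from the MLE.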
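The expected value of the posterior gives another point estimate, the posterior mean, which is unbiased in the Bayesian sense that its expected error under the posterior is zero,
$$\hat{\theta} = \mathbb{E}[\theta \mid y] = \int \theta\, p(\theta \mid y)\, d\theta.$$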
Point estimates may be reasonably accurate if the posterior has low variance. If the posterior is diffuse, prediction with point estimates tends to be underdispersed, in the sense of underestimating the variance of the predictive distribution. This is a kind of overfitting which, unlike the usual situation of overfitting due to model complexity, arises from the oversimplification of the variance component of the predictive model.
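How much variance a point estimate loses can be quantified in the beta-binomial setting. The sketch below assumes a Beta(1, 1) prior, a plug-in prediction at the posterior mean, and two made-up data sets with the same success rate but different sizes; the function names are hypothetical, and the variance formulas are the standard ones for the beta-binomial and binomial distributions.

```python
def full_predictive_variance(a, b, m):
    """Variance of the beta-binomial predictive for m new trials,
    given a Beta(a, b) posterior."""
    return m * a * b * (a + b + m) / ((a + b) ** 2 * (a + b + 1))

def plugin_predictive_variance(a, b, m):
    """Variance of a binomial prediction that plugs in the posterior mean."""
    theta_hat = a / (a + b)
    return m * theta_hat * (1 - theta_hat)

def variance_ratio(successes, trials, m, alpha=1.0, beta=1.0):
    """Plug-in variance divided by full Bayesian predictive variance."""
    a = alpha + successes
    b = beta + trials - successes
    return plugin_predictive_variance(a, b, m) / full_predictive_variance(a, b, m)

m = 20  # number of new trials to predict
ratio_small = variance_ratio(successes=7, trials=10, m=m)      # diffuse posterior
ratio_large = variance_ratio(successes=700, trials=1000, m=m)  # concentrated posterior

print(f"ratio with 10 observations:   {ratio_small:.3f}")
print(f"ratio with 1000 observations: {ratio_large:.3f}")
```

A ratio well below 1 means the plug-in prediction understates the predictive variance; as the data grow and the posterior concentrates, the ratio approaches 1 and the point estimate becomes a reasonable approximation.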
