>
> 0; question: do people really use the population version? When dealing with
> real-world data or any other distribution that is not known a priori, how could
> you, in good conscience? And if you know the distribution a priori, why would
> you need to compute statistics about it?
>
There are three uses that I know of, two legitimate and one bogus, IMHO
(h="haughty" ;)
1) When the population mean is known, but the variance is not, and the data
consist of a random sample from the population. In this case, the
"population" formula (divide by n), with deviations taken about the known
mean rather than the sample mean, will produce a statistic that is an
unbiased estimator for the population variance.
2) When the data *are* the population, so the relevant distribution is
discrete and the formula for the "population" version gives the (exact)
variance of that (discrete) distribution. In this case, what is computed
is not a statistic in the formal sense (by some people's definition), but
a population parameter.
3) (bogus) When for some reason a biased estimate for the population
variance is desired (for compatibility with other packages or other reasons).
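
To make the three cases concrete, here is a rough sketch in Python (my
choice for brevity, nothing to do with the package under discussion). It
takes a made-up five-point population, enumerates every equally likely
sample of size 2 drawn with replacement, and averages each estimator over
all of them. The names var_about, expected, known_mean_n, sample_mean_n and
sample_mean_n1 are purely illustrative, and exact fractions are used so no
rounding gets in the way.

```python
from fractions import Fraction
from itertools import product


def var_about(data, center, divisor):
    """Sum of squared deviations of data about center, divided by divisor."""
    return sum((Fraction(x) - center) ** 2 for x in data) / divisor


# Hypothetical toy population; in case 2 these values ARE the population.
population = [1, 2, 3, 4, 5]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = var_about(population, mu, N)  # exact population variance (a parameter)

# Every sample of size n drawn with replacement, each equally likely.
n = 2
samples = list(product(population, repeat=n))


def expected(estimator):
    """Exact expectation of an estimator over all equally likely samples."""
    return sum(estimator(s) for s in samples) / len(samples)


def known_mean_n(s):
    # Case 1: divide by n, deviations about the KNOWN population mean.
    return var_about(s, mu, n)


def sample_mean_n(s):
    # Case 3: divide by n, deviations about the sample mean (biased).
    return var_about(s, Fraction(sum(s), n), n)


def sample_mean_n1(s):
    # Usual "sample" version: divide by n - 1 about the sample mean.
    return var_about(s, Fraction(sum(s), n), n - 1)


if __name__ == "__main__":
    print("population variance:      ", sigma2)
    print("E[known mean, / n]:       ", expected(known_mean_n))
    print("E[sample mean, / (n - 1)]:", expected(sample_mean_n1))
    print("E[sample mean, / n]:      ", expected(sample_mean_n))
```

On this toy population the first two estimators average out to exactly the
population variance, while dividing by n about the sample mean averages to
(n - 1)/n times it, which is the bias referred to in 3).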
In addition to the previously cited
http://mathworld.wolfram.com/Variance.html
the population vs. sample distinction is covered fairly well here:
http://en.wikipedia.org/wiki/Variance
The relation between statistics, estimators and population parameters is
explained here:
http://en.wikipedia.org/wiki/Estimator
Phil

