commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [math] API changes for RC2
Date Sun, 26 Sep 2004 22:50:26 GMT

> 
> -0; question:  do people really use the population version?  When dealing with
> real-world data or any other distribution that is not known a priori, how could
> you, in good conscience?  And if you know the distribution a priori, why would
> you need to compute statistics about it?
> 

There are three uses that I know of, two legitimate and one bogus, IMHO 
(h="haughty" ;-)

1) When the population mean is known, but the variance is not, and the data 
consist of a random sample from the population. In this case, the 
"population" formula (dividing by n), with deviations taken from the known 
mean, will produce a statistic that is an unbiased estimator for the 
population variance.

2) When the data *are* the population, so the relevant distribution is 
discrete and the formula for the "population" version gives the (exact) 
variance of that (discrete) distribution. In this case, what is computed 
is not a statistic in the formal sense (by some people's definition), but 
a population parameter.

3) (bogus) When for some reason a biased estimate for the population 
variance is desired (for compatibility with other packages or other reasons).
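
To make the three cases concrete, here is a minimal sketch in plain Java. 
It is only an illustration: the class and method names (VarianceSketch, 
varianceAboutKnownMean, and so on) are hypothetical and are not the 
[math] API, and the sample data and the "known" mean of 4.5 are made up. 
Cases 2 and 3 use the same arithmetic (divide by n about the data's own 
mean); they differ only in how the result is interpreted.

```java
// Hypothetical helper class for illustration only; not part of commons-math.
final class VarianceSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sum of squared deviations of the values about the given center. */
    static double sumSquaredDeviations(double[] x, double center) {
        double ss = 0.0;
        for (double v : x) {
            double d = v - center;
            ss += d * d;
        }
        return ss;
    }

    /**
     * Case 1: the population mean mu is known a priori. Dividing by n
     * gives an unbiased estimator, because no degree of freedom is
     * spent estimating the mean from the sample.
     */
    static double varianceAboutKnownMean(double[] x, double mu) {
        return sumSquaredDeviations(x, mu) / x.length;
    }

    /**
     * Cases 2 and 3: deviations about the data's own mean, divided by n.
     * Exact if the data are the whole population; a biased estimate if
     * the data are only a sample.
     */
    static double populationVariance(double[] x) {
        return sumSquaredDeviations(x, mean(x)) / x.length;
    }

    /** Bias-corrected sample variance: divide by n - 1 (requires n >= 2). */
    static double sampleVariance(double[] x) {
        return sumSquaredDeviations(x, mean(x)) / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up values
        System.out.println("known mean 4.5: " + varianceAboutKnownMean(data, 4.5));
        System.out.println("population:     " + populationVariance(data));
        System.out.println("sample:         " + sampleVariance(data));
    }
}
```

For these eight values the mean is 5 and the sum of squared deviations 
is 32, so the "population" version gives 32/8 = 4 while the sample 
version gives 32/7, which is slightly larger.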

In addition to the previously cited
http://mathworld.wolfram.com/Variance.html

the population vs. sample distinction is covered fairly well here
http://en.wikipedia.org/wiki/Variance

The relation between statistics, estimators and population parameters is 
explained here:
http://en.wikipedia.org/wiki/Estimator

Phil


