commons-dev mailing list archives

From "Mark R. Diggory"
Subject Re: [math] Only sample variances?
Date Mon, 16 Aug 2004 14:12:31 GMT
Well, I think we should step back and ask a few design questions concerning 
the objects that will use these Sample/Population variances, since the 
answers will assist us in their design.

1.) Is it the case that a covariance matrix could be built off of 
"either" Sample or Population Variances?

2.) Are there other applications of Sample/Pop Variances which we want 
to implement, if so what are they? Are they interchangeable in these cases?

3.) Do we want to add methods to the Descriptive/Summary/StatUtils stats 
to capture both cases?
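
For reference, the two variances in question differ only in the divisor. A sketch of the standard definitions, writing \bar{x} for the mean of the n observed values (this notation is an assumption for illustration and does not come from the thread):

```latex
\begin{align*}
s^2      &= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 && \text{(sample variance, bias-corrected)} \\
\sigma^2 &= \frac{1}{n}   \sum_{i=1}^{n} (x_i - \bar{x})^2 && \text{(population variance)}
\end{align*}
```

The same choice of divisor applies to each entry of a covariance matrix, which is why question 1 matters.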

What this and the Remedian case are somewhat convincing me of is that 
in the SummaryStatistics case you need to know what you want before 
you start adding values to the Statistic, which constitutes a sort of 
configuration environment, while in the "DescriptiveStatistics" case 
one can choose these aspects afterward, as the statistic is calculated 
after all the values are known.

This means that you either have to calculate both the PopulationVariance 
and SampleVariance in the SummaryStatistics case, or configure it to use 
one or the other, while in the DescriptiveStatistics case you can just 
call the appropriate method to return that statistic.
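
To make the two cases concrete, here is a minimal sketch. The class names RunningVariance, StoredVariance and VarianceDemo are hypothetical and are not the existing Commons Math classes; the streaming update uses Welford's method, which is an assumption about how such a statistic might be accumulated rather than a description of the current code.

```java
// Hypothetical sketch: these names are illustrative, not the Commons Math API.

/** Streaming case: values are not stored, only running moments are kept. */
final class RunningVariance {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the running mean

    /** Welford's update: folds one value into the running moments. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Bias-corrected variance (divisor n - 1); NaN for fewer than two values. */
    double sampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Population variance (divisor n); NaN when no values have been added. */
    double populationVariance() {
        return n < 1 ? Double.NaN : m2 / n;
    }
}

/** Stored case: all values are known, so the caller chooses at call time. */
final class StoredVariance {
    private StoredVariance() {}

    static double variance(double[] values, boolean biasCorrected) {
        int n = values.length;
        if (n == 0 || (biasCorrected && n < 2)) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double v : values) {
            ss += (v - mean) * (v - mean);
        }
        return ss / (biasCorrected ? n - 1 : n);
    }
}

final class VarianceDemo {
    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};

        RunningVariance running = new RunningVariance();
        for (double v : data) {
            running.add(v);
        }
        System.out.println("streaming sample:     " + running.sampleVariance());
        System.out.println("streaming population: " + running.populationVariance());
        System.out.println("stored sample:        " + StoredVariance.variance(data, true));
        System.out.println("stored population:    " + StoredVariance.variance(data, false));
    }
}
```

In this sketch both streaming results come from the same accumulated sum of squares, so "calculating both" costs only one extra division when the value is read. Whether that holds for other statistics, such as the Remedian case, is a separate question.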


> Hi Mark,
> I think we have to think very carefully about this.
> Especially when we start including covariances. My old
> textbooks give the formula as population estimates,
> just like Excel (no choice, only population).
> However, covariance matrices include the sample
> covariances....
> Cheers,
> Kim

Phil Steitz wrote:
> Mark R. Diggory wrote:
>> Yes, at the UnivariateStatistic level, these would need to be new 
>> classes. My question as well is "Does it apply as well to higher order 
>> moments?"
> In theory, yes, though I have never seen non-bias-corrected versions of 
> Skewness and Kurtosis used.  The current formulas are all defined for 
> the most common use case where the data represent a sample from a 
> population whose true distribution and associated parameters are 
> unknown. The formulas that we use provide unbiased estimators 
> for population parameters in this case.  This is explained fairly well 
> for the Variance here:
> and for Skewness and Kurtosis here:
> The "Population Variance" is useful when the data *are* the population 
> (i.e. the distribution is discrete and there is no sampling going on).  
> I am not aware of use cases where Skewness and Kurtosis are useful in 
> analyzing full population data or other uses for the non-bias-corrected 
> versions of these.  These could exist, I am just not aware of them.
>> Maybe we should place everything into the following packages:
> I don't think we need yet another subpackage.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Mark Diggory
Software Developer
Harvard MIT Data Center
