commons-dev mailing list archives

From Kim van der Linde <kimvdli...@yahoo.com>
Subject Re: [math] Only sample variances?
Date Tue, 17 Aug 2004 03:39:27 GMT

--- "Mark R. Diggory" <mdiggory@latte.harvard.edu>
wrote:

> Well, I think we should step back and ask a few design
> questions concerning the objects that will use these
> Sample/Population variances and that will assist us
> in their own design.
>
> 1.) Is it the case that a covariance matrix could be
> built off of "either" Sample or Population Variances?

Yes. With the remark that the whole matrix is filled
with either sample (co-)variances or population
(co-)variances.

> 2.) Are there other applications of Sample/Pop
> Variances which we want to implement, if so what are
> they? Are they interchangeable in these cases?
>
> 3.) Do we want to add methods to the
> Descriptive/Summary/StatUtils stats 
> to capture both cases?

I cannot answer these two questions. However, I do
know that any method that uses (co-)variances can be
calculated with either population or sample
estimates. So my suggestion would be to incorporate
it in such a way that it uses a default (my preference
would be sample) but leaves the option open to use
the population versions instead, without calling a
completely new class. Essentially, as soon as you go
with the population variance, all derived methods have
to go with that too, including correlations,
regressions, PCA, GLM, etc.
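
As a rough illustration of "a default plus an option" within a single class, here is a minimal sketch. The class name VarianceSketch and the biasCorrected flag are hypothetical and are not taken from the commons-math API; the only assumption is that the sample version divides by n - 1 and the population version by n.

```java
/** Hypothetical sketch, not part of commons-math. */
public final class VarianceSketch {

    private VarianceSketch() {
    }

    /** Default behaviour: sample variance (divides by n - 1). */
    public static double variance(double[] values) {
        return variance(values, true);
    }

    /**
     * @param biasCorrected true for the sample variance (n - 1),
     *                      false for the population variance (n)
     */
    public static double variance(double[] values, boolean biasCorrected) {
        int n = values.length;
        // Not defined for an empty array, nor for a sample of size 1.
        if (n == 0 || (biasCorrected && n < 2)) {
            return Double.NaN;
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        // Sum of squared deviations from the mean.
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq / (biasCorrected ? n - 1 : n);
    }
}
```

A caller who never thinks about the distinction gets the sample variance from variance(values); a caller who wants the population version passes false, and any derived statistic would need to pass the same flag through.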
 
> What this and the Remedian case are somewhat
> convincing me of is that, in the SummaryStatistics 
> case; you need to know what you want before you
> start adding values to the Statistic, which
> constitutes a sort of configuration environment, 
> while in the "DescriptiveStatistics" case, one can 
> choose these aspects afterward, as the statistic is 
> calculated after all the values are known.
>
> This means that you either have to calculate both
> the PopulationVariance and SampleVariance in the 
> SummaryStatistics case, or configure it to use one or
> the other. While in the DescriptiveStatistics case,
> you can just call the appropriate method to return 
> that statistic.

To put it very bluntly, the only difference is the
N/(N-1) factor (or its reciprocal), which can always
be applied afterwards.

That is also why I think that incorporating the
option is the best way.
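
To make the factor concrete for the streaming (SummaryStatistics-like) case, here is a minimal sketch of a one-pass accumulator. The class RunningVarianceSketch and its method names are hypothetical, not the commons-math API, and the update rule is Welford's method, swapped in here as one common way to do it. It stores only the count, the running mean and the sum of squared deviations, so either variance can be requested after the values have been added.

```java
/** Hypothetical sketch, not part of commons-math. */
public final class RunningVarianceSketch {

    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    /** Welford's one-pass update. */
    public void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    public long getN() {
        return n;
    }

    /** Divides by n - 1. */
    public double getSampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Divides by n; equals the sample variance times (n - 1) / n. */
    public double getPopulationVariance() {
        return n < 1 ? Double.NaN : m2 / n;
    }
}
```

Because both getters share the same stored sum, nothing has to be configured before values are added; the choice is made only when a result is read.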

With the median, this might be different.

Cheers,

Kim


		


