Well, I think we should step back and ask a few design questions about
the objects that will use these Sample/Population variances; the answers
will help guide the design of the variances themselves.
1.) Is it the case that a covariance matrix could be built off of
"either" Sample or Population Variances?
2.) Are there other applications of Sample/Pop Variances which we want
to implement? If so, what are they, and are the two interchangeable in
these cases?
3.) Do we want to add methods to the Descriptive/Summary/StatUtils stats
to capture both cases?
What this and the Remedian case are somewhat convincing me of is that
in the SummaryStatistics case you need to know what you want before
you start adding values to the Statistic, which constitutes a sort of
configuration environment, while in the DescriptiveStatistics case
one can choose these aspects afterward, as the statistic is calculated
after all the values are known.
This means that you either have to calculate both the PopulationVariance
and SampleVariance in the SummaryStatistics case, or configure it to use
one or the other, while in the DescriptiveStatistics case you can just
call the appropriate method to return that statistic.
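
To make the "calculate both" option concrete, here is a rough sketch of a
storeless accumulator. The class name RunningVariance and its methods are
hypothetical, not existing commons-math API, and the update rule is
Welford's one-pass method rather than whatever our current Variance class
does. It only illustrates that both variances can be derived from the same
running sum of squared deviations, so keeping both need not cost extra
state.

```java
// Hypothetical sketch, not commons-math API: a storeless accumulator
// that can report either variance without keeping the values.
final class RunningVariance {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Welford's one-pass update (an assumption; the real classes may differ).
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Divides by n: appropriate when the data *are* the population.
    double populationVariance() {
        return n > 0 ? m2 / n : Double.NaN;
    }

    // Divides by n - 1: the bias-corrected estimate from a sample.
    double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            rv.add(x);
        }
        // Mathematically these are 4 and 32/7 for this data set.
        System.out.println("population: " + rv.populationVariance());
        System.out.println("sample:     " + rv.sampleVariance());
    }
}
```

If something like this holds up, the SummaryStatistics side might only need
two accessors instead of a configuration switch, but that is a design
choice we would still have to agree on.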
Mark
> Hi Mark,
>
> I think we have to think very carefully about this.
> Especially when we start including covariances. My old
> textbooks give the formula as population estimates,
> just like Excel (no choice, only population).
> However, covariance matrices include the sample
> covariances....
>
> Cheers,
>
> Kim
Phil Steitz wrote:
> Mark R. Diggory wrote:
>
>> Yes, at the UnivariateStatistic level, these would need to be new
>> classes. My question as well is "Does it apply as well to higher order
>> moments?"
>
>
> In theory, yes, though I have never seen non-bias-corrected versions of
> Skewness and Kurtosis used. The current formulas are all defined for
> the most common use case where the data represent a sample from a
> population whose true distribution and associated parameters are
> unknown. The formulas that we use provide unbiased estimators
> for population parameters in this case. This is explained fairly well
> for the Variance here:
> http://mathworld.wolfram.com/Variance.html
> and for Skewness and Kurtosis here:
> http://mathworld.wolfram.com/kStatistic.html
>
> The "Population Variance" is useful when the data *are* the population
> (i.e. the distribution is discrete and there is no sampling going on).
> I am not aware of use cases where Skewness and Kurtosis are useful in
> analyzing full population data or other uses for the non-bias-corrected
> versions of these. These could exist, I am just not aware of them.
>
>>
>> Maybe we should place everything into the following packages:
>
>
> I don't think we need yet another subpackage.
>
> 
> To unsubscribe, email: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, email: commons-dev-help@jakarta.apache.org
>

Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu

