commons-dev mailing list archives

From Al Chou <>
Subject RE: [math] API changes for RC2
Date Mon, 27 Sep 2004 15:14:35 GMT
--- Phil Steitz <> wrote:
> Kim raises an important point below.  We need a consistent strategy for
> configuring / parameterizing statistics.  Adding new statistics for each
> computational variant is probably not tenable.  Sorry to flip/flop, but I am
> now in favor of solving this problem correctly for Variance / Std Dev in 1.0.
>  Since you can legitimately look at the "population" versions as
> computational variants (sorry I was dense on this before), we should support
> configurability.  I see four ways to do this:
> 1) Add a flag to getResult determining which version to compute (what Kim has
> done, IIUC)
> 2) Add a "bias reduction" parameter to getResult (I think Kim suggested this
> as a possibility earlier)
> 3) Add a getPopulationResult method
> 4) Add a "biasReduction" property (either boolean defaulted to true or
> float/double defaulted to 1)
> The first one violates the "no boolean flags in API" that we agreed to
> before; but that is not written in stone. I like 4) the best from a design
> standpoint, since the statistic still adheres to the basic
> UnivariateStatistic contract in this case; though I can understand that users
> may prefer something like 1) or 2).  Thoughts?
> Sorry to flip/flop, but we need to get this right.
> Phil
> 	-----Original Message----- 
> 	From: Kim van der Linde [] 
> 	Sent: Sun 9/26/2004 7:12 PM 
> 	To: Jakarta Commons Developers List 
> 	In general I can say that the way we use multivariate and all kinds of
> 	more complex methods is that we want to be able to determine every part
> 	of the analysis, as the number of data manipulations prior to the
> 	actual analysis can vary dramatically. And that is why I am so strongly in
> 	favour of treating bias and DOF control as an aspect of the statistic,
> 	not as a separate statistic.
> 	Cheers,
> 	Kim

So someone please lay out a real(istic) use case.  Would you ever make two
successive (or closely separated, anyway) method calls to get both the sample
and population result for the same dataset?  Or do you usually just use one and
not the other?  In pseudo-code, do you ever need to do this:

StandardDeviation sd = new StandardDeviation( ... ) ;
sd.getResult() ;
sd.getPopulationResult() ;

or is it sufficient functionality if you have to say something like:

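StandardDeviation sd = new StandardDeviation( ... ) ;
sd.getResult() ;              // sample result (the default)
sd.populationResult = true ;  // switch the statistic to the population form
sd.getResult() ;              // population result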

I realize that it's only one extra line of code, but if you're calling these
statistics often, one extra line per invocation could be enough to make the API
unnecessarily awkward.
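
To make the comparison concrete, here is a rough Java sketch of how one class could offer both call styles side by side. Everything in it is an assumption for illustration: StdDevSketch, setBiasCorrected and the two-value minimum are hypothetical names and choices, not the Commons Math API and not anything agreed on in this thread.

```java
// Hypothetical sketch only; not the Commons Math StandardDeviation class.
final class StdDevSketch {
    private final double[] values;

    // Property style: defaults to the sample (bias-corrected) form.
    private boolean biasCorrected = true;

    StdDevSketch(double... values) {
        // Assumption: require two values so the n - 1 divisor is never zero.
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        this.values = values.clone();
    }

    void setBiasCorrected(boolean biasCorrected) {
        this.biasCorrected = biasCorrected;
    }

    // Honors the property, so the no-argument contract is preserved.
    double getResult() {
        return compute(biasCorrected);
    }

    // Dedicated-method style: always the population form, whatever the property says.
    double getPopulationResult() {
        return compute(false);
    }

    private double compute(boolean corrected) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }

        // Sample form divides by n - 1, population form by n.
        int divisor = corrected ? values.length - 1 : values.length;
        return Math.sqrt(sumSq / divisor);
    }
}
```

With this shape, a caller who wants both numbers makes two calls, getResult() and getPopulationResult(), and leaves the object's state alone. A caller who only ever wants the population form calls setBiasCorrected(false) once and then uses getResult() everywhere. The extra line is paid once per object rather than once per invocation, which may or may not match the real use cases people have.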

