commons-dev mailing list archives

From Phil Steitz <stei...@yahoo.com>
Subject Re: [math] Main Univariate Facade Implementations that work with UnivariateStatistics
Date Tue, 08 Jul 2003 07:28:30 GMT
--- "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:
> Here is a patch with new versions of the Univariate Facades.
> Included in this patch are:
> 
> 1.) one new univariate, "MixedListUnivariate", that accepts a 
> TransformerMap to transform objects to primitive doubles.
> 
> 2.) one new AbstractUnivariate implementation.
> 
> 3.) There are many revisions to the current implementations to make them 
> work with both NumberTransformers and the individual UnivariateStatistics.
> 
> 4.) all moment-based stats (like skew and kurt) are moved up into the 
> Univariate interface.
> 
> 5.) The StorelessUnivariateStatistics have been reorganized to move 
> calculations that do not need to be performed on "increment" further 
> upstream to "getValue". This reduces the amount of calculation being 
> done at the addValue stage (eliminating variance, skew and kurtosis 
> calculations from the moments at this stage).
> 
> 6.) All moment-based statistics have been modified to support sharing a 
> common moment. This way internal calculations for m1, m2, m3 and m4 do 
> not need to be replicated within the individual stats; they can all 
> share the same object.
> 
> I would really like to get some input on these from the group as they 
> represent a rather large commit change on others' work in the stat directory.
> 
> Lastly, I do have a version of StatUtils that works with 
> UnivariateStatistics, but I'm now convinced that we no longer need 
> StatUtils.
 
Given the consensus to move in the direction of disaggregated statistics, I
would agree that there is no internal need for StatUtils.

As a final comment on this, I would like to point out that my opposition to
this approach was based on what I now see was a naive view that we could
actually agree on a set of commonly used univariate statistics and limit our
support to these. I never envisioned Univariate as a "large, monolithic
interface." I see now that this is an inherently limiting perspective and I
should not have proposed it. I was relying too much on my biased practical
experience/observation that once you get past the basic stuff, practical
applications drop off quickly. I was also overly concerned about performance
and overhead, again largely due to my own experience and application needs.

The one thing that I don't understand about the new approach, and that I would
suggest reconsidering, is why you want to retain the Univariate interfaces at
all. As long as you have these and people depend on them, I don't think that
you will really have the full extensibility that you want, and you will have
added complexity and overhead to deal with. Sort of the worst of both worlds.
The only thing that you *need* is a way to aggregate data (actually you have
this already -- you just need shared aggregation). Why not just move to a model
where a Univariate has a dynamic List of Statistics and do away with the getXXX
methods in the Univariate interfaces altogether?
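
To make that last suggestion concrete, here is a rough sketch of what such a
model might look like. It is not code from the patch or from commons-math:
Statistic, Mean, Max, DynamicUnivariate and all of their methods are
hypothetical names chosen for illustration, and it assumes a Java version with
generics. The container only aggregates: each added value is passed to every
registered statistic, and results are looked up by name instead of through
fixed getXXX methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: one independently pluggable statistic.
interface Statistic {
    String getName();
    void increment(double value);  // fold one observation into the state
    double getValue();             // compute the result on demand
}

// Hypothetical storeless mean.
class Mean implements Statistic {
    private long n = 0;
    private double sum = 0.0;

    public String getName() { return "mean"; }
    public void increment(double value) { n++; sum += value; }
    public double getValue() { return n == 0 ? Double.NaN : sum / n; }
}

// Hypothetical storeless maximum.
class Max implements Statistic {
    private double max = Double.NaN;

    public String getName() { return "max"; }
    public void increment(double value) {
        if (Double.isNaN(max) || value > max) {
            max = value;
        }
    }
    public double getValue() { return max; }
}

// Hypothetical container: holds a dynamic list of statistics and has no
// statistic-specific getters of its own.
class DynamicUnivariate {
    private final List<Statistic> statistics = new ArrayList<>();

    public void addStatistic(Statistic statistic) {
        statistics.add(statistic);
    }

    // Shared aggregation: every registered statistic sees every value.
    public void addValue(double value) {
        for (Statistic s : statistics) {
            s.increment(value);
        }
    }

    // Look a result up by name; unknown names are an error.
    public double getValue(String name) {
        for (Statistic s : statistics) {
            if (s.getName().equals(name)) {
                return s.getValue();
            }
        }
        throw new IllegalArgumentException("No such statistic: " + name);
    }
}
```

With something like this, a new statistic is added by implementing Statistic
and registering it; nothing in the container's interface has to change.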

Phil


