commons-dev mailing list archives

From Al Chou <hotfusion...@yahoo.com>
Subject Re: cvs commit: jakarta-commons-sandbox/math/src/test/org/apache/commons/math/stat
Date Tue, 17 Jun 2003 13:11:56 GMT
--- Phil Steitz <steitzp@yahoo.com> wrote:
> --- "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:
> > Al Chou wrote:
> > >--- mdiggory@apache.org wrote:
> > >>mdiggory    2003/06/16 07:29:31
> > >>
> > >>  Modified:    math/xdocs developers.xml
> > >>               math/src/java/org/apache/commons/math/stat
> > >>                        UnivariateImpl.java
> > >>               math/src/test/org/apache/commons/math/stat
> > >>                        CertifiedDataTest.java
> > >>  Log:
> > >>  PR: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20782
> > >>  Submitted by:	HotFusionMan@Yahoo.com
> > >>  
> > >>  I added this, but there are changes I'd like to make in the near future.
> > >>Only the "running" aspects of the variance calc should be in the insertValue
> > >>function; all other calculations should be in the getVariance function.
> > >>    
> > >
> > >OK, that sounds reasonable.  Also, I was starting an Extract Method refactoring
> > >to reduce duplication between the windowSize != n and infinite window branches
> > >of insertValue.
> > >
> > >
> > >Al
> > >
> > 
> > Yes, this is the direction I am working on as well. Let's try to 
> > coordinate our efforts.
> > 
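A minimal sketch of the split described in the commit log above, assuming a hypothetical class RunningVarianceSketch (not the committed UnivariateImpl). insertValue only updates the running count, mean and sum of squared deviations, and getVariance does the final division when it is asked for. Returning NaN for no data and 0.0 for a single value are choices made for this sketch only.

```java
// Hypothetical class illustrating "running state in insertValue,
// everything else in the getter". Not the sandbox API.
class RunningVarianceSketch {
    private long n = 0;             // number of values inserted so far
    private double mean = 0.0;      // running mean
    private double sumSqDev = 0.0;  // running sum of squared deviations from the mean

    // Updates only the running quantities (Welford-style recurrence).
    void insertValue(double v) {
        n++;
        double delta = v - mean;
        mean += delta / n;
        // Uses the updated mean: delta * (v - newMean) == (n - 1) / n * delta^2
        sumSqDev += delta * (v - mean);
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    // Sample variance, derived from the running state on demand.
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0; // sketch choice for a single value
        }
        return sumSqDev / (n - 1);
    }
}
```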
> > (1) I'm looking to set up "true delegation" where UnivariateImpl is 
> > actually an extension of AbstractStoreUnivariate and delegates to these 
> > methods when doing stored processing. This would simply look like:
> > 
> >     /**
> >      * @see org.apache.commons.math.stat.Univariate#getMean()
> >      */
> >     public double getMean() {
> >         if (windowSize != Univariate.INFINITE_WINDOW) {
> >             return super.getMean();
> >         }
> > 
> >         return mean;
> >     }
> 
> -1  I would prefer for both to delegate to an optimized method in StatUtils.

-1 as well

I thought we had discussed using delegation to avoid entanglement in the class
hierarchy.  I was surprised to see this delegated implementation committed.
Can we have a design discussion before proceeding?  And I mean just freeze,
don't even roll back what's in CVS until we sort out a consensus.
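One possible reading of Phil's StatUtils suggestion, sketched under assumptions: StatUtilsSketch, StoredUnivariateSketch and WindowedUnivariateSketch are hypothetical names, not the sandbox API. The stored and windowed classes each call one shared static method instead of one extending the other, so neither is tied to the other's class hierarchy. The infinite-window case keeps no values, so it would still need its own running mean.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for a StatUtils-style class: stateless static helpers.
final class StatUtilsSketch {
    private StatUtilsSketch() {
    }

    // Arithmetic mean of all values; NaN for an empty array.
    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Keeps every value (simple, not efficient) and delegates; no superclass involved.
class StoredUnivariateSketch {
    private double[] values = new double[0];

    void addValue(double v) {
        values = Arrays.copyOf(values, values.length + 1);
        values[values.length - 1] = v;
    }

    double getMean() {
        return StatUtilsSketch.mean(values);
    }
}

// Keeps only the most recent windowSize values and delegates the same way.
class WindowedUnivariateSketch {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();

    WindowedUnivariateSketch(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    void addValue(double v) {
        if (window.size() == windowSize) {
            window.removeFirst(); // drop the oldest value
        }
        window.addLast(v);
    }

    double getMean() {
        double[] current = new double[window.size()];
        int i = 0;
        for (double d : window) {
            current[i++] = d;
        }
        return StatUtilsSketch.mean(current);
    }
}
```

Any optimization then lives in one place, and both classes pick it up without sharing a superclass.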

FYI, my Extract Method was simply of the form:

    private void updateStatisticsWithNewValue(double v) {
        n += 1;

        // Running extremes and product
        if (v < min) {
            min = v;
        }
        if (v > max) {
            max = v;
        }
        product *= v;

        if (n > 1) {
            // Running update of the mean and of the sum of squared deviations
            // from the mean (pre_variance), then the sample variance
            double deviationFromMean = v - mean;
            double deviationFromMean_overN = deviationFromMean / n;
            mean += deviationFromMean_overN;
            pre_variance += (n - 1) * deviationFromMean * deviationFromMean_overN;
            variance = pre_variance / (n - 1);
        }
    }
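To illustrate the numerical subtlety behind this running update, here is a small hypothetical comparison class (VarianceComparison is not part of the sandbox). runningVariance applies the same recurrence as updateStatisticsWithNewValue, except that the mean starts at 0 so the first value needs no special case; naiveVariance uses the textbook formula (sum of squares - sum^2 / n) / (n - 1).

```java
// Hypothetical comparison of two ways to compute the sample variance.
final class VarianceComparison {
    private VarianceComparison() {
    }

    // Textbook one-pass formula: subtracts two large, nearly equal numbers.
    static double naiveVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        return (sumSq - sum * sum / n) / (n - 1);
    }

    // Running recurrence, as in updateStatisticsWithNewValue.
    static double runningVariance(double[] xs) {
        long n = 0;
        double mean = 0.0;
        double preVariance = 0.0;
        for (double v : xs) {
            n++;
            double deviationFromMean = v - mean;
            double deviationFromMeanOverN = deviationFromMean / n;
            mean += deviationFromMeanOverN;
            preVariance += (n - 1) * deviationFromMean * deviationFromMeanOverN;
        }
        return preVariance / (n - 1);
    }
}
```

For the inputs 1e9 + 1, 1e9 + 2 and 1e9 + 3 (sample variance exactly 1), runningVariance returns 1.0 in IEEE double arithmetic, while naiveVariance returns 0.0 because the squares it sums have already been rounded before the cancellation.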


> > (2) I want to apply the same strategy used in your mean and variance 
> > calculations for skew and kurt. The getters for these properties then 
> > would truly just be "getters" without the calculations occurring in them 
> > as well. This means the sum of powers code goes away for now.
> > 
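A sketch of the running strategy extended to skewness and kurtosis, using the published one-pass recurrences for the third and fourth central moment sums (commonly attributed to Terriberry as an extension of Welford's update). RunningMomentsSketch is a hypothetical class. It reports the population (moment) coefficients g1 and g2 (excess kurtosis), not bias-corrected sample estimators, and returns NaN when there is no spread in the data.

```java
// Hypothetical class: one-pass (running) central moments up to the fourth.
class RunningMomentsSketch {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the mean
    private double m3 = 0.0; // sum of cubed deviations
    private double m4 = 0.0; // sum of fourth-power deviations

    void insertValue(double x) {
        long n1 = n;
        n++;
        double delta = x - mean;
        double deltaN = delta / n;
        double deltaN2 = deltaN * deltaN;
        double term1 = delta * deltaN * n1;
        mean += deltaN;
        // Update order matters: m4 needs the old m3 and m2, m3 needs the old m2.
        m4 += term1 * deltaN2 * ((double) n * n - 3.0 * n + 3.0)
                + 6.0 * deltaN2 * m2
                - 4.0 * deltaN * m3;
        m3 += term1 * deltaN * (n - 2) - 3.0 * deltaN * m2;
        m2 += term1;
    }

    // Population skewness g1; NaN if there is no spread (or no data).
    double getSkewness() {
        if (m2 == 0.0) {
            return Double.NaN;
        }
        return Math.sqrt((double) n) * m3 / Math.pow(m2, 1.5);
    }

    // Population excess kurtosis g2; NaN if there is no spread (or no data).
    double getKurtosis() {
        if (m2 == 0.0) {
            return Double.NaN;
        }
        return n * m4 / (m2 * m2) - 3.0;
    }
}
```

With this kind of update, the getters only combine already-maintained sums, which is the "just getters" shape described in (2).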
> > (3) I want to derive a methodology for the same two-pass algorithm for 
> > skew and kurt. Hey, if we can't find published work on it, then there's 
> > possibly a paper in the future for someone to write!
> > 
> We should stick with established algorithms.  I would suggest researching
> established computational formulas for higher order moments.

+1.  Especially given the subtleties we learned about in the "simple" case of
variance, I would be extremely loath to implement a new algorithm without
_vigorous_ testing, and we probably don't want that testing to hold up an
initial release.  Also, as always, I am skeptical about the real-world utility
of such high-order moments as skewness and kurtosis.  IMO, we are already
providing much more statistical functionality than most programmers understand
how to use correctly.
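For reference, a sketch of the established two-pass computation of higher-order central moments: one pass for the mean, a second for the sums of squared, cubed and fourth-power deviations. A straightforward version like this is the kind of baseline any running implementation could be tested against. TwoPassMoments is a hypothetical helper class; it reports population g1 and g2 (excess kurtosis), and data with zero spread yields NaN.

```java
// Hypothetical helper: two-pass population skewness and excess kurtosis.
final class TwoPassMoments {
    private TwoPassMoments() {
    }

    static double skewness(double[] xs) {
        double[] s = centralSums(xs);
        double n = xs.length;
        double var = s[0] / n; // population variance
        return (s[1] / n) / (var * Math.sqrt(var)); // 0/0 -> NaN with no spread
    }

    static double kurtosis(double[] xs) {
        double[] s = centralSums(xs);
        double n = xs.length;
        double var = s[0] / n;
        return (s[2] / n) / (var * var) - 3.0;
    }

    // Returns {sum d^2, sum d^3, sum d^4} with d = x - mean.
    private static double[] centralSums(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("no data");
        }
        double sum = 0.0;
        for (double x : xs) { // pass 1: the mean
            sum += x;
        }
        double mean = sum / xs.length;
        double s2 = 0.0;
        double s3 = 0.0;
        double s4 = 0.0;
        for (double x : xs) { // pass 2: central moment sums
            double d = x - mean;
            double d2 = d * d;
            s2 += d2;
            s3 += d2 * d;
            s4 += d2 * d2;
        }
        return new double[] {s2, s3, s4};
    }
}
```

For the data 2, 4, 4, 4, 5, 5, 7, 9, skewness returns 0.65625 and kurtosis returns -0.21875.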



Al

=====
Albert Davidson Chou
