 Phil Steitz <steitzp@yahoo.com> wrote:
>  "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:
> > Al Chou wrote:
> > > mdiggory@apache.org wrote:
> > >>mdiggory 2003/06/16 07:29:31
> > >>
> > >> Modified: math/xdocs developers.xml
> > >> math/src/java/org/apache/commons/math/stat
> > >> UnivariateImpl.java
> > >> math/src/test/org/apache/commons/math/stat
> > >> CertifiedDataTest.java
> > >> Log:
> > >> PR: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20782
> > >> Submitted by: HotFusionMan@Yahoo.com
> > >>
> > >> I added this, but there are changes I'd like to make in the near future.
> > >> Only the "running" aspects of the variance calc should be in the
> > >> insertValue function, all other calculation should be in the getVariance
> > >> function.
> > >>
> > >
> > >OK, that sounds reasonable. Also, I was starting an Extract Method
> > >refactoring to reduce duplication among the windowSize != n and infinite
> > >window branches of insertValue.
> > >
> > >
> > >Al
> > >
> >
> > Yes, this is the direction I am working on as well. Let's try to
> > coordinate our efforts.
> >
> > (1) I'm looking to set up "true delegation" where UnivariateImpl is
> > actually an extension of AbstractStoreUnivariate and delegates to these
> > methods when doing stored processing. This would simply look like:
> >
> > /**
> >  * @see org.apache.commons.math.stat.Univariate#getMean()
> >  */
> > public double getMean() {
> >     if (windowSize != Univariate.INFINITE_WINDOW) {
> >         return super.getMean();
> >     }
> >
> >     return mean;
> > }
>
> -1 I would prefer for both to delegate to an optimized method in StatUtils.
-1 as well.
I thought we had discussed avoiding entanglement in the class hierarchy via
delegation. I was surprised to see this delegated implementation committed.
Can we have a design discussion before proceeding? And I mean just freeze,
don't even roll back what's in CVS until we sort out a consensus.
FYI, my Extract Method was simply of the form:
private void updateStatisticsWithNewValue( double v )
{
    n += 1 ;
    if (v < min) {
        min = v;
    }
    if (v > max) {
        max = v;
    }
    product *= v;
    if ( n > 1 )
    {
        // Running update of mean and variance from the previous values
        double deviationFromMean = v - mean ;
        double deviationFromMean_overN = deviationFromMean / n ;
        mean += deviationFromMean_overN ;
        pre_variance += (n - 1) * deviationFromMean *
            deviationFromMean_overN ;
        variance = pre_variance / (n - 1) ;
    }
}
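
To make the running update above easier to check in isolation, here is a minimal self-contained sketch. The class name RunningStats and the method twoPassVariance are hypothetical (they are not in commons-math), min/max/product are left out, and returning NaN for fewer than two values is my assumption, not agreed behavior. It only illustrates that the incremental update agrees with a plain two-pass calculation.

```java
// Hypothetical illustration class; not part of commons-math.
final class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    // Running sum of squared deviations from the current mean.
    private double preVariance = 0.0;

    // Only the "running" bookkeeping happens here.
    void add(double v) {
        n += 1;
        double deviationFromMean = v - mean;
        double deviationFromMeanOverN = deviationFromMean / n;
        mean += deviationFromMeanOverN;
        preVariance += (n - 1) * deviationFromMean * deviationFromMeanOverN;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return mean;
    }

    // The final division is done in the getter.
    // Assumption: NaN when fewer than two values have been added.
    double getVariance() {
        return n > 1 ? preVariance / (n - 1) : Double.NaN;
    }

    // Reference two-pass sample variance, used only for comparison.
    static double twoPassVariance(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double m = sum / values.length;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (values.length - 1);
    }
}
```

Feeding the same array to add() value by value and to twoPassVariance() should give results that agree to within rounding error.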
> > (2) I want to apply the same strategy used in your mean and variance
> > calculations for skew and kurt. The getters for these properties then
> > would truly just be "getters" without the calculations occurring in them
> > as well. This means the sum of powers code goes away for now.
> >
> > (3) I want to derive a methodology for the same two-pass algorithm for
> > skew and kurt; hey, if we can't find published work on it, then there's a
> > possible paper in the future for someone to write!
> >
> We should stick with established algorithms. I would suggest researching
> established computational formulas for higher order moments.
+1, especially given the subtleties we learned about in the "simple" case of
variance, I would be extremely loath to implement a new algorithm without
_vigorous_ testing, which we probably don't really want to hold up an initial
release for. Also, as always I am skeptical about the real-world utility of
such high-order moments as skewness and kurtosis. IMO, we are already
providing much more statistical functionality than most programmers understand
how to use correctly.
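
As an illustration of what an established one-pass formula for higher moments looks like, here is a rough sketch of the commonly cited incremental update for the second, third, and fourth central moment sums. It is not code from this thread or from commons-math. The class name RunningMoments and its method names are hypothetical, and the skewness and kurtosis shown are the simple population (biased) forms, which is an assumption on my part. It would need the same careful testing as the variance code before anyone relied on it.

```java
// Hypothetical illustration class; not part of commons-math.
final class RunningMoments {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of (x - mean)^2
    private double m3 = 0.0; // sum of (x - mean)^3
    private double m4 = 0.0; // sum of (x - mean)^4

    void add(double x) {
        long n1 = n;          // count before this value
        n += 1;
        double delta = x - mean;
        double deltaN = delta / n;
        double deltaN2 = deltaN * deltaN;
        double term1 = delta * deltaN * n1;
        mean += deltaN;
        // Order matters: m4 uses the old m2 and m3, m3 uses the old m2.
        m4 += term1 * deltaN2 * ((double) n * n - 3.0 * n + 3.0)
                + 6.0 * deltaN2 * m2 - 4.0 * deltaN * m3;
        m3 += term1 * deltaN * (n - 2) - 3.0 * deltaN * m2;
        m2 += term1;
    }

    double getMean() {
        return mean;
    }

    double secondMomentSum() {
        return m2;
    }

    double thirdMomentSum() {
        return m3;
    }

    double fourthMomentSum() {
        return m4;
    }

    // Population skewness: sqrt(n) * m3 / m2^(3/2). Assumed definition.
    double skewness() {
        return Math.sqrt((double) n) * m3 / Math.pow(m2, 1.5);
    }

    // Population excess kurtosis: n * m4 / m2^2 - 3. Assumed definition.
    double excessKurtosis() {
        return n * m4 / (m2 * m2) - 3.0;
    }
}
```

The getters here still do a little arithmetic, but all of the per-value work stays in add(), which is the same split proposed for the variance.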
Al
=====
Albert Davidson Chou
Get answers to Mac questions at http://www.MacMgrs.org/ .
