commons-dev mailing list archives

From Al Chou <hotfusion...@yahoo.com>
Subject Re: [math] proposed ordering for task list, scope of initial release
Date Sun, 08 Jun 2003 06:08:46 GMT
--- Phil Steitz <phil@steitz.com> wrote:
> >>* Improve numerical accuracy of Univariate and BivariateRegression
> >>statistical computations. Encapsulate basic double[] |-> double mean,
> >>variance, min, max computations using improved formulas and add these to
> >>MathUtils. (probably should add float[], int[], long[] versions as well.)
> >>Then refactor all univariate implementations that use stored values
> >>(including UnivariateImpl with finite window) to use the improved
> >>versions. -- Mark?  I am chasing down the TAS reference to document the
> >>source of the _NR_ formula, which I will add to the docs if someone else
> >>does the implementation.
> > 
> > 
> > I was starting to code the updating (storage-less) variance formula, based
> > on the Stanford article you cited, as a patch.  I believe the storage-using
> > corrected two-pass algorithm is pretty trivial to code once we feel we're
> > on solid ground with the reference to cite.
> > 
> > 
> OK. I finally got hold of the American Statistician article (had to 
> resort to the old trundle down to local university library method) and 

Great!  Thanks.
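
The updating (storage-less) variance computation mentioned above can be
sketched roughly as follows. This is a Welford-style recurrence, which is an
assumption about the general shape and not necessarily the exact formula given
in the Stanford article. The class name RunningVariance and its methods are
hypothetical and are not existing commons-math API.

```java
/** Hypothetical helper: one-pass (storage-less) mean and sample variance. */
final class RunningVariance {
    private long n = 0;        // number of values seen so far
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Folds one more value into the running statistics. */
    void add(double x) {
        n++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / n;         // update the mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    long getN() {
        return n;
    }

    double getMean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); NaN for fewer than two values. */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```

No values are stored, so this shape would also suit the case where the
univariate is fed one value at a time.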


> found lots of good stuff in it -- including a reference to Hanson's 
> recursive formula (from Stanford paper) and some empirical and 
> theoretical results confirming that NR 14.1.8 is about the best that you 
> can do for the stored case.  There is a refinement mentioned in which 
> "pairwise summation" is used (essentially splitting the sample in two 
> and computing the recursive sums in parallel); but the value of this 

I was wondering what the pairwise method was, and whether it was another name
for a technique we'd already discussed.  Sounds sort of like Shell's sort or
other recursive divide-and-conquer algorithms.
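
Going by the description quoted above (split the sample in two and sum the
halves separately), pairwise summation might look something like the sketch
below. It shows only the summation step, not the full variance computation
from the article. The class name PairwiseSum and the cutoff BLOCK are
hypothetical choices made for illustration.

```java
/** Hypothetical helper: recursive pairwise summation of a double[]. */
final class PairwiseSum {
    // Below this length, fall back to a plain loop (assumed cutoff).
    private static final int BLOCK = 8;

    static double sum(double[] x) {
        return sum(x, 0, x.length);
    }

    /** Sums x[from] .. x[to - 1] by splitting the range in half. */
    static double sum(double[] x, int from, int to) {
        int len = to - from;
        if (len <= BLOCK) {
            double s = 0.0;
            for (int i = from; i < to; i++) {
                s += x[i];
            }
            return s;
        }
        int mid = from + len / 2;
        // Each half is summed independently, then the two partial sums are added.
        return sum(x, from, mid) + sum(x, mid, to);
    }
}
```

Each partial sum only ever accumulates about half as many terms as its parent,
which is where the improvement for large n would come from.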


> only kicks in for large n.  I propose that we use NR 14.1.8 as is for 
> all stored computations.  Here is good text for the reference:
>
> Based on the <i>corrected two-pass algorithm</i> for computing the 
> sample variance, as described in "Algorithms for Computing the Sample 
> Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and 
> Randall J. LeVeque, <i>The American Statistician</i>, 1983, Vol 37, 
> No. 3. (Eq. (1.7) on page 243.)
> 
> The empirical investigation that the authors do uses the following trick 
> that I have thought about using to investigate the precision in our 
> stuff:  implement an algorithm using both floats and doubles and use the 
> double computations to assess stability of the algorithm implemented 
> using floats. Might want to play with this a little.

Yes, I skimmed part of the Stanford article and noticed that test technique. 
It's interesting, and as you say, we may want to experiment with it to see what
it can tell us.
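
A rough sketch of that test technique, assuming the corrected two-pass
algorithm is the one under test: compute the variance once in float and once
in double, and treat the double result as the reference. The class name
PrecisionProbe and its method names are hypothetical, and the correction term
follows my reading of the corrected two-pass formula rather than any existing
implementation.

```java
/** Hypothetical probe: float result checked against a double reference. */
final class PrecisionProbe {

    /** Corrected two-pass sample variance in double precision. */
    static double varianceDouble(double[] x) {
        int n = x.length;
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        double mean = sum / n;
        double sumSq = 0.0;  // sum of squared deviations
        double sumDev = 0.0; // sum of deviations; zero in exact arithmetic
        for (double v : x) {
            double d = v - mean;
            sumSq += d * d;
            sumDev += d;
        }
        return (sumSq - sumDev * sumDev / n) / (n - 1);
    }

    /** The same algorithm carried out in float arithmetic. */
    static float varianceFloat(float[] x) {
        int n = x.length;
        float sum = 0.0f;
        for (float v : x) {
            sum += v;
        }
        float mean = sum / n;
        float sumSq = 0.0f;
        float sumDev = 0.0f;
        for (float v : x) {
            float d = v - mean;
            sumSq += d * d;
            sumDev += d;
        }
        return (sumSq - sumDev * sumDev / n) / (n - 1);
    }

    /** Relative error of the float result, using the double result as reference. */
    static double relativeError(float[] x) {
        double[] wide = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            wide[i] = x[i]; // widening is exact, so both runs see the same data
        }
        double reference = varianceDouble(wide);
        return Math.abs(varianceFloat(x) - reference) / Math.abs(reference);
    }
}
```

Running relativeError over data sets with a large mean and a small spread
should show how quickly a given formula loses digits.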



Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .

