On 7/11/06, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
> J.Pietschmann wrote :
>
> > Well, the majority of the num math text books on my shelf actually
> > recommend computing the sum of the squared errors instead of the
> > algebraic equivalent form given in the more analytically oriented
> > text books (and used above). This is, of course, more complicated
> > and still prone to adverse numerical effects unless the sequence
> > is also sorted.
>
Can you provide references?
> You are right, but this would also imply storing all values and either
> recomputing everything as points are added/removed or setting up a "dirty"
> flag to perform lazy evaluation only when needed. This has an impact on
> both memory and CPU usage.
>
> The current implementation does not retain individual points; it simply
> handles them on the fly by updating a few running sums. It can handle an
> extremely large number of points with a very small memory footprint.
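To make the memory argument concrete, here is a minimal Java sketch of an accumulator that keeps only a count and raw running sums, so memory use stays constant however many points are added. It is an illustration only, not the actual Commons Math code, and the class name `RawSumAccumulator` is hypothetical. It evaluates the sum of squared deviations with the algebraic form sum(x^2) - (sum(x))^2 / n, which is the kind of formula the quoted text warns about when the data share a large common offset.

```java
/**
 * Hypothetical accumulator that keeps only a count and raw running sums,
 * so memory use is constant no matter how many values are added.
 */
public class RawSumAccumulator {
    private long n = 0;
    private double sumX = 0.0;   // running sum of x
    private double sumXX = 0.0;  // running sum of x^2

    /** Folds one value into the running sums in O(1) time and memory. */
    public void addValue(double x) {
        n++;
        sumX += x;
        sumXX += x * x;
    }

    /**
     * Sum of squared deviations via the algebraic identity
     * sum(x^2) - (sum(x))^2 / n. This subtracts two large, nearly equal
     * numbers when the data share a big common offset.
     */
    public double sumSquaredDeviations() {
        return sumXX - sumX * sumX / n;
    }

    public static void main(String[] args) {
        RawSumAccumulator acc = new RawSumAccumulator();
        // The exact sum of squared deviations of these three values is 2.
        for (double x : new double[] {1e9 + 1, 1e9 + 2, 1e9 + 3}) {
            acc.addValue(x);
        }
        System.out.println(acc.sumSquaredDeviations());
    }
}
```

With these inputs both large terms lie between 2^61 and 2^62, where adjacent doubles are 512 apart, so the printed difference is a multiple of 512 (possibly 0) rather than 2. The same three values without the 10^9 offset give exactly 2.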
>
> Do you think we should provide two implementations, one being memory/CPU
> friendly and the other one being accuracy-friendly?
>
No, unless there are compelling arguments indicating that direct
computation is in fact more accurate for many instances (contradicting
references in the javadoc), in which case we would as you point out
need to maintain two versions, since we can't abandon the scalability
and performance of the current (essentially stateless) impl. See the
references to the Chan / Golub article on accumulating sums of squares
in the addData javadoc and the applied regression text (Weisberg)
cited there. See also, e.g., Neter and Wasserman, Applied Linear
Statistical Models [isbn 0256117365].
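A one-pass updating scheme of the kind the Chan / Golub article discusses can be sketched as follows. This is a hypothetical class, `CenteredRegressionSums`, written for illustration and not copied from Commons Math. Instead of raw sums it keeps running means and centered sums of squares and cross-products, correcting them by the factor n/(n+1) as each point arrives. It still stores no points, but it never subtracts two huge, nearly equal totals.

```java
/**
 * Hypothetical one-pass accumulator for simple linear regression.
 * Keeps running means and centered sums, never the points themselves.
 */
public class CenteredRegressionSums {
    private long n = 0;
    private double xbar = 0.0;   // running mean of x
    private double ybar = 0.0;   // running mean of y
    private double sumXX = 0.0;  // sum of (x - xbar)^2
    private double sumXY = 0.0;  // sum of (x - xbar)(y - ybar)

    /** Adds one (x, y) point using the updating formulas. */
    public void addData(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            // Deviations from the means of the first n points
            double dx = x - xbar;
            double dy = y - ybar;
            sumXX += dx * dx * n / (n + 1.0);
            sumXY += dx * dy * n / (n + 1.0);
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    /** Centered sum of squares of x. */
    public double getSumSquaredDeviationsX() {
        return sumXX;
    }

    /** Least-squares slope Sxy / Sxx (NaN unless at least two distinct x values). */
    public double getSlope() {
        return sumXY / sumXX;
    }

    /** Least-squares intercept ybar - slope * xbar. */
    public double getIntercept() {
        return ybar - getSlope() * xbar;
    }
}
```

Feeding it the points (1e9 + 1, 2e9 + 3), (1e9 + 2, 2e9 + 5) and (1e9 + 3, 2e9 + 7), which lie on y = 2x + 1, yields a centered sum of squares of 2, a slope of 2 and an intercept of 1, because every update works with small deviations rather than raw magnitudes.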
Phil

To unsubscribe, email: commonsdevunsubscribe@jakarta.apache.org
For additional commands, email: commonsdevhelp@jakarta.apache.org
