Thanks for the response Phil.
This question has been on my mind for a month or so. I was working on some
Student T approximators (ergo my check in of the NIST data for Student T),
when I noticed that the commons Student T was based on the continuing
fraction version of the incomplete beta formulation. Then I updated my local
working copy and noticed you have made some changes to the Normal
distribution. I looked at that implementation and it is the one based on the
error function. There is nothing wrong with the choices that were made and I
am sure your error function is awesome. However, my experience has been
that, especially in the tails, you want other approximators (typically some
variation on a power series).
As for parallels, I think you are correct, the best one is the commons
implementation schema for random number generators. As I was looking back,
it struck me as odd that the old SVD was ditched. Again, not saying it did
not deserve it, just wanted to make sure I understood the philosophy behind
the decision making.
Thank you,
-Greg
On Thu, Sep 1, 2011 at 1:11 AM, Phil Steitz wrote:
> On 8/31/11 10:05 PM, Greg Sterijevski wrote:
> > Hello All,
> >
> > This question popped into my head this evening, what is the right way to
> > handle multiple algorithms which purport to calculate the same thing?
> There
> > are, for example, a couple of ways to calculate the student t cdf. What
> is
> > the common's philosophy on deciding:
> >
> > 1. Whether to allow multiple algorithms.
> > 2. How an algorithm is included.
> > a.) Does a 'bug' or shortcoming need to be shown in the current
> > implementation?
> > b.) Say that algorithm a works for a numerical range and b works best
> on
> > another. Are both included? Is a new 'meta' algorithm written which mixes
> > both a and b?
> > 3. Does simplicity count?
> > 4. Does speed matter?
> >
> > A while back, Chris Nix reimplemented the SVD routine. I am not sure I
> > remember the old routine so I cannot say there was anything worth keeping
> > there. However was there a conscious decision to scrap it? Why not have
> it
> > live side by side with the new one? (Again, I am not saying the old
> > algorithm was better-Chris' contribution definitely was valuable). I
> think
> > we will run into these issues often.
> >
> > Thoughts? If this has been discussed already, my apologies.
>
> Well, at a high level, we tried to settle this at the very
> beginning. Have a look at items 3 and 4 in the "guiding principles"
> on the [math] main page :) That stuff comes from the original
> proposal for [math] and we have tried to stay more or less faithful
> to the principles laid out there.
>
> The integration, ode and solvers packages all try to do exactly what
> you are suggesting - when multiple algorithms, or even variations on
> an algorithm, exist and no single one can really do the job for all
> practical use cases, we welcome and incorporate implementations of
> multiple different ones. How we do that varies by package and this
> is one place where our "interface pain" has led to some trauma.
> Initially, we pretty much uniformly separated interfaces from
> implementation precisely for this reason. You can still see that in
> the distributions package and while it is a little ironic since we
> have been talking about collapsing the interfaces into the impls
> there, the setup now supports multiple different implementations of
> any of the distributions.
>
> I don't want to turn this into an abstract discussion of how best to
> support the strategy pattern in [math]. I think the "how to do it"
> part depends on the problem / package. What we have tried to do so
> far and I think we should keep trying to do is:
>
> 0) If there is a single best algorithm that really does work well in
> almost all cases, make that "the obvious thing" and try to make our
> implementation of it as robust as possible. If we provide only one
> implementation, try to make it the best one. I would say put
> SimpleRegression, or Variance, for example, in this category.
>
> 1) When multiple different standard algorithms exist, make sure our
> API supports adding alternatives - including user-supplied
> alternatives - in the least-confusing way we can think of. Welcome
> contributions of the alternatives and try to document as best we can
> how and when to use the different implementations. Try to stick to
> standard algorithms, with good external references, so we don't have
> to turn our javadoc and/or user guide into a numerical analysis
> textbook.
>
> 2) Whenever possible, try to encapsulate the part that varies for
> different implementations, so the API remains simple and the
> variable part can be "plugged in." Good examples of this are the
> RandomGenerators or the Solvers used by different classes.
>
> The general question is a good one; but the specific answer depends
> on the algorithms and classes involved. In the two cases that you
> mention (t distribution cdf and SVD), we have to balance API
> complexity and more code to support against practical value. SVD
> has been a big challenge for us, so I would say we should start at
> step 0) on that one; but I could easily be talked into supporting
> multiple impls given strong numerical arguments and multiple good,
> robust impls. For the t distribution, I am not so sure. A separate
> thread for that is probably best. My intuition there is that like
> the other distributions, we and our users are best off with a single
> impl that does whatever it needs to do in different ranges to
> provide robust estimates, possibly allowing configuration settings
> to trade performance for accuracy.
>
> Phil
>
> >
> > -Greg
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>