commons-dev mailing list archives

From "Phil Steitz" <>
Subject Re: [math] proposed ordering for task list, scope of initial release
Date Tue, 10 Jun 2003 13:38:47 GMT
Brent Worden wrote:
>>-----Original Message-----
>>From: Phil Steitz []
>>Sent: Friday, June 06, 2003 12:21 PM
>>Subject: [math] proposed ordering for task list, scope of initial release
>>Here is a *proposed* ordering for the task list, with a little commentary
>>One thing that I want to make *very* clear up front is that I
>>*never* intended
>>the task list or the items listed in the scope section of the
>>proposal to be
>>definitive.  All that is definitive are the guiding principles,
>>which just try
>>to keep us focused on stuff that people will find both useful and
>>easy to use.
>>I expected that the actual contents of the first release would
>>include some
>>things not on the list and would exclude some of the things
>>there.  At this
>>stage, as Jouzas pointed out, it is more important for us to
>>build community
>>than to rush a release out the door. So if there are things that fit the
>>guidelines that others would like to contribute, but which are
>>not on the list,
>>*please* suggest them.  Also, for those who may not have dug into
>>the code, but
>>who may be interested in contributing, please rest assured that deep
>>mathematical knowledge is not required to help. We can review
>>and deal with mathematical problems as they arise, using our
>>small but growing
>>community as a resource.  The same is obviously true on the
>>Java/OS tools
>>side -- no need to be an expert to contribute.
>>OK, long-winded disclaimer aside, here is how I see the task list ordered:
>>* The RealMatrixImpl class is missing some key method implementations. The
>>critical thing is solution of linear systems. We need to implement a
>>numerically sound solution algorithm. This will enable inverse() and also
>>support general linear regression. -- I think that Brent is
>>working on this.
> The only thing I've done is the Cholesky decomposition.  I haven't done
> anything for the general linear system case.
Are you going to do this, or should I take it on?
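For reference on the linear-systems item: Cholesky covers only symmetric positive-definite matrices. For a general square system, the usual numerically sound approach is Gaussian elimination with partial pivoting, which is an LU decomposition in disguise. The following is a minimal sketch on plain `double[][]` arrays, not the RealMatrixImpl API. The class name `LinearSolverSketch` is hypothetical.

```java
/** Hypothetical helper: solves A x = b by Gaussian elimination with partial pivoting. */
final class LinearSolverSketch {

    private LinearSolverSketch() { }

    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        // Work on copies so the caller's arrays are left untouched.
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = a[i].clone();
        }
        double[] x = b.clone();

        for (int col = 0; col < n; col++) {
            // Partial pivoting: choose the row with the largest magnitude in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) {
                    pivot = row;
                }
            }
            if (m[pivot][col] == 0.0) {
                throw new ArithmeticException("matrix is singular");
            }
            // Swap the pivot row into place in both the matrix and the right-hand side.
            double[] tmpRow = m[col]; m[col] = m[pivot]; m[pivot] = tmpRow;
            double tmp = x[col]; x[col] = x[pivot]; x[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = m[row][col] / m[col][col];
                for (int k = col; k < n; k++) {
                    m[row][k] -= factor * m[col][k];
                }
                x[row] -= factor * x[col];
            }
        }
        // Back substitution on the resulting upper-triangular system.
        for (int row = n - 1; row >= 0; row--) {
            double sum = x[row];
            for (int k = row + 1; k < n; k++) {
                sum -= m[row][k] * x[k];
            }
            x[row] = sum / m[row][row];
        }
        return x;
    }
}
```

The same elimination applied to the columns of the identity matrix would yield inverse(), which is why the solver is the piece to get right first.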
>>* t-test statistic needs to be added and we should probably add
>>the capability
>>of actually performing t- and chi-square tests at fixed
>>significance levels
>>(.1, .05, .01, .001). -- This is virtually done, just need to
>>define a nice,
>>convenient interface for doing one- and two-tailed tests.  Thanks
>>to Brent, we
>>can actually support user-supplied significance levels (next item)
> Anyone have any thoughts on the interface?  I was thinking of an Inference
> interface that supports conducting one- and two-tailed tests as well
> as constructing their complementary confidence intervals.  Or, if we want to
> separate concerns create both a HypothesisTest and a ConfidenceInterval
> interface, one for each type of inference.  Either way, I would use the
> tried-and-true abstract factory way of creating inference instances.
> Comments are welcome.
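Whatever shape the inference interface takes, the one-sample test it wraps starts from the statistic t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation. This minimal sketch covers only that computation. `TStatisticSketch` and its method name are hypothetical, and the tail-probability lookup is omitted because it depends on the t distribution code discussed in this thread.

```java
/** Hypothetical helper computing the one-sample t statistic. */
final class TStatisticSketch {

    private TStatisticSketch() { }

    /** t statistic for H0: population mean == mu0. Requires at least two observations. */
    static double tStatistic(double[] sample, double mu0) {
        int n = sample.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double sum = 0.0;
        for (double v : sample) {
            sum += v;
        }
        double mean = sum / n;
        // Bias-corrected (n - 1) sample variance, computed in a second pass.
        double ss = 0.0;
        for (double v : sample) {
            double d = v - mean;
            ss += d * d;
        }
        double s = Math.sqrt(ss / (n - 1));
        return (mean - mu0) / (s / Math.sqrt(n));
    }
}
```

A two-tailed test at level alpha would then compare |t| against the critical value at alpha/2 with n - 1 degrees of freedom.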
>>* numerical approximation of the t- and chi-square distributions to enable
>>user-supplied significance levels.  See above.  Someone just
>>needs to put a
>>fork in this. Tim? Brent?
> Done.
Including the testing interface?  See below.

>>* *new* add support for F distribution and F test, so that we can report
>>significance level of correlation coefficient in bivariate regression /
>>significance of model.  I will do this if no one else wants to.
> Done.  I'll probably knock out a few more easy continuous distributions to
> get them out of the way.
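For the F item, the cumulative probability reduces to the regularized incomplete beta function: P(F <= x) = I_z(d1/2, d2/2), where z = d1 x / (d1 x + d2). This is an independent sketch, not the implementation mentioned above. It uses the Lanczos log-gamma coefficients and the Lentz continued fraction for the incomplete beta function as popularized by Numerical Recipes. The class name `FDistributionSketch` is hypothetical.

```java
/** Hypothetical sketch of the F distribution CDF via the regularized incomplete beta function. */
final class FDistributionSketch {

    private FDistributionSketch() { }

    /** Lanczos approximation to ln(Gamma(x)) for x > 0. */
    static double logGamma(double x) {
        final double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Continued fraction for the incomplete beta function (modified Lentz method). */
    private static double betaContinuedFraction(double a, double b, double x) {
        final double eps = 3e-15;
        final double tiny = 1e-300;
        double qab = a + b, qap = a + 1.0, qam = a - 1.0;
        double c = 1.0;
        double d = 1.0 - qab * x / qap;
        if (Math.abs(d) < tiny) d = tiny;
        d = 1.0 / d;
        double h = d;
        for (int m = 1; m <= 300; m++) {
            int m2 = 2 * m;
            // Even step of the continued fraction.
            double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
            d = 1.0 + aa * d;
            if (Math.abs(d) < tiny) d = tiny;
            c = 1.0 + aa / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            h *= d * c;
            // Odd step of the continued fraction.
            aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
            d = 1.0 + aa * d;
            if (Math.abs(d) < tiny) d = tiny;
            c = 1.0 + aa / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < eps) return h;
        }
        throw new ArithmeticException("incomplete beta continued fraction did not converge");
    }

    /** Regularized incomplete beta function I_x(a, b). */
    static double regularizedBeta(double x, double a, double b) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
                + a * Math.log(x) + b * Math.log1p(-x));
        // Evaluate the fraction where it converges quickly; otherwise use
        // the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
        if (x < (a + 1.0) / (a + b + 2.0)) {
            return front * betaContinuedFraction(a, b, x) / a;
        }
        return 1.0 - front * betaContinuedFraction(b, a, 1.0 - x) / b;
    }

    /** P(F <= x) with d1 numerator and d2 denominator degrees of freedom. */
    static double cumulativeProbability(double x, double d1, double d2) {
        if (x <= 0.0) return 0.0;
        double z = d1 * x / (d1 * x + d2);
        return regularizedBeta(z, d1 / 2.0, d2 / 2.0);
    }
}
```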
>>* Framework and implementation strategie(s) for finding roots of
>>functions of one (real) variable.  Here again -- largely done.  I
>>would prefer
>>to wait until J gets back and let him submit his framework and R. Brent's
>>algorithm.  Then "our" Brent's implementation and usage can be integrated
>>(actually not much to do, from the looks of the current code) and
>>I will add my
>>"bean equations" stuff (in progress).
> Sounds good.
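J's framework and R. Brent's algorithm are not reproduced here. As a point of reference for what a solver framework has to carry (a function interface, a bracketing interval, a tolerance, and an iteration cap), here is a minimal bisection sketch. `UnivariateFunction` and `BisectionSolver` are hypothetical names. Bisection is used only because it is the simplest robust bracketing method, not because the proposed framework uses it.

```java
/** Hypothetical function-of-one-real-variable interface. */
interface UnivariateFunction {
    double value(double x);
}

/** Hypothetical bisection solver; requires f(lo) and f(hi) to have opposite signs. */
final class BisectionSolver {

    private BisectionSolver() { }

    static double solve(UnivariateFunction f, double lo, double hi, double tol, int maxIterations) {
        double flo = f.value(lo);
        double fhi = f.value(hi);
        if (flo * fhi > 0.0) {
            throw new IllegalArgumentException("root is not bracketed");
        }
        if (flo == 0.0) return lo;
        if (fhi == 0.0) return hi;
        for (int i = 0; i < maxIterations; i++) {
            double mid = lo + (hi - lo) / 2.0;
            double fmid = f.value(mid);
            // Stop on an exact zero or once the half-width is below the tolerance.
            if (fmid == 0.0 || (hi - lo) / 2.0 < tol) {
                return mid;
            }
            // Keep the half-interval whose endpoints still bracket the root.
            if (flo * fmid < 0.0) {
                hi = mid;
            } else {
                lo = mid;
                flo = fmid;
            }
        }
        throw new ArithmeticException("bisection did not converge");
    }
}
```

Usage: `BisectionSolver.solve(x -> x * x - 2.0, 0.0, 2.0, 1e-12, 200)` approximates sqrt(2).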
>>* Extend distribution framework to support discrete distributions
>>and implement
>>binomial and hypergeometric distributions.  I will do this if no
>>one else wants
>>to.  If someone else does it, you should make sure to use the log
>>binomials in
> Binomial can easily be obtained using the regularized beta function that is
> already defined.  Hypergeometric will be a little more work as I don't think
> there's a compact formula to compute the cpf.

Using the log binomials, direct computation of the density might not be 
too bad.  I have not researched this, but that is what I was thinking.
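The log-binomial idea can be sketched directly. The hypergeometric density is P(X = x) = C(K, x) C(N - K, n - x) / C(N, n). Summing logarithms keeps the intermediate factorials from overflowing. The class and parameter names are hypothetical, and log C(n, k) is computed as a plain sum of logs rather than through a log-gamma function.

```java
/** Hypothetical sketch: hypergeometric density computed from log binomial coefficients. */
final class HypergeometricSketch {

    private HypergeometricSketch() { }

    /** log C(n, k) = sum over i = 1..k of log((n - k + i) / i), using the smaller of k and n - k. */
    static double logBinomial(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("require 0 <= k <= n");
        }
        int m = Math.min(k, n - k);
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log((double) (n - m + i) / i);
        }
        return sum;
    }

    /** P(X = x) when drawing sampleSize items from populationSize items, successes of which are marked. */
    static double density(int populationSize, int successes, int sampleSize, int x) {
        int lower = Math.max(0, sampleSize - (populationSize - successes));
        int upper = Math.min(successes, sampleSize);
        if (x < lower || x > upper) {
            return 0.0;
        }
        double logP = logBinomial(successes, x)
                + logBinomial(populationSize - successes, sampleSize - x)
                - logBinomial(populationSize, sampleSize);
        return Math.exp(logP);
    }
}
```

For example, `density(10, 4, 3, 1)` is C(4,1) C(6,2) / C(10,3) = 60 / 120 = 0.5.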

> One thing to note, since the
> discrete distributions do not have nice invertible mappings for critical
> values to probabilities like those found for continuous distributions, how
> should the inverseCumulativeProbability method work?  For a given
> probability, p, should the method return one value, x, such that x is the
> largest value where P(X <= x) <= p?  Or the smallest value, x, where P(X <=
> x) >= p.  Or should the method return two values, x0 and x1, such that P(X
> <= x0) <= p <= P(X <= x1)?

I think in the discrete case, we should supply the density function (and 
the cumulative probability function) and probably omit the 
inverseCumulativeProbability method.  If we were to add it, I would use 
the second of your alternatives above.
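The second alternative (return the smallest x with P(X <= x) >= p) has a simple implementation for a distribution on 0..n: accumulate the probability mass until it reaches p. The following minimal sketch does this for the binomial case. The class name is hypothetical, and a linear scan is used where a bisection over the monotone CDF would also work.

```java
/** Hypothetical sketch: inverse CDF for Binomial(n, prob) using the "smallest x" convention. */
final class BinomialInverseSketch {

    private BinomialInverseSketch() { }

    /** log C(n, k) as a sum of logs, so large n does not overflow. */
    private static double logBinomial(int n, int k) {
        int m = Math.min(k, n - k);
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log((double) (n - m + i) / i);
        }
        return sum;
    }

    /** P(X = k), assuming 0 < prob < 1 and 0 <= k <= n. */
    static double probability(int n, double prob, int k) {
        return Math.exp(logBinomial(n, k) + k * Math.log(prob) + (n - k) * Math.log1p(-prob));
    }

    /** Smallest x in [0, n] such that P(X <= x) >= p. */
    static int inverseCumulativeProbability(int n, double prob, double p) {
        if (prob <= 0.0 || prob >= 1.0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("require 0 < prob < 1 and 0 <= p <= 1");
        }
        double cumulative = 0.0;
        for (int x = 0; x < n; x++) {
            cumulative += probability(n, prob, x);
            if (cumulative >= p) {
                return x;
            }
        }
        // P(X <= n) = 1 >= p for every valid p.
        return n;
    }
}
```

With n = 10 and prob = 0.5, P(X <= 4) = 386/1024 and P(X <= 5) = 638/1024, so p = 0.5 maps to 5.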

>>* Exponential growth and decay (set up for financial
>>applications). I think this
>>applications) I think this
>>is just going to be a matter of finding the right formulas to add
>>to MathUtils.
>> I don't want to get carried away with financial computations,
>>but some simple,
>>commonly used formulas would be a nice addition to the package.
>>We should also
>>be thinking about other things to add to MathUtils -- religiously
>>adhering to
>>the guiding principles, of course.  Al's sign() is an excellent
>>example of the
>>kind of thing that we should be adding, IMHO.
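The "right formulas" for growth and decay are standard. Discrete compounding gives P(1 + r/m)^(m t), continuous compounding gives P e^(r t), and decay can be expressed with a half-life as N0 (1/2)^(t / h). This is a minimal sketch of the kind of static methods meant here. The class and method names are hypothetical, not existing MathUtils methods.

```java
/** Hypothetical static growth/decay helpers of the kind proposed for MathUtils. */
final class GrowthSketch {

    private GrowthSketch() { }

    /** Principal compounded periodsPerYear times a year at annualRate for years: P (1 + r/m)^(m t). */
    static double compoundValue(double principal, double annualRate, int periodsPerYear, double years) {
        return principal * Math.pow(1.0 + annualRate / periodsPerYear, periodsPerYear * years);
    }

    /** Continuous compounding (exponential growth): P e^(r t). */
    static double continuousValue(double principal, double rate, double time) {
        return principal * Math.exp(rate * time);
    }

    /** Exponential decay via a half-life: N0 (1/2)^(t / halfLife). */
    static double decay(double initial, double halfLife, double time) {
        return initial * Math.pow(0.5, time / halfLife);
    }
}
```

For example, 1000 at 5% compounded annually for 2 years gives 1102.5.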
> Things that might be added:
> Average of two numbers comes up a lot.

Yes. Some of us might not like this organization, but I have suggested a
couple of times that we add several
double[]->double functions to MathUtils representing the core 
computations for univariate -- mean, min, max, variance, sum, sumsq. 
This would be convenient for users and us as well.  I guess I would not 
be averse to moving these to stat.StatUtils, maybe just adding ave(x,y) 
to MathUtils.
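The double[]->double functions described here are short static methods. This minimal sketch shows one possible set. The class name `StatUtilsSketch` and the edge-case conventions are assumptions: NaN for an empty array, 0 for the variance of a single value, and an n - 1 denominator.

```java
/** Hypothetical static double[] -> double helpers, plus ave(x, y). */
final class StatUtilsSketch {

    private StatUtilsSketch() { }

    static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) s += v;
        return s;
    }

    static double sumSq(double[] values) {
        double s = 0.0;
        for (double v : values) s += v * v;
        return s;
    }

    static double mean(double[] values) {
        return values.length == 0 ? Double.NaN : sum(values) / values.length;
    }

    /** Bias-corrected sample variance (n - 1 denominator), computed in two passes. */
    static double variance(double[] values) {
        int n = values.length;
        if (n == 0) return Double.NaN;
        if (n == 1) return 0.0;
        double m = mean(values);
        double ss = 0.0;
        for (double v : values) {
            double d = v - m;
            ss += d * d;
        }
        return ss / (n - 1);
    }

    static double min(double[] values) {
        if (values.length == 0) return Double.NaN;
        double m = values[0];
        for (double v : values) m = Math.min(m, v);
        return m;
    }

    static double max(double[] values) {
        if (values.length == 0) return Double.NaN;
        double m = values[0];
        for (double v : values) m = Math.max(m, v);
        return m;
    }

    /** Average of two numbers; (x + y) can overflow near Double.MAX_VALUE. */
    static double ave(double x, double y) {
        return (x + y) / 2.0;
    }
}
```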

Given the post that I just saw regarding financial computations, I 
suggest that we let MathUtils grow a bit (including the double[]->double 
functions) and then think about breaking it apart prior to release.  As 
long as we stick to simple static methods, that will not be hard to do.

> Something similar to JUnit's assertEquals(double expected, double actual,
> double epsilon).

Good idea.
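A tolerance comparison in the spirit of JUnit's assertEquals(double, double, double) could return a boolean instead of failing an assertion. This minimal sketch uses a hypothetical class and method name. Checking exact equality first lets equal infinities compare equal, while NaN never compares equal here. That NaN behavior is a design choice, and JUnit versions differ on it.

```java
/** Hypothetical tolerance comparison helper. */
final class ApproxSketch {

    private ApproxSketch() { }

    /** True if x == y exactly (covers equal infinities) or |x - y| <= eps; false if either is NaN. */
    static boolean equals(double x, double y, double eps) {
        if (x == y) {
            return true;
        }
        return Math.abs(x - y) <= eps;
    }
}
```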

> Simple methods like isPositive, isNegative, etc. can be used to make boolean
> expressions more human readable.

I agree.
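Readability helpers like these are one-liners. The main decision is how zero and NaN behave. This sketch uses hypothetical names and assumes strict comparisons, so 0.0, -0.0, and NaN are neither positive nor negative.

```java
/** Hypothetical sign predicates for readable boolean expressions. */
final class SignSketch {

    private SignSketch() { }

    /** True only for x > 0; false for 0.0, -0.0 and NaN. */
    static boolean isPositive(double x) {
        return x > 0.0;
    }

    /** True only for x < 0; false for 0.0, -0.0 and NaN. */
    static boolean isNegative(double x) {
        return x < 0.0;
    }
}
```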

> Some other constants besides E and PI: golden ratio, euler, sqrt(PI), etc.
> I've used a default error constant in several places.

I get the first 3, but what exactly do you mean by the default error constant?

> It would be nice to come
> up with a central location for such values.
> In addition to the above, has any thought gone into a set of application
> exceptions that will be thrown.  Are we going to rely on Java core
> exceptions or are we going to create some application specific exceptions?
> As I recall J uses a MathException in the solver routines and I added a
> ConvergenceException.  Should we expand that list or fold it into one
> generic application exception or do away with application exceptions
> altogether?
My philosophy on this is that whatever exceptions we define should be 
"close" to the components that throw them -- e.g. ConvergenceException. 
I do not like the idea of a generic "MathException."  As much as 
possible, I think that we should rely on the built-ins (including the 
extensions recently added to lang). Regarding ConvergenceException, I am 
on the fence about including it in the initial release, though I see 
something like this as eventually inevitable.  Correct me if I am wrong, 
but the only place that this is used now is in the dist package, and we 
could either just throw a RuntimeException directly there or return NaN. 
I do see the semantic value of ConvergenceException, however.  I guess 
I would vote for keeping it.
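The pattern of an exception kept "close" to the component that throws it looks roughly like this. An iterative routine throws ConvergenceException when it runs out of iterations, instead of returning NaN. This is a sketch under assumptions. The exception is made unchecked here, although whether the real one should be checked is still open. The fixed-point iteration x = cos(x) stands in for the dist code, and `FixedPointSketch` is a hypothetical name.

```java
/** Sketch of a component-specific exception (assumed unchecked here). */
class ConvergenceException extends RuntimeException {
    ConvergenceException(String message) {
        super(message);
    }
}

/** Hypothetical iterative routine that reports non-convergence via ConvergenceException. */
final class FixedPointSketch {

    private FixedPointSketch() { }

    /** Iterates x = cos(x) until successive values differ by less than tol. */
    static double solveCosFixedPoint(double start, double tol, int maxIterations) {
        double x = start;
        for (int i = 0; i < maxIterations; i++) {
            double next = Math.cos(x);
            if (Math.abs(next - x) < tol) {
                return next;
            }
            x = next;
        }
        throw new ConvergenceException("no convergence after " + maxIterations + " iterations");
    }
}
```

Callers that prefer the NaN convention can catch ConvergenceException at the boundary, but the exception keeps the reason for failure visible.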

> Brent Worden
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

