I am working on getting myself set up with Maven, but I wanted to get
this list out to any who might be willing to a) contribute or b) comment
on priorities or direction.
The proposal presents the following initial scope:
* Simple univariate statistics (mean, standard deviation, n,
confidence intervals)
* Frequency distributions
* t-test, chi-square test
* Random numbers from Gaussian, Exponential, Poisson distributions
* Random sampling/resampling
* Bivariate regression, correlation
and mathematical algorithms such as the following:
* Basic Complex Number representation with algebraic operations
* Newton's method for finding roots
* Binomial coefficients
* Exponential growth and decay (set up for financial applications)
* Polynomial Interpolation (curve fitting)
* Basic Matrix representation with algebraic operations
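For the univariate statistics item, a t-based confidence interval for the mean is mean +/- t * s / sqrt(n), where s is the sample standard deviation. The sketch below assumes the caller supplies the Student's t critical value (for example from a printed table), since numerical approximation of the t distribution is mentioned only as a later step. The class and method names are hypothetical, not existing Commons code.

```java
/**
 * Hypothetical helper (not existing Commons code): a t-based confidence
 * interval for the mean, mean +/- t * s / sqrt(n).
 */
final class TConfidenceInterval {
    private TConfidenceInterval() {
    }

    /** Arithmetic mean of the sample. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sample standard deviation, using the n - 1 denominator. */
    static double standardDeviation(double[] x) {
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) {
            double d = v - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (x.length - 1));
    }

    /**
     * Returns {lower, upper}. tCritical is the two-sided critical value of
     * Student's t with n - 1 degrees of freedom, supplied by the caller.
     */
    static double[] interval(double[] x, double tCritical) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double m = mean(x);
        double halfWidth = tCritical * standardDeviation(x) / Math.sqrt(x.length);
        return new double[] {m - halfWidth, m + halfWidth};
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0};
        // 2.776 is the tabulated 0.975 quantile of t with 4 degrees of freedom (95% two-sided).
        double[] ci = interval(data, 2.776);
        System.out.printf("95%% CI for the mean: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

Run as-is, this prints "95% CI for the mean: [1.037, 4.963]". Passing the critical value in keeps the interval code independent of however the t distribution ends up being approximated.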
The following items need completion:
* Univariate needs confidence intervals. I would recommend doing this
by first defining a t-statistic in TestStatistic and then using it.
This is very simple. "Nice to haves" (IMHO) for Univariate would be
the addition of quantiles (1, 5, 10, 25, 50, 75, 90, 95, 99),
bootstrap confidence intervals for the versions that store data, and
maybe higher-order moments (if possible) for UnivariateImpl. I would
prioritize the quantiles (most important) and t-based confidence
intervals over the higher-order moments or bootstrap confidence
intervals.
* The t-test statistic needs to be added, and we should probably add the
capability of actually performing t and chi-square tests at fixed
significance levels (.1, .05, .01, .001). Down the road, numerical
approximation of the t and chi-square distributions could be added to
enable user-supplied significance levels. We also need more tests.
* The RealMatrixImpl class is missing some key method implementations.
The critical thing is inversion. We need to implement a numerically
sound inversion algorithm. This will enable solve() and also
support general linear regression.
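One common numerically sound route to inversion is LU decomposition with partial pivoting: factor PA = LU once, use forward and back substitution for solve(), and obtain the inverse by solving against each column of the identity. The sketch below works on plain double[][] arrays; the class name LuInverse is hypothetical and is not the RealMatrixImpl API. It only rejects exactly singular matrices; a fuller version would probably compare pivots against a tolerance.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the RealMatrixImpl API): LU decomposition with
 * partial pivoting, PA = LU, used for solve() and inverse().
 */
final class LuInverse {
    private final int n;
    private final double[][] lu; // L below the diagonal (unit diagonal implied), U on and above
    private final int[] perm;    // row i of PA is row perm[i] of A

    LuInverse(double[][] a) {
        n = a.length;
        lu = new double[n][];
        perm = new int[n];
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            lu[i] = a[i].clone();
            perm[i] = i;
        }
        for (int k = 0; k < n; k++) {
            // Partial pivoting: bring the largest |entry| of column k into the pivot position.
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(lu[i][k]) > Math.abs(lu[p][k])) {
                    p = i;
                }
            }
            if (lu[p][k] == 0.0) {
                throw new ArithmeticException("matrix is singular");
            }
            double[] rowTmp = lu[p]; lu[p] = lu[k]; lu[k] = rowTmp;
            int permTmp = perm[p]; perm[p] = perm[k]; perm[k] = permTmp;
            // Eliminate below the pivot, storing each multiplier in the L part.
            for (int i = k + 1; i < n; i++) {
                lu[i][k] /= lu[k][k];
                for (int j = k + 1; j < n; j++) {
                    lu[i][j] -= lu[i][k] * lu[k][j];
                }
            }
        }
    }

    /** Solves A x = b: forward substitution with L, then back substitution with U. */
    double[] solve(double[] b) {
        if (b.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double s = b[perm[i]];
            for (int j = 0; j < i; j++) {
                s -= lu[i][j] * x[j];
            }
            x[i] = s; // L has a unit diagonal, so no division here
        }
        for (int i = n - 1; i >= 0; i--) {
            double s = x[i];
            for (int j = i + 1; j < n; j++) {
                s -= lu[i][j] * x[j];
            }
            x[i] = s / lu[i][i];
        }
        return x;
    }

    /** Builds the inverse one column at a time by solving A x = e_j. */
    double[][] inverse() {
        double[][] inv = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] e = new double[n];
            e[j] = 1.0;
            double[] col = solve(e);
            for (int i = 0; i < n; i++) {
                inv[i][j] = col[i];
            }
        }
        return inv;
    }

    public static void main(String[] args) {
        double[][] a = {{0.0, 2.0}, {1.0, 1.0}}; // zero in the (0,0) slot forces a row swap
        System.out.println(Arrays.deepToString(new LuInverse(a).inverse()));
    }
}
```

The example prints [[-0.5, 1.0], [0.5, 0.0]]. Keeping the factorization around means solve() can be called for many right-hand sides without refactoring, which is what general linear regression would need.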
The following items have no submitted implementation. I will continue
to submit solutions for these things, but obviously we need more,
better, faster:)
* ComplexNumber interface and implementation. The only tricky things
here are making division numerically sound and deciding what
extended-value topology to adopt. If no one else jumps on this, I will
submit a cleaned-up version of what I have, along with some references.
* Bivariate regression, correlation. This could be done with simple
formulas manipulating arrays, and this is probably what we should aim
for in an initial release. Down the road, we should use the
RealMatrixImpl solve() to support general linear regression. I have
an implementation (of simple regression) that I could clean up and
submit, but again, I would be glad to let someone else submit this.
* Binomial coefficients. I have an "exact" implementation that is
limited to what can be stored in a long. This should be extended to
use BigIntegers and potentially to support logarithmic
representations.
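On the complex division point: the textbook formula divides by c*c + d*d, which can overflow or underflow even when the quotient itself is representable. Smith's algorithm (1962) avoids forming c*c + d*d by scaling with d/c or c/d, whichever has magnitude at most 1. This is a minimal, hypothetical Complex class written only to show the division; it is not a proposed ComplexNumber interface, and it simply throws on division by zero because the extended-value question is still open.

```java
/**
 * Hypothetical minimal complex type, used only to illustrate division by
 * Smith's algorithm; it is not a proposed ComplexNumber interface.
 */
final class Complex {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    double getReal() {
        return re;
    }

    double getImaginary() {
        return im;
    }

    /**
     * Returns this / z. Instead of dividing by c*c + d*d (which can overflow
     * or underflow), scale by r = d/c or r = c/d, whichever has |r| <= 1.
     */
    Complex divide(Complex z) {
        double a = re, b = im, c = z.re, d = z.im;
        if (c == 0.0 && d == 0.0) {
            // What to return here depends on the extended-value topology chosen.
            throw new ArithmeticException("complex division by zero");
        }
        if (Math.abs(c) >= Math.abs(d)) {
            double r = d / c;
            double den = c + d * r; // equals (c*c + d*d) / c
            return new Complex((a + b * r) / den, (b - a * r) / den);
        } else {
            double r = c / d;
            double den = c * r + d; // equals (c*c + d*d) / d
            return new Complex((a * r + b) / den, (b * r - a) / den);
        }
    }

    @Override
    public String toString() {
        return re + (im < 0 ? " - " : " + ") + Math.abs(im) + "i";
    }

    public static void main(String[] args) {
        System.out.println(new Complex(1.0, 2.0).divide(new Complex(3.0, 4.0)));
        // The textbook formula would overflow computing c*c + d*d here.
        System.out.println(new Complex(1e300, 1e300).divide(new Complex(1e300, 1e300)));
    }
}
```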
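Extending the exact binomial coefficient to BigInteger can use the multiplicative formula C(n, k) = product over i = 1..k of (n - k + i) / i. After step i the running product equals C(n - k + i, i), so every division is exact. A logarithmic variant sums logs instead of multiplying. This is a sketch with hypothetical names, not the existing implementation mentioned above.

```java
import java.math.BigInteger;

/** Hypothetical helper: exact and logarithmic binomial coefficients. */
final class Binomial {
    private Binomial() {
    }

    /** Exact C(n, k) as a BigInteger, so results are not limited to a long. */
    static BigInteger binomial(int n, int k) {
        checkArguments(n, k);
        int kk = Math.min(k, n - k); // C(n, k) == C(n, n - k); fewer steps
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= kk; i++) {
            // Invariant: before this step result == C(n - kk + i - 1, i - 1), and
            // multiplying by (n - kk + i) / i gives C(n - kk + i, i) exactly.
            result = result.multiply(BigInteger.valueOf(n - kk + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Natural log of C(n, k), computed as a sum of logs to avoid huge intermediates. */
    static double logBinomial(int n, int k) {
        checkArguments(n, k);
        int kk = Math.min(k, n - k);
        double sum = 0.0;
        for (int i = 1; i <= kk; i++) {
            sum += Math.log(n - kk + i) - Math.log(i);
        }
        return sum;
    }

    private static void checkArguments(int n, int k) {
        if (n < 0 || k < 0 || k > n) {
            throw new IllegalArgumentException("require 0 <= k <= n");
        }
    }

    public static void main(String[] args) {
        System.out.println(binomial(100, 50)); // far beyond Long.MAX_VALUE
        System.out.println(logBinomial(100, 50));
    }
}
```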
The following are items for which I do not have full Java code:
* Newton's method for finding roots
* Exponential growth and decay (set up for financial applications)
* Polynomial Interpolation (curve fitting)
* Sampling from Collections (maybe belongs in Collections???)
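As a starting point for the Newton's method item: the iteration is x(k+1) = x(k) - f(x(k)) / f'(x(k)) from a starting guess. A minimal sketch, assuming the caller supplies the derivative as a function and a relative tolerance on successive iterates; the class name and the stopping rule are illustrative choices, not an agreed design.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of Newton's method for a single real root. */
final class Newton {
    private Newton() {
    }

    /**
     * Iterates x <- x - f(x) / f'(x) from x0 until successive iterates differ by
     * at most tol * max(1, |x|), or throws after maxIterations steps.
     */
    static double solve(DoubleUnaryOperator f, DoubleUnaryOperator derivative,
                        double x0, double tol, int maxIterations) {
        double x = x0;
        for (int i = 0; i < maxIterations; i++) {
            double fx = f.applyAsDouble(x);
            double dfx = derivative.applyAsDouble(x);
            if (dfx == 0.0) {
                throw new ArithmeticException("zero derivative at x = " + x);
            }
            double next = x - fx / dfx;
            if (Math.abs(next - x) <= tol * Math.max(1.0, Math.abs(next))) {
                return next;
            }
            x = next;
        }
        throw new ArithmeticException("no convergence after " + maxIterations + " iterations");
    }

    public static void main(String[] args) {
        // The positive root of x^2 - 2 is sqrt(2).
        System.out.println(solve(x -> x * x - 2.0, x -> 2.0 * x, 1.0, 1e-12, 50));
    }
}
```

Newton's method can fail from a poor starting point or where f'(x) is near zero, which is why the sketch caps the iteration count and throws rather than looping forever.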
It would be a good idea for us to agree on priorities. Personally, I
would list things more or less in the order presented above.
Obviously, one more thing that we need help on is documentation. My
personal top priority is to get some basic material submitted for the
maven site. Finally, there is *lots* of cleanup to do in the existing
code and javadoc and more test cases to add (esp. tests for the
"rolling" capability in UnivariateImpl).
Regards,
Phil

