commons-dev mailing list archives

From "Mark R. Diggory"
Subject Re: [math] API changes for RC2
Date Sun, 26 Sep 2004 15:51:11 GMT

Phil Steitz wrote:
> The following changes have been suggested recently.  Before cutting 1.0 
> final, we should make sure we are all OK postponing or forgoing these:
> 1) Eliminate the univariate/multivariate distinction in the stat 
> package, because this seems confusing to some.  Change .univariate to 
> .descriptive and .multivariate to .regression

Univariate and Multivariate are just "classifications". There is no 
suggestion of changing the structure of the packages. Perhaps we can 
begin building a "classification outline" now so that we have a better 
idea what are the classes of statistics and what we want our naming 
scheme to be based on. In the past I've always leaned towards a 
classification similar to the mathworld site.

The idea of moving SimpleRegression to a package called "regression" is 
a means to classify "regressions" as much as to classify "multivariates" 
or "univariates".


Kim made a critique about the naming. Yet package names have little to 
do with the performance of the library. A simple package rename for 
clarification prior to release is ok with me as long as it "is clarifying".


> 2) Add methods to create row or column matrices from double arrays and 
> to extract submatrices (to the interface itself, rather than adding 
> these to a utils class later)

Yes. Exposing rows, columns and submatrices through the interface, 
rather than as raw arrays, gives us a means to perform operations on 
the matrix generically, independent of the primitive double[] type, 
which cannot be customized or extended. By passing the interface and 
not the array itself, we can hand around "references" to the original 
matrix instead of copies of it. This will be much more efficient for 
large matrices, and it also allows us to implement the same methods on 
sparse matrix implementations, which may not actually be stored in a 
[][] array.
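
As a rough illustration of the "references instead of copies" idea, 
here is a minimal sketch. All names in it (MatrixView, DenseMatrix, 
SubMatrixView) are hypothetical and are not the commons-math API; it 
only assumes that element access goes through an interface, so a row, 
column or submatrix can be a window onto the original storage.

```java
// Sketch only: MatrixView, DenseMatrix and SubMatrixView are hypothetical names.
interface MatrixView {
    int rows();
    int cols();
    double get(int row, int col);
    void set(int row, int col, double value);

    // Returns a window onto a rectangular block; no elements are copied.
    default MatrixView subMatrix(int startRow, int startCol, int rowCount, int colCount) {
        if (startRow < 0 || startCol < 0 || rowCount < 0 || colCount < 0
                || startRow + rowCount > rows() || startCol + colCount > cols()) {
            throw new IndexOutOfBoundsException("block lies outside the matrix");
        }
        return new SubMatrixView(this, startRow, startCol, rowCount, colCount);
    }

    // A row or column is just a one-line submatrix.
    default MatrixView row(int r) { return subMatrix(r, 0, 1, cols()); }
    default MatrixView column(int c) { return subMatrix(0, c, rows(), 1); }
}

// Dense storage backed by a double[][]; a sparse class could implement
// the same interface without ever holding a double[][].
final class DenseMatrix implements MatrixView {
    private final double[][] data;

    DenseMatrix(double[][] data) { this.data = data; } // wraps the array, does not copy it

    public int rows() { return data.length; }
    public int cols() { return data.length == 0 ? 0 : data[0].length; }
    public double get(int r, int c) { return data[r][c]; }
    public void set(int r, int c, double v) { data[r][c] = v; }
}

// Translates view coordinates into parent coordinates on every access.
final class SubMatrixView implements MatrixView {
    private final MatrixView parent;
    private final int rowOffset;
    private final int colOffset;
    private final int rowCount;
    private final int colCount;

    SubMatrixView(MatrixView parent, int rowOffset, int colOffset, int rowCount, int colCount) {
        this.parent = parent;
        this.rowOffset = rowOffset;
        this.colOffset = colOffset;
        this.rowCount = rowCount;
        this.colCount = colCount;
    }

    public int rows() { return rowCount; }
    public int cols() { return colCount; }

    public double get(int r, int c) {
        check(r, c);
        return parent.get(rowOffset + r, colOffset + c);
    }

    public void set(int r, int c, double v) {
        check(r, c);
        parent.set(rowOffset + r, colOffset + c, v);
    }

    private void check(int r, int c) {
        if (r < 0 || r >= rowCount || c < 0 || c >= colCount) {
            throw new IndexOutOfBoundsException("index outside the view");
        }
    }
}
```

Because the view writes through to its parent, a change made via 
column(1) is visible in the original matrix. That is the efficiency 
gain, and also the aliasing cost, of handing out references.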


> 3) Make the PRNG fully pluggable in the random package.

I think the challenge here is simply to provide an interface and a base 
implementation that uses the JVM PRNG. If a user wishes to override the 
PRNG, they simply implement the interface and pass their implementation 
into the class that uses the PRNG. We can also provide a driver 
implementation based on RngPack and package it separately. Users who 
wish to change the PRNG can then pick up the RngPack distro and our 
driver for it.
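
The shape of that arrangement might look roughly like the sketch 
below. The names RandomSource, JvmRandomSource and UniformSampler are 
hypothetical, chosen only for illustration; the one real API used is 
java.util.Random. An RngPack-backed driver would be another class 
implementing the same interface, shipped separately.

```java
import java.util.Random;

// Hypothetical plug-in point: anything that can produce doubles in [0, 1).
interface RandomSource {
    double nextDouble();
}

// Base implementation that delegates to the JVM's own generator.
final class JvmRandomSource implements RandomSource {
    private final Random random;

    JvmRandomSource() { this.random = new Random(); }
    JvmRandomSource(long seed) { this.random = new Random(seed); }

    public double nextDouble() { return random.nextDouble(); }
}

// A consumer that never names a concrete generator; the caller passes one in.
final class UniformSampler {
    private final RandomSource source;

    UniformSampler(RandomSource source) { this.source = source; }

    // Scales a value from [0, 1) into [lower, upper).
    double nextUniform(double lower, double upper) {
        if (!(lower < upper)) {
            throw new IllegalArgumentException("lower must be less than upper");
        }
        return lower + source.nextDouble() * (upper - lower);
    }
}
```

A caller would write new UniformSampler(new JvmRandomSource()) for 
the default behaviour, or pass any other RandomSource to swap the 
generator without touching the sampler.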


> 4) Modify Variance and StandardDeviation to compute multiple statistics 
> (with the variants being population, rather than sample statistics).

Yes, the choice is whether these are in fact "variants" of the same 
statistic or in fact separate statistics. I'm not convinced either way 
at this point, and I can see either approach staying consistent with 
the package design.
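
To make the "separate statistics" option concrete, here is a minimal 
sketch in which the sample and population forms are distinct classes 
behind one interface. The names Statistic, SampleVariance, 
PopulationVariance and Deviations are hypothetical, and the two-pass 
computation is just the textbook definition, not a claim about how the 
library computes it.

```java
// Hypothetical common interface for a statistic over a double[] sample.
interface Statistic {
    double evaluate(double[] values);
}

// Shared helper: sum of squared deviations from the mean (two passes).
final class Deviations {
    private Deviations() { }

    static double sumOfSquares(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sum = 0.0;
        for (double v : values) {
            double d = v - mean;
            sum += d * d;
        }
        return sum;
    }
}

// Sample variance: divides by n - 1.
final class SampleVariance implements Statistic {
    public double evaluate(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return Deviations.sumOfSquares(values) / (values.length - 1);
    }
}

// Population variance: divides by n.
final class PopulationVariance implements Statistic {
    public double evaluate(double[] values) {
        if (values.length < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return Deviations.sumOfSquares(values) / values.length;
    }
}
```

The "variants" option would instead keep a single Variance class with 
a switch for the divisor; the arithmetic is identical, so the question 
is purely one of naming and API surface.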


> 5) Drop the interface / implementation separation throughout the package.

This sounded more like a complaint about Java itself. The logic behind 
this recommendation was unclear to me, and following it would totally 
destroy any extensibility in the API. Interfaces and implementations 
are standard in Java and necessary for any package to work properly. I 
might suggest the argument was more about "Factories" vs. using actual 
constructors to build the objects, which I would see as a more serious 
argument concerning the package's design.


> I am personally -1 on 4) and 5); -0 on 1) and 2); and +0 on 3). I voted 
> +1 on the release; however, which means that 3) is a wart that I am 
> willing to live with for 1.0.  It can be worked around now and to fix it 
> correctly will require that we define a PRNG interface and introduce 
> factories, etc.
> Mark, since you voted to reopen API discussion, can you weigh in on 
> these issues and add any others that you see as show-stoppers?

I felt I could live with these issues unresolved for release 1.0 as 
well. Yet it sounded like others did not find that satisfactory. I'm 
willing to work on those I voted [+1] on (Matrix Methods and PRNG 
Pluggability) to make the packages more satisfactory. I think we should 
just implement the variants of Variance and StandardDeviation as 
separate classes and continue any argument about the appropriate 
strategy for them in the future. I would be interested in assisting 
with this as well.


Mark Diggory
Software Developer
Harvard MIT Data Center
