commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [math] API changes for RC2
Date Sun, 26 Sep 2004 15:23:51 GMT
Mark R. Diggory wrote:

>> 1) Eliminate the univariate/multivariate distinction in the stat 
>> package, because this seems confusing to some.  Change .univariate to 
>> .descriptive and .multivariate to .regression
> 
> Univariate and Multivariate are just "classifications". There is no 
> suggestion of changing the structure of the packages. Perhaps we can 
> begin building a "classification outline" now so that we have a better 
> idea what are the classes of statistics and what we want our naming 
> scheme to be based on. In the past I've always leaned towards a 
> classification similar to the mathworld site.

Unfortunately, classification != hierarchical decomposition.  The latter 
has to be a tree with no overlap. This is like the LDAP DIT design
problem -- unless you have a *very* immutable world with very natural 
boundaries, you are likely better off sticking to a relatively flat 
structure. This is why I am now leaning toward .descriptive (fits 
everything in there) and .regression. While they are not OO, SAS and R/S 
both present very flat "package" structures and I don't have that much 
trouble finding things in them.
> 
> The idea of moving SimpleRegression to a package called "regression" is 
> a means to classify "regressions" as much as to classify "multivariates" 
> or "univariates".
> 
> o.a.c.math.stat.regression.SimpleRegression
Yes.
> o.a.c.math.stat.univariate.DescriptiveStatistics
No.  Drop the "univariate"
> o.a.c.math.stat.multivariate...
No.  Will eventually have things like
o.a.c.math.stat.cluster

> 
> Kim made a critique about the naming. Yet package names have little to 
> do with the performance of the library. A simple package rename for 
> clarification prior to release is ok with me as long as it "is clarifying".

The point is that we do not want our users to have to experience the pain 
associated with changing package structure later. I agree that we need to 
get this right and I may not be thinking about this correctly, so I will 
wait to make these changes until we all agree.
> 
>> 2) Add methods to create row or column matrices from double arrays and 
>> to extract submatrices (to the interface itself, rather than adding 
>> these to a utils class later)
>>
> 
> Yes, abstracting the passing of the reference to a row, column or submatrix
> to an interface provides us a means to generically perform operations on 
> the matrix independent of the primitive double[] type which cannot be 
> customized or extended. By passing the interface and not the array 
> itself we can actually hand around "references" to the original matrix 
> instead of copies of it. This will be much more efficient for large 
> matrices and allow us as well to implement the same methods on sparse 
> matrix implementations which may not actually be stored in an [][] 
> structure.
If I understand you correctly, what you are suggesting above is to create 
*references* to submatrices based on the same underlying data as the 
"parent" rather than making copies. If we do this, we should implement the 
"copy semantics" as well and carefully document what is going on in each 
case (similar to the setData and setDataRef stuff now -- one set makes
copies, one does not). The "reference" versions really break encapsulation
and can lead to nasty bugs. I understand, however, that for large
matrices limiting copy operations is necessary. I still think, however,
that all of this would be better placed in a MatrixUtils class and this
could be added in 1.1 with no loss. These are new feature requests that
came in after RC1 was cut and they can be accommodated in 1.1 without
breaking backward compatibility. I see no reason to hold the release for this.
> 
> [+1]
> 
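The copy-versus-reference distinction discussed above can be illustrated with a small, self-contained sketch. All names here (`SimpleMatrix`, `SubMatrixView`, `createRowMatrix`, `createColumnMatrix`, `getSubMatrix`, `getSubMatrixView`) are hypothetical and chosen only for illustration; they are not the commons-math API. The constructor and `getSubMatrix` copy data (in the spirit of `setData`), while the view shares the parent's storage (in the spirit of `setDataRef`).

```java
import java.util.Arrays;

/**
 * Hypothetical minimal matrix type (not the commons-math RealMatrix API),
 * used only to contrast copy and reference semantics for submatrices.
 */
final class SimpleMatrix {
    private final double[][] data;

    /** Copy semantics (in the spirit of setData): the caller's array is deep-copied. */
    SimpleMatrix(double[][] d) {
        this(d, true);
    }

    private SimpleMatrix(double[][] d, boolean copy) {
        if (copy) {
            data = new double[d.length][];
            for (int i = 0; i < d.length; i++) {
                data[i] = d[i].clone();
            }
        } else {
            data = d; // only used for freshly built arrays that nobody else holds
        }
    }

    /** Builds a 1 x n row matrix from a double array (the array is copied). */
    static SimpleMatrix createRowMatrix(double[] row) {
        return new SimpleMatrix(new double[][] {row});
    }

    /** Builds an n x 1 column matrix from a double array (values are copied). */
    static SimpleMatrix createColumnMatrix(double[] column) {
        double[][] d = new double[column.length][1];
        for (int i = 0; i < column.length; i++) {
            d[i][0] = column[i];
        }
        return new SimpleMatrix(d, false);
    }

    int getRowDimension() { return data.length; }
    int getColumnDimension() { return data[0].length; }
    double getEntry(int row, int col) { return data[row][col]; }
    void setEntry(int row, int col, double value) { data[row][col] = value; }

    /** Copy semantics: a new matrix holding rows/columns start..end (inclusive). */
    SimpleMatrix getSubMatrix(int startRow, int endRow, int startCol, int endCol) {
        double[][] d = new double[endRow - startRow + 1][];
        for (int i = startRow; i <= endRow; i++) {
            d[i - startRow] = Arrays.copyOfRange(data[i], startCol, endCol + 1);
        }
        return new SimpleMatrix(d, false);
    }

    /** Reference semantics: a view sharing this matrix's storage; nothing is copied. */
    SubMatrixView getSubMatrixView(int startRow, int endRow, int startCol, int endCol) {
        return new SubMatrixView(this, startRow, startCol,
                endRow - startRow + 1, endCol - startCol + 1);
    }
}

/** A window onto a parent matrix: reads and writes go straight through to the parent. */
final class SubMatrixView {
    private final SimpleMatrix parent;
    private final int rowOffset, colOffset, rows, cols;

    SubMatrixView(SimpleMatrix parent, int rowOffset, int colOffset, int rows, int cols) {
        this.parent = parent;
        this.rowOffset = rowOffset;
        this.colOffset = colOffset;
        this.rows = rows;
        this.cols = cols;
    }

    double getEntry(int row, int col) {
        checkIndex(row, col);
        return parent.getEntry(rowOffset + row, colOffset + col);
    }

    void setEntry(int row, int col, double value) {
        checkIndex(row, col);
        parent.setEntry(rowOffset + row, colOffset + col, value);
    }

    private void checkIndex(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ") outside view");
        }
    }
}
```

If an entry of the parent is changed after both submatrices are taken, the copy is unaffected but the view sees the change, and writes through the view land in the parent. That aliasing is exactly the encapsulation risk described above, and it is also what makes views cheap for large matrices.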
>> 3) Make the PRNG fully pluggable in the random package.
> 
> 
> I think the challenge we end up with here is to simply provide an 
> interface and base implementation that uses the JVM PRNG,

Well, that is what we have done. RandomDataImpl is the implementation of 
the RandomData interface that uses the JVM PRNG.

> if a user 
> wishes to override the PRNG they simply implement the interface and
> pass the implementation into the class that uses the PRNG. We can also
> provide a separate driver implementation based on RngPack and package
> that separately as well. If users wish to change the PRNG then they can
> pick up the RngPack distro and our driver for it.

What we need to do here, if we want to get this done correctly before 1.0 
is design a "RandomSource" or "RandomGenerator" interface. 
Unfortunately, java.util.Random is not an interface and what we need is
to abstract an appropriate interface that will represent this and any
other PRNG (or RNG) that users may want to plug in. This will be tricky
and will require some research and discussion. We can do this now, but it
will take some time. I would prefer to move forward with the release, 
adding a factory to produce RandomData impls, including a "PRNG-pluggable" 
version of RandomDataImpl in 1.1.
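A minimal sketch of what "PRNG-pluggable" could mean, assuming a small interface in front of `java.util.Random`. The interface name follows the "RandomGenerator" idea above, but its method set, the `JdkRandomGenerator` adapter, the toy `XorShiftGenerator` (Marsaglia's xorshift64, included only to show a second plug-in, not as a recommended generator) and the `PluggableRandomData` consumer are all hypothetical, not the commons-math API.

```java
import java.util.Random;

/** Hypothetical pluggable PRNG abstraction; the method set is an assumption. */
interface RandomGenerator {
    void setSeed(long seed);
    double nextDouble();  // uniform on [0, 1)
    int nextInt(int n);   // uniform on 0 .. n-1, requires n > 0
}

/** Default implementation: delegates to the JVM's java.util.Random. */
final class JdkRandomGenerator implements RandomGenerator {
    private final Random random = new Random();

    public void setSeed(long seed) { random.setSeed(seed); }
    public double nextDouble() { return random.nextDouble(); }
    public int nextInt(int n) { return random.nextInt(n); }
}

/** Toy alternative (Marsaglia xorshift64), only to show a second plug-in. */
final class XorShiftGenerator implements RandomGenerator {
    private long state = 88172645463325252L; // any nonzero start value works

    public void setSeed(long seed) { state = (seed == 0L) ? 88172645463325252L : seed; }

    private long next() {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    public double nextDouble() { return (next() >>> 11) * 0x1.0p-53; } // top 53 bits
    public int nextInt(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        return (int) Long.remainderUnsigned(next(), n); // small modulo bias, fine for a sketch
    }
}

/** RandomData-style consumer that depends only on the interface. */
final class PluggableRandomData {
    private final RandomGenerator generator;

    PluggableRandomData(RandomGenerator generator) { this.generator = generator; }

    /** Uniform double between lower (inclusive) and upper. */
    double nextUniform(double lower, double upper) {
        if (lower >= upper) throw new IllegalArgumentException("lower must be < upper");
        return lower + (upper - lower) * generator.nextDouble();
    }

    /** Uniform int on [lower, upper]; assumes upper - lower + 1 fits in an int. */
    int nextInt(int lower, int upper) {
        if (lower > upper) throw new IllegalArgumentException("lower must be <= upper");
        return lower + generator.nextInt(upper - lower + 1);
    }
}
```

Because `PluggableRandomData` never names a concrete PRNG, an RngPack driver (or any other source) would only need to implement the interface. The hard part, as noted above, is settling on a method set that suits every generator users may want to plug in.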

> 
> I felt I could live with these issues unresolved for release 1.0 as 
> well. Yet it sounded like others did not find it satisfactory. I'm 
> willing to work on those I voted [+1] on (Matrix Methods, and PRNG 
> Pluggability) to make the packages more satisfactory.

> I think we should 
> just implement the Variants of Variance and StandardDeviation as 
> separate classes

If you think these absolutely must be in 1.0, go ahead and add the 
classes, tests and docs and I will hold RC2 until they are in. Personally, 
I see no reason that we need to hold the release for these additional 
features.
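Assuming the "variants" are the bias-corrected sample form (dividing the sum of squared deviations by n - 1) and the population form (dividing by n), the difference can be sketched as below. The class and method names are hypothetical, not the commons-math API, and the variants are shown as static methods for brevity rather than as the separate classes suggested above. The sketch uses a plain two-pass computation: the mean first, then the squared deviations from it.

```java
/** Hypothetical helpers contrasting the two common variance definitions. */
final class VarianceVariants {
    private VarianceVariants() {}

    /** Sum of squared deviations from the mean (two passes: mean first). */
    private static double sumOfSquaredDeviations(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no data");
        double sum = 0.0;
        for (double v : values) sum += v;
        double mean = sum / values.length;
        double ss = 0.0;
        for (double v : values) {
            double d = v - mean;
            ss += d * d;
        }
        return ss;
    }

    /** Population variance: divides by n. */
    static double populationVariance(double[] values) {
        return sumOfSquaredDeviations(values) / values.length;
    }

    /** Bias-corrected sample variance: divides by n - 1 (needs at least two values). */
    static double sampleVariance(double[] values) {
        if (values.length < 2) throw new IllegalArgumentException("need at least two values");
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    static double populationStandardDeviation(double[] values) {
        return Math.sqrt(populationVariance(values));
    }

    static double sampleStandardDeviation(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }
}
```

For the data {1, 2, 3, 4} the mean is 2.5 and the squared deviations sum to 5, so the population variance is 5/4 = 1.25 while the sample variance is 5/3.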

Phil





