commons-dev mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: [math] API changes for RC2
Date Sun, 26 Sep 2004 20:37:07 GMT


Phil Steitz wrote:
> Mark R. Diggory wrote:
> 
>>> 1) Eliminate the univariate/multivariate distinction in the stat 
>>> package, because this seems confusing to some.  Change .univariate to 
>>> .descriptive and .multivariate to .regression
>>
>>
>> Univariate and Multivariate are just "classifications". There is no 
>> suggestion of changing the structure of the packages. Perhaps we can 
>> begin building a "classification outline" now so that we have a better 
>> idea of what the classes of statistics are and what we want our naming 
>> scheme to be based on. In the past I've always leaned towards a 
>> classification similar to the mathworld site.
> 
> 
> Unfortunately, classification != hierarchical decomposition.  The latter 
> has got to be a tree with no overlap. This is like the LDAP DIT design 
> problem -- unless you have a *very* immutable world with very natural 
> boundaries, you are likely better off sticking to a relatively flat 
> structure. This is why I am now leaning toward .descriptive (fits 
> everything in there) and .regression. While they are not OO, SAS and R/S 
> both present very flat "package" structures and I don't have that much 
> trouble finding things in them.
> 

Yes, I have no problem with .descriptive and .regression. You're correct 
about the hierarchical decomposition; that's why, if we can identify our 
own "policy" for hierarchical decomposition, then at least we have a 
clear defense when package renaming is requested.
>>
>> The idea of moving SimpleRegression to a package called "regression" 
>> is a means to classify "regressions" as much as to classify 
>> "multivariates" or "univariates".
>>
>> o.a.c.math.stat.regression.SimpleRegression
> 
> Yes.
> 
>> o.a.c.math.stat.univariate.DescriptiveStatistics
> 
> No.  Drop the "univariate"
> 
>> o.a.c.math.stat.multivariate...
> 
> No.  Will eventually have things like
> o.a.c.math.stat.cluster
> 

Fine with me...
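
To make the rename concrete, here is a minimal usage sketch under the 
proposed layout. It assumes the class names (SimpleRegression, 
DescriptiveStatistics) and their existing methods stay exactly as they 
are and only the package segments change to .regression and .descriptive:

    // Sketch only: the point is the package path, not the class internals.
    import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;
    import org.apache.commons.math.stat.regression.SimpleRegression;

    public class PackageLayoutSketch {
        public static void main(String[] args) {
            // SimpleRegression moves from the "multivariate" grouping to
            // .regression; addData/getSlope are the existing methods,
            // unchanged by the move.
            SimpleRegression regression = new SimpleRegression();
            regression.addData(1.0, 2.1);
            regression.addData(2.0, 3.9);
            regression.addData(3.0, 6.2);
            System.out.println("slope = " + regression.getSlope());
        }
    }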

>>
>> Kim made a critique of the naming. Yet package names have little to 
>> do with the performance of the library. A simple package rename for 
>> clarification prior to release is OK with me as long as it "is 
>> clarifying".
> 
> 
> The point is that we do not want our users to have to experience the 
> pain associated with changing package structure later. I agree that we 
> need to get this right and I may not be thinking about this correctly, 
> so I will wait to make these changes until we all agree.
> 

true.

>>
>>> 2) Add methods to create row or column matrices from double arrays 
>>> and to extract submatrices (to the interface itself, rather than 
>>> adding these to a utils class later)
>>>
>>
>> Yes, abstracting the passing of a reference to a row, column, or 
>> submatrix behind an interface gives us a way to perform operations on 
>> the matrix generically, independent of the primitive double[] type, 
>> which cannot be customized or extended. By passing the interface and 
>> not the array itself we can actually hand around "references" to the 
>> original matrix instead of copies of it. This will be much more 
>> efficient for large matrices and will also let us implement the same 
>> methods on sparse matrix implementations which may not actually be 
>> stored in a [][] structure.
> 
> If I understand you correctly, what you are suggesting above is to 
> create *references* to submatrices based on the same underlying data as 
> the "parent" rather than making copies. If we do this, we should 
> implement the "copy semantics" as well and carefully document what is 
> going on in each case (similar to the setData and setDataRef stuff now 
> -- one set makes copies, one does not). The "reference" versions really 
> break encapsulation and can lead to nasty bugs.  

Think of it more as "Iterators" or "Enumerations"; those classes provide 
the same sort of functionality for the Collections API. Yes, there are 
"copy semantics" and "concurrent modification rules" that need to be 
applied, just as with Collections. I would also suggest that a Matrix be 
"final" and "immutable" in the same way as any java.lang.Number 
implementation, and if that is not possible in all cases, then the 
mutability should be handled internally by the implementation.
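
As a sketch of what the copy/reference split could look like on the 
matrix interface (the method names here are hypothetical, in the spirit 
of the existing setData/setDataRef pair; nothing below is the committed 
API):

    /**
     * Illustrative sketch only -- names are hypothetical, not the committed
     * API. The point is the split between copy semantics and reference
     * ("view") semantics.
     */
    public interface MatrixViewSketch {

        int getRowDimension();

        int getColumnDimension();

        double getEntry(int row, int column);

        /**
         * Returns the submatrix [startRow..endRow] x [startColumn..endColumn]
         * as a new matrix holding a copy of the data.  Safe, but costs
         * O(rows * columns) time and memory for large matrices.
         */
        MatrixViewSketch getSubMatrix(int startRow, int endRow,
                                      int startColumn, int endColumn);

        /**
         * Returns the same submatrix as a view backed by this matrix's
         * storage.  Like an Iterator over a Collection, the view shares state
         * with its parent: changes to the parent are visible through the
         * view, and the usual "concurrent modification" caveats apply.  No
         * double[][] copy is made, so this also works for sparse
         * implementations that are not stored as a rectangular array.
         */
        MatrixViewSketch getSubMatrixRef(int startRow, int endRow,
                                         int startColumn, int endColumn);

        // Factories for row/column matrices built from a double[] could sit
        // on the interface's implementations or in a separate MatrixUtils
        // class, as discussed below, e.g.:
        //   MatrixUtils.createRowMatrix(double[] rowData)
        //   MatrixUtils.createColumnMatrix(double[] columnData)
    }

Wherever the accessors end up living, documenting which methods copy and 
which return backed views is the important part.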

> I understand, however, that for large matrices limiting copy operations 
> is necessary. I still think, however, that all of this would be better 
> placed in a MatrixUtils class and this could be added in 1.1 with no 
> loss. These are new feature requests that came in after RC1 was cut and 
> they can be accommodated in 1.1 without breaking backward compatibility. 
> I see no reason to hold the release for this.


> 
>>
>> [+1]
>>
>>> 3) Make the PRNG fully pluggable in the random package.
>>
>>
>>
>> I think the challenge we end up with here is to simply provide an 
>> interface and base implementation that uses the JVM PRNG,
> 
> 
> Well, that is what we have done. RandomDataImpl is the implementation of 
> the RandomData interface that uses the JVM PRNG.

> 
>> if a user wishes to override the PRNG they simply implement the 
>> interface and pass the implementation into the class that uses the 
>> PRNG. We can also provide a separate driver implementation based on 
>> RngPack and package that separately as well. If users wish to change 
>> the PRNG then they can pickup the RngPack distro and our driver for it.
> 
> 
> What we need to do here, if we want to get this done correctly before 
> 1.0, is design a "RandomSource" or "RandomGenerator" interface. 
> Unfortunately, java.util.Random is not an interface, and what we need 
> is to abstract an appropriate interface that will represent this and any 
> other PRNG (or RNG) that users may want to plug in. This will be tricky 
> and will require some research and discussion.  We can do this now, but 
> it will take some time. I would prefer to move forward with the release, 
> adding a factory to produce RandomData impls, and including a 
> "PRNG-pluggable" version of RandomDataImpl in 1.1.
> 

This would be the benefit of using RngPack for all random number 
generation: its API already supports this and has a RandomElement 
interface with various implementations, including one that wraps 
java.util.Random.

We can either work to integrate RngPack directly into Commons Math, or 
put RngPack on ibiblio and make Commons Math dependent on it. The latter 
would mean we include the RngPack jars in the Commons Math distributions. 
This should not be an issue because the package has been relicensed 
under a BSD-style license that is compatible with Apache's.
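
Purely as an illustration of the plumbing being discussed (none of these 
names are a committed API), the pluggable generator could reduce to a 
small interface plus adapters: one over java.util.Random as the default, 
and an equivalent one over RngPack's RandomElement if we go that route:

    import java.util.Random;

    /** Hypothetical abstraction over a (P)RNG source; names illustrative only. */
    interface RandomGeneratorSketch {
        void setSeed(long seed);
        double nextDouble();      // uniform on [0, 1)
        double nextGaussian();    // standard normal deviate
        int nextInt(int n);       // uniform on [0, n)
    }

    /** Default adapter backed by the JDK generator. */
    class JdkRandomGeneratorSketch implements RandomGeneratorSketch {
        private final Random random = new Random();

        public void setSeed(long seed)  { random.setSeed(seed); }
        public double nextDouble()      { return random.nextDouble(); }
        public double nextGaussian()    { return random.nextGaussian(); }
        public int nextInt(int n)       { return random.nextInt(n); }
    }

    /** Client code (e.g. a RandomData implementation) depends only on the interface. */
    class RandomDataSketch {
        private final RandomGeneratorSketch generator;

        RandomDataSketch(RandomGeneratorSketch generator) {
            // Plug in the JDK adapter, an RngPack adapter, or anything else.
            this.generator = generator;
        }

        double nextUniform(double lower, double upper) {
            return lower + generator.nextDouble() * (upper - lower);
        }
    }

An RngPack adapter would wrap a RandomElement the same way the JDK 
adapter wraps java.util.Random, so bundling RngPack versus merely 
depending on it from ibiblio doesn't change this interface.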

>>
>> I felt I could live with these issues unresolved for release 1.0 as 
>> well. Yet it sounded like others did not find it satisfactory. I'm 
>> willing to work on those I voted [+1] on (Matrix Methods and PRNG 
>> Pluggability) to make the packages more satisfactory. 
> 
> 
>> I think we should just implement the variants of Variance and 
>> StandardDeviation as separate classes.
> 
> 
> If you think these absolutely must be in 1.0, go ahead and add the 
> classes, tests and docs and I will hold RC2 until they are in. 
> Personally, I see no reason that we need to hold the release for these 
> additional features.
> 

I'm not going to sweat whether it gets into 1.0 or 1.1; I'd rather have 
some time for us to work out the details of the implementation. This can 
happen post-release.
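
For the record, a minimal sketch of the variant split being discussed, 
assuming we expose the bias-corrected (sample, n - 1 denominator) and 
uncorrected (population, n denominator) forms either as separate classes 
or behind a flag; names and placement are illustrative only:

    /**
     * Illustrative sketch only: shows the two variance variants under discussion.
     * biasCorrected = true  -> sample variance (divides by n - 1)
     * biasCorrected = false -> population variance (divides by n)
     */
    class VarianceSketch {

        private final boolean biasCorrected;

        VarianceSketch(boolean biasCorrected) {
            this.biasCorrected = biasCorrected;
        }

        double evaluate(double[] values) {
            int n = values.length;
            if (n == 0) {
                return Double.NaN;
            }
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                mean += values[i];
            }
            mean /= n;

            double sumSq = 0.0;
            for (int i = 0; i < n; i++) {
                double d = values[i] - mean;
                sumSq += d * d;
            }
            double denominator = biasCorrected ? (n - 1) : n;
            return denominator > 0 ? sumSq / denominator : Double.NaN;
        }

        /** The matching standard deviation is the square root of the chosen variance. */
        double standardDeviation(double[] values) {
            return Math.sqrt(evaluate(values));
        }
    }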

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


