commons-dev mailing list archives

From Phil Steitz
Subject Re: [math] API changes for RC2
Date Sun, 26 Sep 2004 00:41:07 GMT
Henri Yandell wrote:
> Some thoughts from a general Commons perspective:

> 1) +0, This is worth getting right early on as package renaming does
> tend to confuse users and takes a long time to work through
> deprecation etc.

Ugh... Lots of changes incompatible with RC1 here... but if others agree...
What I don't understand is why the concept of univariate vs. multivariate 
is hard to understand.  Univariate - sample consists of a single array of 
data (one random variable / distribution); multivariate - sample has more 
than one column (random vector / joint distribution).
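
To make the distinction concrete, here is a minimal sketch in plain Java. 
The class and method names (SampleShapes, mean, columnMeans) are made up 
for illustration and are not part of [math]; the only point is the shape 
of the input data in each case.

```java
/** Hypothetical helper, for illustration only; not part of [math]. */
final class SampleShapes {

    /** Univariate: a single array of observations of one random variable. */
    static double mean(double[] sample) {
        double sum = 0.0;
        for (double x : sample) {
            sum += x;
        }
        return sum / sample.length;
    }

    /**
     * Multivariate: each row is one observation of a random vector and
     * each column is one variable. Assumes a non-empty, rectangular array.
     */
    static double[] columnMeans(double[][] sample) {
        int cols = sample[0].length;
        double[] means = new double[cols];
        for (double[] row : sample) {
            for (int j = 0; j < cols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            means[j] /= sample.length;
        }
        return means;
    }
}
```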

> 2)  -0. This is a new feature. If it's easy, add it. If it involves
> effort, don't bother until 1.1.
> 3)  -0. Two options leap to mind, release PRNG as it is, or don't
> release the PRNG code. Yes it's a pain for users when you change the
> functionality in a new version, but when the option is not having a
> feature, users opt for the functionality and pain later. Most likely
> the change would be a simple perl regexp anyway.

A point of clarification here.  There is no PRNG code in [math].  The 
random package includes random data generation methods that *use* the JDK 
PRNG to generate random data, permutations or samples. I strongly disagree 
with the assertion that the JDK Random (and SecureRandom) implementations 
are worthless to the point where this package (which fully documents what 
PRNG is used) is not worth releasing.  The valid issue here is that the 
PRNG should be pluggable (currently you have to subclass RandomDataImpl to 
do this). There also appears to be interest -- independently -- in adding 
other PRNG implementations, which could be among those plugged in to the 
random package.
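
For what it is worth, a pluggable design might look roughly like the 
sketch below. Everything here except java.util.Random is an assumption: 
RandomSource, JdkRandomSource and PluggableRandomData are hypothetical 
names, not existing [math] classes, and the real fix would still need the 
interface and factory discussion mentioned in the original list.

```java
import java.util.Random;

/** Hypothetical abstraction over a PRNG; not part of [math]. */
interface RandomSource {
    double nextDouble();      // uniform in [0, 1)
    int nextInt(int bound);   // uniform in [0, bound)
}

/** Adapter that delegates to the JDK generator (Random or SecureRandom). */
final class JdkRandomSource implements RandomSource {
    private final Random delegate;

    JdkRandomSource(Random delegate) {
        this.delegate = delegate;
    }

    public double nextDouble() {
        return delegate.nextDouble();
    }

    public int nextInt(int bound) {
        return delegate.nextInt(bound);
    }
}

/** Hypothetical data generator that uses whatever RandomSource it is given. */
final class PluggableRandomData {
    private final RandomSource source;

    PluggableRandomData(RandomSource source) {
        this.source = source;
    }

    /** Random permutation of 0..n-1 using a Fisher-Yates shuffle. */
    int[] nextPermutation(int n) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++) {
            p[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = source.nextInt(i + 1);
            int tmp = p[i];
            p[i] = p[j];
            p[j] = tmp;
        }
        return p;
    }

    /** Value between lower and upper, scaled from the source's [0, 1) draw. */
    double nextUniform(double lower, double upper) {
        return lower + source.nextDouble() * (upper - lower);
    }
}
```

Swapping in another generator would then mean writing one more 
RandomSource implementation rather than subclassing the data generator.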

> 4)  -0. I was never a statistician, but this sounds like new
> functionality. Either release the code as is, or drop it. While 3) is
> an API change, this sounds like a functional change and those are much
> more painful for a user.

I disagree strongly with this change, for two reasons: first, we spent a 
long time debating whether or not statistics should be implemented as 
separate classes and decided in favor of this.  To have a single class 
compute multiple statistics would be inconsistent with the design of the 
package.  Secondly, even though the "population" version is 
computationally close to the "sample" version, there is an important and 
fundamental difference between them conceptually. I tried to explain this 
in earlier posts. Moreover, it is trivial to add additional classes 
implementing the "population" versions -- which actually supports the 
current one statistic per class design. I have not added them because I 
did not see this as essential for 1.0 and frankly I am not sure they 
belong in .statistics, since statistics are usually associated with sample 
data (i.e., some people would not call the population versions 
"statistics" but rather "population parameters").
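
To illustrate how small the computational difference is while keeping one 
statistic per class, here is a rough sketch. The names ArrayStatistic, 
SquaredDeviations, SampleVariance and PopulationVariance are hypothetical 
and do not reflect the actual [math] API; returning NaN for too-small 
inputs is likewise just an assumption of this sketch.

```java
/** Hypothetical single-statistic contract, for illustration only. */
interface ArrayStatistic {
    double evaluate(double[] values);
}

/** Shared helper: sum of squared deviations from the mean (two passes). */
final class SquaredDeviations {
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        double mean = total / values.length;
        double ss = 0.0;
        for (double v : values) {
            double d = v - mean;
            ss += d * d;
        }
        return ss;
    }
}

/** Sample version: divides by n - 1 (estimate from sample data). */
final class SampleVariance implements ArrayStatistic {
    public double evaluate(double[] values) {
        int n = values.length;
        if (n < 2) {
            return Double.NaN; // assumption of this sketch
        }
        return SquaredDeviations.sum(values) / (n - 1);
    }
}

/** Population version: divides by n (parameter of a fully known population). */
final class PopulationVariance implements ArrayStatistic {
    public double evaluate(double[] values) {
        int n = values.length;
        if (n < 1) {
            return Double.NaN; // assumption of this sketch
        }
        return SquaredDeviations.sum(values) / n;
    }
}
```

For the data {2, 4, 4, 4, 5, 5, 7, 9} the sum of squared deviations is 32, 
so the population version gives 32/8 = 4 and the sample version gives 
32/7, about 4.571. The code differs by one divisor; the conceptual 
difference is what each number is an estimate or description of.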

> 5) -0. Keep it as is. Again, it might mean an API change in the
> future, but I doubt anyone knows the perfect solution so let's see how
> this one goes.
> So the only one I'd advise as really being worth the effort is 1. 
> Hen


>>>The following changes have been suggested recently.  Before cutting 1.0
>>>final, we should make sure we are all OK postponing or forgoing these:
>>>1) Eliminate the univariate/multivariate distinction in the stat
>>>package, because this seems confusing to some.  Change .univariate to
>>>.descriptive and .multivariate to .regression
>>>2) Add methods to create row or column matrices from double arrays and
>>>to extract submatrices (to the interface itself, rather than adding
>>>these to a utils class later)
>>>3) Make the PRNG fully pluggable in the random package.
>>>4) Modify Variance and StandardDeviation to compute multiple statistics
>>>(with the variants being population, rather than sample statistics).
>>>5) Drop the interface / implementation separation throughout the package.
>>>I am personally -1 on 4) and 5); -0 on 1) and 2); and +0 on 3). I voted
>>>+1 on the release, however, which means that 3) is a wart that I am
>>>willing to live with for 1.0.  It can be worked around now and to fix it
>>>correctly will require that we define a PRNG interface and introduce
>>>factories, etc.
>>>Mark, since you voted to reopen API discussion, can you weigh in on
>>>these issues and add any others that you see as show-stoppers?
