commons-dev mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: [math] log representaion of sums was:Re: [math] Priorities, help needed
Date Sat, 24 May 2003 16:02:19 GMT
Phil Steitz wrote:
> Brent Worden wrote:
>> Agreed.  I would like to add that I think we're a little overly
>> concerned about the actual implementation of the algorithm.  In these
>> early stages of the project, I think it's wiser to spend time
>> discussing the evolving design and API.  In the end, that is how
>> people will judge the value of this project.  People will care far
>> less about how rock-solid the geometric mean algorithm is compared to
>> how many features does it provide and how easy is it to use.
> 
> 
> I could not agree more.  I have been using (and sharing) the original, 
> no-storage, no-rolling version of Univariate for a couple of years now 
> and have found it to be simple, lightweight and easy to use.  That is 
> why I contributed it.  The only thing that I think we really need to 
> worry about as we get the initial release together is that we carefully 
> document the interfaces and the contracts -- otherwise the stuff will 
> not be usable -- and maintain implementation quality.  We should try to 
> avoid stupid things and really bad numerical algorithms, but I agree 
> that our focus should be on getting basic, easy to use, frequently 
> demanded functionality into the package.  Regarding Univariate in 
> particular, my feeling is that the most important things to get in there 
> are percentiles and confidence intervals.  These are what people 
> actually use (beyond the arithmetic mean and variance).
> 
> Have you looked at the task list here:
> http://jakarta.apache.org/commons/sandbox/math/tasks.html?
> 
> Do you have a) comments on these / alternative suggestions  b) code to 
> contribute or c) time to spend helping with implementation?

I'm concerned it's starting to get difficult to see such clear
interfaces with all the code piled up in one package. Refactoring is
relatively easy at this stage. I want to suggest we begin to isolate
different functionalities in separate packages for clarity's sake.

One possibility is:

*org.apache.commons.math.random*

EmpiricalDistribution
EmpiricalDistributionImpl
RandomData
RandomDataImpl
ValueServer

*org.apache.commons.math.la*

RealMatrix
RealMatrixImpl

*org.apache.commons.math.util*

ContractableDoubleArray
ExpandableDoubleArray
FixedDoubleArray
DoubleArray

*org.apache.commons.math.stat*

TestStatistic
TestStatisticImpl
Freq
Univariate
UnivariateImpl
ListUnivariateImpl
AbstractStoreUnivariate
StoreUnivariate


The idea is similar in nature to the SAX or DOM APIs. Maybe we can
establish a set of interfaces/factories for these implementations. There
is an open question about keeping the "Impl" classes vs. taking a
factory approach to object instantiation. I'm not sure there would be
enough implementations to justify an API/spec with factory-based
instantiation.

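To make the "Impl" vs. factory question concrete, here is a minimal
sketch of factory-based instantiation. It is only an illustration: the
methods shown on Univariate (addValue, getMean, getN), the
SimpleUnivariate class and UnivariateFactory are assumptions made up for
this example, not a proposal for the final API.

```java
// Hypothetical sketch; names and methods are illustrative assumptions.
interface Univariate {
    void addValue(double v);
    double getMean();
    int getN();
}

// The implementation stays package-private, so callers never name it.
class SimpleUnivariate implements Univariate {
    private double sum = 0.0;
    private int n = 0;

    public void addValue(double v) {
        sum += v;
        n++;
    }

    public double getMean() {
        // No values yet: report NaN rather than dividing by zero.
        return n == 0 ? Double.NaN : sum / n;
    }

    public int getN() {
        return n;
    }
}

// Callers ask the factory for an instance instead of calling "new ...Impl()".
final class UnivariateFactory {
    private UnivariateFactory() {
    }

    static Univariate newInstance() {
        return new SimpleUnivariate();
    }
}
```

With this layout a caller writes UnivariateFactory.newInstance() and
depends only on the interface. Whether that indirection pays off depends
on how many implementations we actually end up with.
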
I do have some concerns about the Random library and Random Number 
Generation/Distributions.

1.) The JDK provides for "pluggability" behind its random number
generator, so you can plug different implementations in behind it.
Ideally we should take advantage of this to provide different methods
of random number generation. This is probably one limitation of the
CERN random generation libraries.

2.) The Distribution library at CERN has a somewhat successful layout,
but I have some problems with it in terms of not being very "bean-like".
Parameters often lack getters/setters that are easy to access via a
bean-like interface.

http://hoschek.home.cern.ch/hoschek/colt/V1.0.3/doc/cern/jet/random/package-summary.html
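
On point 1, the plug-in mechanism in the JDK is the protected next(int)
method of java.util.Random: nextInt(), nextDouble() and the other public
methods are built on top of it. A minimal sketch, assuming we wanted to
swap in a different generator, could look like the following. The class
name XorShiftRandom and the choice of a 64-bit xorshift generator are
assumptions for illustration only, not a recommendation of that
algorithm.

```java
import java.util.Random;

// Hypothetical example: a xorshift64 generator plugged in behind java.util.Random.
class XorShiftRandom extends Random {
    private long state;

    XorShiftRandom(long seed) {
        super(seed);
        // Set the state explicitly rather than relying on the superclass
        // constructor having called setSeed.
        state = nonZero(seed);
    }

    // xorshift must never be in the all-zero state, so map 0 to a fixed constant.
    private static long nonZero(long seed) {
        return seed == 0L ? 0x9E3779B97F4A7C15L : seed;
    }

    @Override
    public synchronized void setSeed(long seed) {
        super.setSeed(seed); // also clears Random's cached Gaussian value
        state = nonZero(seed);
    }

    // Every public nextXxx() method of Random draws its bits from here.
    @Override
    protected int next(int bits) {
        long x = state;
        x ^= x << 13;
        x ^= x >>> 7;
        x ^= x << 17;
        state = x;
        return (int) (x >>> (64 - bits)); // keep the top "bits" bits
    }
}
```

Any code that accepts a java.util.Random can then be handed a
XorShiftRandom without knowing the difference, which is the kind of
pluggability I would like our random data classes to keep.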

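On point 2, by "bean-like" I mean something along these lines: a no-arg
constructor plus a getter/setter pair for each parameter. This is a
sketch under my own assumptions; the class name NormalBean and its
property names are hypothetical and do not come from the CERN package.

```java
// Hypothetical bean-style distribution; not the CERN API.
class NormalBean {
    private double mean = 0.0;
    private double standardDeviation = 1.0;

    // Bean tools expect a no-arg constructor.
    public NormalBean() {
    }

    public double getMean() {
        return mean;
    }

    public void setMean(double mean) {
        this.mean = mean;
    }

    public double getStandardDeviation() {
        return standardDeviation;
    }

    public void setStandardDeviation(double standardDeviation) {
        // Written as !(x > 0) so that NaN is rejected as well.
        if (!(standardDeviation > 0.0)) {
            throw new IllegalArgumentException("standard deviation must be positive");
        }
        this.standardDeviation = standardDeviation;
    }

    // Normal probability density evaluated at x.
    public double density(double x) {
        double z = (x - mean) / standardDeviation;
        return Math.exp(-0.5 * z * z) / (standardDeviation * Math.sqrt(2.0 * Math.PI));
    }
}
```

Parameters exposed this way can be read and set by any tool that
understands the getX/setX naming convention.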

Finally, I feel a little weird about replicating a lot of the
functionality of the CERN library given that it is still in production.
It's stupid to overlook the effort Wolfgang Hoschek has put into
building a solid LGPL'ed open source mathematics library. I fear in some
ways we will only end up "replicating" his and others' efforts here. I
wonder if Hoschek would have any interest in "standardization" of his
packages. Apache could work in his favor if he were interested in
allowing his code base to be further maintained and developed here.
Inviting community participation would open the code up to further
development, enhancement and refactoring to improve the library's
infrastructure, and would save us from duplicating development. Maybe we
should consider contacting him at CERN to get his opinion on such an idea.

-Mark

