commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: [math] Design review pre 1.0
Date Tue, 25 May 2004 17:16:44 GMT
I want to review this list and add in my comments...

Stephen Colebourne wrote:

>I have performed a lightweight review of [math] from a code/style POV (not
>technically as I'm not mathematical...)
>
>1) Remember your public API that you must maintain includes both public and
>protected classes/methods/fields. Are you confident that every method/field
>(especially fields) are correctly and suitably named for you to want to
>keep?
>
>  
>
Yes, this could stand a code review ( 1 hour with report to list? )

>2) Wherever there is implements Serializable you should have a
>serialVersionUID static field. And are you confident that the
>implementations are well enough established to allow for serialization? What
>about transient fields? Serialization is definitely not about adding the
>implements clause to everything.
>
>  
>
See serialization thread concerning resolving any stray serialization 
issues.

>3) Javadocs are sometimes thin. On occasions, they are written as paragraphs
>visually but without the HTML <p> tag. (eg. UnivariateRealSolver) or
>missing, eg StandardDeviation
>
>  
>
Yes, it would be good to maintain acceptable html in javadoc. Yet, I'd 
like to point out that javadoc isn't java code. while we would like to 
maintain lots of it to help our users understand it, the library works 
just fine without it.

>4) You might want to consider separating interfaces from implementations.
>Perhaps all interfaces in the base package, or impl subpackages. Then again
>the current layout may be best.
>
>  
>
I'm not convinced we are at the point where this is possible. Though its 
open to discussion. Some areas this may be easier than others 
(distributions, analysis). Some packages are rather small and sparse, 
this would make them even sparser (Complex).  I've put considerable 
effort into separating the the UnivariateStatistics interfaces from the 
actuall implementations. Just as a side note, at this time separation 
into implementations and interfaces should not mean two different jars 
as it is general Commons practice to try to maintain a project as one 
distributable jar.

>5) Is double suitable for these calculations? Should the strictfp flag be
>used? (I have no idea as to the answer, but I have to ask)
>
>  
>
Neither do I. Can anyone enlighten us?

>6) ComplexFormat doesn't extend the JDK Format class
>
>7) ComplexMath is a utils class, which would be named ComplexMathUtils in
>lang or collections (although the JDK Math class sets an alternate
>precedent) The constructor is private, which I would recommend is changed to
>public (aids tools like Velocity). Also Beta/Erf/Gamma.
>
>  
>
And throughout the Math package we use this naming "StatUtils", 
"MathUtils". Maybe it should just be "ComplexUtils".

I'm unsure about instantiating StaticUtils, does velocity actually call 
staic methods on instantiated objects? Doesn't this seem a bad behavior 
to promote?

>8) @author tags are often missing
>
>  
>
Development Team Decision, we do not want @author tags, developers and 
contributors are acnowledged in Maven project.xml and site docs.

>9) Factorys use the method newInstance() to obtain an instance -
>getInstance() seems more typical of commons, and allows you to return a
>cached value.
>
>  
>
There is a symantic difference here:

"getInstance" is ideal for singleton instances

"newInstance" is ideal for instantiation of new instances (See w3c DOM, SAX)

Somewhere along the line we introduced "createFooBar", for cases where a 
Factory creates multiple types of objects other than itself.

I can accept these naming conventions. They are not uncommon.

>10) SummaryStatistics/DescriptiveStatistics are factories but aren't named
>as such. They also use Class.forName() which is dubious in a class loader
>environment.
>
>  
>
True, its a toss up, there are alot of service discovery approaches. do 
we want to be dependent on the Discovery API, or just use the Java 
services mechanism and jar each implementation such that it can be 
"discovered"?

>11) Ensure there are no TODO comments in the Javadoc (OK in code) For
>example GammaDistributionImpl
>
>12) Should EmpiricalDistribution really have methods taking File and String
>filePath - is File not enough? The implementation actually seems to do
>something different (beware file encodings)
>
>  
>
I would argue not that alternate method signatures are unneccessary, 
they are assistive and offer options for coding, but that the methods 
should be organized to use only one implementation of a loading strategy. ie

public void load(URl url);

public void load(File file);

public void load(InputStream);

here InputStream is the method that actually implements the loading 
behavior and the other two signatures simply aquire an InputStream and 
pass it to the InputStream signature. Unfortunately, the way the 
implementation is designed such that it reads each url/file twice makes 
doing this not possible.

>13) Ensure every class has a constructor - never rely on the auto generated
>one, for example BivariateRegression
>
>  
>
Good Point

>14) Some import statements are rather screwed visually, eg Product
>
>  
>
These are formated using Eclipse configured to meet Sun formating 
standards. I assume this formating is caused when the package name gets 
long. I've maintained it for consistency.

>15) Dependencies!!!!
>commons-discovery can and should be an optional dependency. In fact it may
>already be, as each call to DiscoveryClass is in an Exception catch, which
>ought to trap ClassNotFoundException. However, I don't think that the
>default implementation behaviour is actually working which needs fixing.
>
>  
>
We need a good, standard discover mechanism. it sucks to have to copy 
and reimplement when other commons developers are already creating such 
excellent code. I know we want to keep dependencies to a minimum, but 
code reuse is also important, then point of Commons is to reduce code 
replication.

>commons-lang is only used for NestableException. This is a dubious
>dependency - the relevant code can be copied, and [lang]s ExceptionUtils is
>perfectly able to handle extracting cause parameters by reflection. (Don't
>copy NestableException - copy the code within)
>
>  
>
I actually agree here because this class was created just to provide 
Nestability to older JVM's. Its really just a "patch" until time renders 
those JVM's obsolete. The code would eventually be removed from the Math 
API.

>commons-logging is not used directly, just via beanutils and discovery (and
>is a classic example of commons chained dependencies that should be avoided)
>
>  
>
Yes, I had worked to make it a test and compile time dependency only.

>commons-beanutils doesn't seem to be used, so shouldn't be in project xml
>
>  
>
Its a test, compile time dependency for BeanListUnivariate, not a 
runtime dependency.

>commons-collections TreeBag is used by Frequency - it might be worth
>considering removing this dependency, but it is the hardest.
>(CollectionUtils.NATURAL_COMPARATOR is a simple class to copy)
>
>  
>
I'm not adverse to maintaing it, this is a Complex Datastructure, I'd 
rather reley on those who thought it out supporting it, Math shouldn't 
be about Collections.

>Anyway, I hope this focuses attention on what needs doing (boring-wise)
>before 1.0.
>
>Stephen
>  
>
I'd like us not to get too bound up in the dependency issue. If they can 
esily be removed, great, if they are to access complex datstructures and 
behavior, lets let those experts maintain such tools and we promote 
thier reuse. I would really classify Math as a secondary level Commons 
package. It does some complex things for which the Collections  
dependencies can be deemed acceptable and appropriate.

-Mark


Mime
View raw message