commons-dev mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: [math] abstract nonsense was Re: [math][functor] More Design Concerns
Date Wed, 02 Jul 2003 17:04:16 GMT
Phil Steitz wrote:

> Brent Worden wrote:
>
>>
>> Unfortunately, Java has created a huge distinction between objects and
>> primitives.  They're incompatible types.  Objects have to be treated in
>> a distinctly different manner than primitive values.  I prefer objects
>> over primitives because the other commons projects we depend on are
>> built around objects.  For instance, 90%+ of the functionality in
>> commons-collections is geared towards objects and unusable by our
>> primitive approach.  I wager we could see a significant code reduction
>> in the univariate classes if we could incorporate some of those
>> object-driven routines.  Yeah.
>>
> I am not convinced of this.  There really is not that much code there.
> If what you think you can eliminate is all of the DoubleArray stuff,
> that is probably true, but at a significant loss of performance and
> flexibility.  I would in any case always want to keep the array-based
> implementations for speed and ease of use.  That would result in code
> swell (and smell).  Yuk.

Commons [collections] has a task I've been considering working on,
having to do with "primitive array collections".  I think our
DoubleArray objects could eventually become "Collection" objects
themselves, with our current DoubleArray interface serving as a
"primitive array collection" interface.  It is not out of the realm of
possibility to have DoubleArrays polymorph into Collections without
impacting our API in the least.  After such an adoption, the
distinction between Collection and DoubleArray objects becomes much
less cumbersome.
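
As a rough sketch of what such an adapter might look like (the wrapper
class is hypothetical and nothing here is committed API; I'm only
assuming the obvious accessors from our current DoubleArray interface):

import java.util.AbstractCollection;
import java.util.Iterator;

/*
 * Rough sketch only -- not committed API.  Wraps a DoubleArray as a
 * java.util.Collection by boxing on demand, so the primitive storage
 * is untouched and our current interface survives intact.
 */
public class DoubleArrayCollection extends AbstractCollection {

    private final DoubleArray array;

    public DoubleArrayCollection(DoubleArray array) {
        this.array = array;
    }

    public int size() {
        return array.getNumElements();
    }

    public boolean add(Object o) {
        // Box in: accept any Number and store its primitive value.
        array.addElement(((Number) o).doubleValue());
        return true;
    }

    public Iterator iterator() {
        // Box out: wrap each primitive in a Double as we walk the array.
        return new Iterator() {
            private int i = 0;
            public boolean hasNext() { return i < array.getNumElements(); }
            public Object next() { return new Double(array.getElement(i++)); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }
}

That would let a DoubleArray participate in the object-driven
[collections] routines Brent mentions without touching our API.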

>
>>
>>> but rather between reals, integers,
>>> complex numbers and more abstract mathematical objects such as group,
>>> field, ring elements or elements of topological spaces with certain
>>> properties. To me, doubles are "natively supported reals" and these are
>>> by far the most important objects that any applied math package will
>>> ever work with.  Almost every (another little pun) real statistical
>>> application uses real-valued random variables, for example.
>>
>> Statistical data analysis also involves dates, times, categories, etc.,
>> none of which can be handled by the univariate classes without
>> converting them to doubles before adding them to the container and
>> reversing the conversion when accessing metrics.  This is hardly
>> convenient to the user.
>
> You are missing the point.  To use the continuous methods, you *must*
> convert to real in any case.  I would prefer to have the user control
> this conversion.  Think through the use cases.  What does the mean of a
> collection of dates mean?  You need to decide discrete vs. continuous
> and set up a mapping -- a *random variable*.  I would prefer to let the
> user do this explicitly and provide efficient, well-documented
> computation support in commons-math.  For the discrete case, I agree
> that Frequency can certainly be improved/extended to accommodate
> different sorts of objects, but there again it is going to come down
> to string representation of the discrete values and then floating
> point computations to analyze the distributions.  I think that Tim's
> "BeanList" stuff is the kind of thing we should be looking at in terms
> of extending to support collections, but even there the linkage to the
> core computational infrastructure is real-valued properties.


Yes, these are strong points.  The user needs to be able to control
which objects and which methods are used in such a calculation.  I was
looking over BeanListUnivariate since you brought it up as an example,
and I now have a stronger criticism of the design.  Here we see the
downfall of having the "addValue" method in the Univariate interface:
it is a "storage" related method.  In this case, methods are cropping
up which do not fit properly in the Univariate interface, and there are
methods in the Univariate interface itself which can no longer be
implemented in the ListUnivariates:

    /* (non-Javadoc)
     * @see org.apache.commons.math.Univariate#addValue(double)
     */
    public void addValue(double v) {
        String msg = "The BeanListUnivariateImpl does not accept values " +
            "through the addValue method.  Because elements of this list " +
            "are JavaBeans, one must be sure to set the 'propertyName' " +
            "property and add new Beans to the underlying list via the " +
            "addBean(Object bean) method";
        throw new UnsupportedOperationException( msg );
    }

    /**
     * Adds a bean to this list.
     *
     * @param bean Bean to add to the list
     */
    public void addObject(Object bean) {
        list.add(bean);
    }

I think we've determined through experience that having interface
methods which cannot be supported across all implementations is a poor
design.  This is an example of where the current design is failing.
What I feel we are seeing here is too much of the "type" of the data
structure getting bound up in the statistical operation.  This is where
separating the concerns of data storage and operation is important, and
where the idea of mathematical operation functors comes into play.  The
argument over whether the methods that calculate results should stay
double-based or become Object-oriented (in input or output) is a
smaller argument within the bigger problem observed above.  It would be
nice to have a simple means to take a collection, define the objects
you want to collect info on, and define the method/bean property from
which values will be gathered, without the particular statistic you're
evaluating having to implement methods to handle these details.
BeanListUnivariate is a nice first pass, but we see some problems with
being stuck in the traditional Interface<--Implementation approach for
this now.

One idea is to have custom Iterators.  A custom Iterator could walk
through the objects in a collection (or the double values in an array)
and evaluate them to collate information: the collection contains the
objects, while the Iterator encapsulates the functionality for
translating/mapping between an Object and the real value that will be
used in the statistic.  We write the statistic and provide some generic
iterators that can be extended; the user extends these to work with
their collection.  Calculating a statistic on a collection is then
simply a matter of grabbing that collection, instantiating a particular
iterator, and plugging "it" (not the collection) into the statistic.
It is then up to the implementor of the Iterator how efficiently it
works with the collection or double[]: in the double[] case it can just
return the value; in the Collection case it may perform a number of
tasks prior to returning a value.
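
A minimal sketch of the shape I have in mind (all names hypothetical):

import java.util.Iterator;
import java.util.List;

/*
 * Hypothetical sketch.  The statistic consumes a stream of doubles;
 * the iterator owns the mapping from whatever the collection holds
 * to a real value.
 */
interface DoubleIterator {
    boolean hasNext();
    double nextDouble();
}

/* One implementation walks a double[] directly... */
class DoubleArrayIterator implements DoubleIterator {
    private final double[] values;
    private int i = 0;

    DoubleArrayIterator(double[] values) { this.values = values; }
    public boolean hasNext() { return i < values.length; }
    public double nextDouble() { return values[i++]; }
}

/* ...another maps the Objects in a List however the user sees fit. */
abstract class MappingIterator implements DoubleIterator {
    private final Iterator it;

    MappingIterator(List list) { this.it = list.iterator(); }
    public boolean hasNext() { return it.hasNext(); }
    public double nextDouble() { return toDouble(it.next()); }

    /** User-supplied mapping from Object to real value. */
    protected abstract double toDouble(Object o);
}

/* The statistic never sees the collection, only the iterator. */
class IteratorMean {
    public double evaluate(DoubleIterator it) {
        double sum = 0.0;
        int n = 0;
        while (it.hasNext()) {
            sum += it.nextDouble();
            n++;
        }
        return sum / n;  // NaN for an empty stream
    }
}

A user with a List of beans just subclasses MappingIterator to pull out
the property they care about and hands the iterator to the statistic.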

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


