commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Neidhart <>
Subject Re: [Math] MATH-917 (Was: [math] releasing 3.2?)
Date Tue, 26 Mar 2013 19:07:42 GMT
On 03/26/2013 01:38 PM, Gilles wrote:
>>> [...]
>>>>> There is at least one contribution requiring some feedback (MATH-917).
>>>> I have added a first patch and would love to get some feedback.
>>> It seems we are really close to releasing now. I guess MATH-817 will be
>>> resolved soon. What seems to be achievable in a short time frame
>>> would be:
>>>   - wrapping up MATH-917 as Thomas proposed,
>>>   - finishing MATH-437 (Kolmogorov-Smirnov), and perhaps also
>>>     MATH-228 at the same time
>>>   - removing cobertura from parent
>>> I would propose to postpone MATH-873, MATH-923 and MATH-874 to 4.0.
>>> What do you think?
>> Looks good. I will finish my work on MATH-917 till Wednesday the latest.
> I wonder whether "Clusterable" should be (the generic type seems
> superfluous):

you are right, I did remove the generic type already in my local

> public interface Clusterable {
>   double[] getClusteringData();
> }
> And the "distance" interface simplified to:
> public interface Distance {
>   double compute(double[] a, double[] b);
> }
> The concept being more general than just for clustering, the interface and
> its implementations probably belong elsewhere, e.g. in "o.a.c.m.util" (or
> perhaps even better as static inner classes of "MathArrays").

I thought about this too, but the downside would be that any Clusterable
must have a double[] internally. Having dimension() and component(int)
methods is more flexible imho, but both approaches have their benefits.

> "AbstractClusterer" should be immutable (i.e. no distance setter).
> If using interfaces, the "getDistanceMeasure" method should probably appear
> in one of them. Maybe:
> public interface Clusterer<T extends Clusterable> {
>   List<? extends Cluster<T>> cluster(Collection<T> points);
>   Distance getDistance();
> }
> However, in line with what we've been doing the last 3 or 4 years,
> "Clusterer" could just be an abstract class.

I was musing about making the Clusterer implementations immutable, or to
create setters for their necessary parameters, e.g. the k parameter for

Otherwise you have to create a new object each time you want to change
something, would this be acceptable?

Another thing that I changed locally: the KMeans clusterer currently
supports multi-evaluation with a separate method. This could be moved to
a separate class MultiKMeansPlusPlusClusterer, which allows to use it
with the same Clusterer interface.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message