commons-issues mailing list archives

From "Gilles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1330) KMeans clustering algorithm, doesn't support clustering of sparse input data.
Date Sat, 23 Apr 2016 00:44:12 GMT

    [ https://issues.apache.org/jira/browse/MATH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254977#comment-15254977 ]

Gilles commented on MATH-1330:
------------------------------

Although this looks like an interesting generalization, it is unlikely that any of the regular
contributors will have the time to tackle it.  Would you be willing to work on it?


> KMeans clustering algorithm, doesn't support clustering of sparse input data.
> -----------------------------------------------------------------------------
>
>                 Key: MATH-1330
>                 URL: https://issues.apache.org/jira/browse/MATH-1330
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>
> Currently the *KMeansPlusPlusClusterer* class requires its generic parameter *T* to extend
> the *Clusterable* interface, which is:
> {code}
> public interface Clusterable {
>     /**
>      * Gets the n-dimensional point.
>      *
>      * @return the point array
>      */
>     double[] getPoint();
> }
> {code}
> That is, it returns a dense representation of the clusterable data, which makes it impossible
> to efficiently compute k-means clustering on high-dimensional but very sparse data. I think
> it would be much better if the *Clusterable* interface returned a *Vector*, allowing the use of
> *SparseVector*s while clustering the data. Of course, the *KMeansPlusPlusClusterer* implementation,
> and I assume the other clustering implementations, would have to be refactored accordingly to
> support this.
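
To make the proposal quoted above more concrete, here is a minimal sketch of what a sparse
counterpart to *Clusterable* might look like. Everything in it is an assumption made for
illustration: the names SparseClusterable, SparsePoint and SparseDistance are hypothetical and
are not part of Commons Math, and the issue itself does not specify any particular design. The
sketch stores only the non-zero entries and computes a squared Euclidean distance by merging
the two sorted index lists, so the cost depends on the number of stored entries rather than on
the full dimension.

```java
/** Hypothetical sparse counterpart of Clusterable; NOT part of Commons Math. */
interface SparseClusterable {
    /** Total number of dimensions, including the implicit zeros. */
    int getDimension();

    /** Indices of the stored entries, in strictly increasing order. */
    int[] getIndices();

    /** Values of the stored entries, aligned with getIndices(). */
    double[] getValues();
}

/** Minimal immutable implementation, for illustration only. */
final class SparsePoint implements SparseClusterable {
    private final int dimension;
    private final int[] indices;
    private final double[] values;

    SparsePoint(int dimension, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        this.dimension = dimension;
        this.indices = indices.clone();
        this.values = values.clone();
    }

    @Override public int getDimension() { return dimension; }
    @Override public int[] getIndices() { return indices.clone(); }
    @Override public double[] getValues() { return values.clone(); }
}

final class SparseDistance {
    private SparseDistance() { }

    /** Squared Euclidean distance that visits only the stored entries. */
    static double squaredDistance(SparseClusterable a, SparseClusterable b) {
        if (a.getDimension() != b.getDimension()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        int[] ia = a.getIndices();
        int[] ib = b.getIndices();
        double[] va = a.getValues();
        double[] vb = b.getValues();
        int i = 0;
        int j = 0;
        double sum = 0.0;
        // Merge the two sorted index lists.
        while (i < ia.length && j < ib.length) {
            if (ia[i] == ib[j]) {
                double d = va[i] - vb[j];   // entry present in both points
                sum += d * d;
                i++;
                j++;
            } else if (ia[i] < ib[j]) {
                sum += va[i] * va[i];       // entry present only in a
                i++;
            } else {
                sum += vb[j] * vb[j];       // entry present only in b
                j++;
            }
        }
        // Remaining entries are present in only one of the points.
        for (; i < ia.length; i++) {
            sum += va[i] * va[i];
        }
        for (; j < ib.length; j++) {
            sum += vb[j] * vb[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        SparsePoint a = new SparsePoint(1_000_000, new int[] {3, 500_000}, new double[] {1.0, 2.0});
        SparsePoint b = new SparsePoint(1_000_000, new int[] {3, 999_999}, new double[] {4.0, 2.0});
        System.out.println(squaredDistance(a, b)); // prints 17.0
    }
}
```

In this example, two points in a million-dimensional space are compared by touching only three
distinct indices. Note that cluster centroids, being averages of many points, would typically
be denser than the inputs, so a real refactoring would probably also need a distance between a
sparse point and a dense centroid; that part is left out of this sketch.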



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
