commons-issues mailing list archives

From "Artem Barger (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MATH-1374) KMeansPlusPlusClusterer unable to converge having repeatable points in input dataset
Date Thu, 02 Jun 2016 07:44:59 GMT

     [ https://issues.apache.org/jira/browse/MATH-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Barger updated MATH-1374:
-------------------------------
    Attachment: MATH-1374.patch

Attached a proposed fix for the problem.

> KMeansPlusPlusClusterer unable to converge having repeatable points in input dataset
> ------------------------------------------------------------------------------------
>
>                 Key: MATH-1374
>                 URL: https://issues.apache.org/jira/browse/MATH-1374
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Artem Barger
>         Attachments: MATH-1374.patch
>
>
> If the input list of {{Clusterable}} points is larger than the parameter {{k}} but contains fewer
> unique points than {{k}}, the algorithm fails to converge. This was tested with different
> EmptyClusterStrategy options; here is an example using the default one:
> {code}
>     @Test
>     public void testNumberOfRequestedClustersSameAsInputSize() {
>         final RandomVectorGenerator rng = new UncorrelatedRandomVectorGenerator(10,
>                 new GaussianRandomGenerator(RandomSource.create(RandomSource.MT)));
>         // Build 30 input points from only 10 unique ones (each added 3 times).
>         List<DoublePoint> points = new ArrayList<>();
>         for (int i = 0; i < 10; i++) {
>             final DoublePoint point = new DoublePoint(rng.nextVector());
>             for (int j = 0; j < 3; j++) {
>                 points.add(point);
>             }
>         }
>         // k = 12 is below the list size (30) but above the number of unique points (10).
>         final KMeansPlusPlusClusterer<DoublePoint> clusterer = new KMeansPlusPlusClusterer<>(12);
>         clusterer.cluster(points);
>     }
> {code}
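
The condition described above (list size greater than k, but fewer unique points than k) can be checked independently of the clusterer. The following is only a minimal sketch in plain Java, not the content of MATH-1374.patch and not part of the Commons Math API: the class DistinctPointCheck and its methods are hypothetical names, and points are represented as raw double[] arrays instead of Clusterable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of Commons Math or of the attached patch. */
public final class DistinctPointCheck {

    private DistinctPointCheck() {}

    /** Counts points with different coordinates, comparing arrays by value. */
    public static int countDistinct(List<double[]> points) {
        Set<List<Double>> seen = new HashSet<>();
        for (double[] p : points) {
            // Box the coordinates so that equals/hashCode work on values, not on array identity.
            List<Double> key = new ArrayList<>(p.length);
            for (double c : p) {
                key.add(c);
            }
            seen.add(key);
        }
        return seen.size();
    }

    /** Returns true when there are at least k distinct points to seed k clusters. */
    public static boolean hasEnoughDistinctPoints(List<double[]> points, int k) {
        return countDistinct(points) >= k;
    }

    public static void main(String[] args) {
        // Same shape as the reported test: 10 unique points, each added 3 times, k = 12.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            double[] point = {i, 2.0 * i};
            for (int j = 0; j < 3; j++) {
                points.add(point);
            }
        }
        System.out.println("input size: " + points.size());
        System.out.println("distinct points: " + countDistinct(points));
        System.out.println("enough for k = 12: " + hasEnoughDistinctPoints(points, 12));
    }
}
```

For the data in the report, the size check against k passes (30 > 12) while the distinct-point check does not (10 < 12), which is the combination the issue describes.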



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
