commons-issues mailing list archives

From "Artem Barger (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MATH-1374) KMeansPlusPlusClusterer unable to converge having repeatable points in input dataset
Date Thu, 02 Jun 2016 07:13:59 GMT
Artem Barger created MATH-1374:
----------------------------------

             Summary: KMeansPlusPlusClusterer unable to converge having repeatable points in input dataset
                 Key: MATH-1374
                 URL: https://issues.apache.org/jira/browse/MATH-1374
             Project: Commons Math
          Issue Type: Bug
            Reporter: Artem Barger


If the input list of {{Clusterable}} points contains more than {{k}} elements but fewer than
{{k}} unique points, the algorithm fails to converge. I tested this with the different
{{EmptyClusterStrategy}} options; here is an example using the default one:

{code}
    @Test
    public void testNumberOfRequestedClustersSameAsInputSize() {

        final RandomVectorGenerator rng = new UncorrelatedRandomVectorGenerator(10,
                new GaussianRandomGenerator(RandomSource.create(RandomSource.MT)));

        List<DoublePoint> points = new ArrayList<>();

        // 10 random 10-dimensional points, each added 3 times:
        // 30 list entries, but only 10 unique points.
        for (int i = 0; i < 10; i++) {
            final DoublePoint point = new DoublePoint(rng.nextVector());
            for (int j = 0; j < 3; j++) {
                points.add(point);
            }
        }

        // k = 12 is below the list size (30) but above the number of unique points (10).
        final KMeansPlusPlusClusterer<DoublePoint> clusterer = new KMeansPlusPlusClusterer<>(12);
        clusterer.cluster(points);
    }
{code}
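
The condition described above (more list entries than {{k}}, but fewer unique points than {{k}}) can be detected before clustering. The following is only a minimal sketch of such a check, not part of Commons Math and not a proposed fix: the class {{DistinctPointCheck}} and its methods are hypothetical names, and plain {{double[]}} coordinates stand in for {{DoublePoint}} so that the snippet runs without the library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Commons Math.
public class DistinctPointCheck {

    /** Counts distinct coordinate vectors, comparing by value rather than by reference. */
    public static int countDistinct(List<double[]> points) {
        // List<Double> has value-based equals/hashCode, unlike double[].
        Set<List<Double>> seen = new HashSet<>();
        for (double[] p : points) {
            List<Double> key = new ArrayList<>(p.length);
            for (double v : p) {
                key.add(v);
            }
            seen.add(key);
        }
        return seen.size();
    }

    /** Returns true if the input holds at least k unique points. */
    public static boolean hasEnoughDistinctPoints(List<double[]> points, int k) {
        return countDistinct(points) >= k;
    }

    public static void main(String[] args) {
        // Same shape as the test in the report: 10 unique points, each added 3 times.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            double[] p = {i, 2.0 * i};
            for (int j = 0; j < 3; j++) {
                points.add(p);
            }
        }
        System.out.println("entries = " + points.size());
        System.out.println("unique  = " + countDistinct(points));
        System.out.println("enough for k = 12? " + hasEnoughDistinctPoints(points, 12));
    }
}
```

With the data from the report, such a check would flag the input: 30 entries, 10 unique points, and {{k}} = 12.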



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
