[ https://issues.apache.org/jira/browse/MATH429?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=12923906#action_12923906
]
Luc Maisonobe edited comment on MATH429 at 10/22/10 12:48 PM:

You have encountered a classical problem with k-means: at some stage (here at the first
iteration), one of the clusters becomes empty.
This case is currently not handled by commons-math (which is a bug, so we have to fix it).
When a cluster is empty, a new centroid must be defined from the other clusters. There are
different strategies:
# take the point farthest from any cluster
# select a random point from the cluster with the largest distance variance
# select a random point from the cluster with the largest number of points
My preferred choice would be 2. What do other people think?
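
To make strategy 2 concrete, here is a minimal sketch in plain Java. It deliberately does not use the commons-math API: the class name EmptyClusterRepair and all of its methods are hypothetical, points are assumed to be plain double[] arrays, and "distance variance" is assumed to mean the population variance of the Euclidean distances from each point to its cluster centroid. Removing the chosen point from the donor cluster is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of commons-math. */
public class EmptyClusterRepair {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Population variance of the distances from each point to the centroid. */
    static double distanceVariance(List<double[]> points, double[] centroid) {
        if (points.size() < 2) {
            return 0.0; // a single point has no spread
        }
        double[] dist = new double[points.size()];
        double mean = 0.0;
        for (int i = 0; i < dist.length; i++) {
            dist[i] = Math.sqrt(squaredDistance(points.get(i), centroid));
            mean += dist[i];
        }
        mean /= dist.length;
        double var = 0.0;
        for (double d : dist) {
            var += (d - mean) * (d - mean);
        }
        return var / dist.length;
    }

    /** Index of the non-empty cluster with the largest distance variance, or -1. */
    static int indexOfLargestVariance(List<List<double[]>> clusters,
                                      List<double[]> centroids) {
        int best = -1;
        double bestVar = -1.0;
        for (int k = 0; k < clusters.size(); k++) {
            if (clusters.get(k).isEmpty()) {
                continue; // empty clusters cannot donate a point
            }
            double v = distanceVariance(clusters.get(k), centroids.get(k));
            if (v > bestVar) {
                bestVar = v;
                best = k;
            }
        }
        return best;
    }

    /**
     * Strategy 2: remove a random point from the cluster with the largest
     * distance variance and return a copy of it as the new centroid.
     */
    static double[] newCentroidForEmptyCluster(List<List<double[]>> clusters,
                                               List<double[]> centroids,
                                               Random random) {
        int k = indexOfLargestVariance(clusters, centroids);
        if (k < 0) {
            throw new IllegalStateException("all clusters are empty");
        }
        List<double[]> donor = clusters.get(k);
        return donor.remove(random.nextInt(donor.size())).clone();
    }
}
```

The caller would invoke newCentroidForEmptyCluster whenever an empty cluster is detected, before computing the mean of that cluster's points, which is presumably where the division by zero would otherwise occur.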
> KMeansPlusPlusClusterer breaks by division by zero
> 
>
> Key: MATH429
> URL: https://issues.apache.org/jira/browse/MATH429
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 2.1
> Environment: Java, Windows
> Reporter: Erik van Ingen
> Priority: Blocker
> Attachments: KMeansPlusPlusClustererTest.java
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker because this
> space occurs in our domain.

