commons-issues mailing list archives

From "Luc Maisonobe (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MATH-429) KMeansPlusPlusClusterer breaks by division by zero
Date Fri, 22 Oct 2010 16:49:17 GMT

    [ https://issues.apache.org/jira/browse/MATH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923906#action_12923906 ]

Luc Maisonobe commented on MATH-429:
------------------------------------

You have encountered a classical problem with k-means: at some stage (here, at the first iteration),
one of the clusters becomes empty.
This case is currently not handled by commons-math (which is a bug, so we have to fix it).
When a cluster is empty, a new centroid must be defined from the other clusters. There are
different strategies:

# take the point farthest from any cluster
# select a random point from the cluster with the largest distance variance
# select a random point from the cluster with the largest number of points

My preferred choice would be 2. What do other people think?
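
To make strategy 2 concrete, here is a minimal sketch. It assumes points are plain double[] arrays, the distance is Euclidean, and "distance variance" means the variance of the point-to-centroid distances within a cluster. The class EmptyClusterSketch and its methods are hypothetical names for illustration only; this is not the commons-math API and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class EmptyClusterSketch {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the point-to-centroid distances of one cluster. */
    public static double distanceVariance(List<double[]> points, double[] centroid) {
        if (points.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (double[] p : points) {
            mean += distance(p, centroid);
        }
        mean /= points.size();
        double variance = 0.0;
        for (double[] p : points) {
            double dev = distance(p, centroid) - mean;
            variance += dev * dev;
        }
        return variance / points.size();
    }

    /**
     * Strategy 2 (assumed reading): pick the non-empty cluster whose distance
     * variance is largest and return a copy of one of its points, chosen at random.
     * clusters.get(i) holds the points assigned to centroids.get(i).
     */
    public static double[] replacementCentroid(List<List<double[]>> clusters,
                                               List<double[]> centroids,
                                               Random random) {
        int donor = -1;
        double largest = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < clusters.size(); i++) {
            if (clusters.get(i).isEmpty()) {
                continue; // empty clusters cannot donate a point
            }
            double v = distanceVariance(clusters.get(i), centroids.get(i));
            if (v > largest) {
                largest = v;
                donor = i;
            }
        }
        if (donor < 0) {
            throw new IllegalStateException("no non-empty cluster to draw from");
        }
        List<double[]> points = clusters.get(donor);
        return points.get(random.nextInt(points.size())).clone();
    }

    public static void main(String[] args) {
        List<double[]> tight = Arrays.asList(new double[] {0, 0}, new double[] {0, 1});
        List<double[]> spread = Arrays.asList(
                new double[] {10, 10}, new double[] {10, 11}, new double[] {20, 20});
        List<double[]> empty = new ArrayList<double[]>();
        List<List<double[]>> clusters = Arrays.asList(tight, spread, empty);
        // The third centroid is stale: its cluster is empty, so it is never read.
        List<double[]> centroids = Arrays.asList(
                new double[] {0, 0.5}, new double[] {40.0 / 3, 41.0 / 3}, new double[] {5, 5});
        double[] c = replacementCentroid(clusters, centroids, new Random());
        System.out.println("new centroid: " + Arrays.toString(c));
    }
}
```

The sketch leaves the chosen point in its original cluster. Since the new centroid coincides with that point, the next assignment step would move it (and possibly some of its neighbours) to the formerly empty cluster.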


> KMeansPlusPlusClusterer breaks by division by zero
> --------------------------------------------------
>
>                 Key: MATH-429
>                 URL: https://issues.apache.org/jira/browse/MATH-429
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.1
>         Environment: Java, Windows
>            Reporter: Erik van Ingen
>            Priority: Blocker
>         Attachments: KMeansPlusPlusClustererTest.java
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker because this
> space occurs in our domain.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

