mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-551) Kmeans example with space delimited data
Date Sun, 21 Nov 2010 23:09:13 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934367#action_12934367 ]

Jeff Eastman commented on MAHOUT-551:
-------------------------------------

Patch looks about right and would use the RandomSeedGenerator correctly to prime the initial
clusters for k-means. Adding a new package and an example class that is 99% the same as the
synthetic control k-means Job seems like overkill, though. How about adding a -k option to the
existing Job so that it can serve both approaches?
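
To make the suggestion concrete, here is a minimal in-memory sketch of one entry point that serves both approaches: if k is supplied it picks k distinct random input points as seeds (similar in spirit to what RandomSeedGenerator does), otherwise it derives seeds from a simplified canopy pass using t1 and t2. The class `SeedingChoice` and its methods are hypothetical illustrations, not Mahout's API, and the canopy step is a single-machine simplification rather than the Canopy Driver itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout's API: one entry point that seeds k-means either
// with k random input points (when -k is given) or with canopy centers (t1/t2).
final class SeedingChoice {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static double[] mean(List<double[]> members) {
        double[] m = new double[members.get(0).length];
        for (double[] p : members) {
            for (int i = 0; i < m.length; i++) m[i] += p[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= members.size();
        return m;
    }

    // Picks k distinct input points uniformly at random (similar in spirit to RandomSeedGenerator).
    static List<double[]> randomSeeds(List<double[]> points, int k, Random rng) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }

    // Simplified single-machine canopy pass (requires t1 > t2). Each canopy center is the
    // mean of the points within t1 of the chosen point; points within t2 are then removed
    // so they cannot start canopies of their own.
    static List<double[]> canopySeeds(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) throw new IllegalArgumentException("canopy requires t1 > t2");
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] first = remaining.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(first);
            for (double[] p : remaining) {
                if (distance(first, p) < t1) members.add(p);
            }
            centers.add(mean(members));
            remaining.removeIf(p -> distance(first, p) < t2);
        }
        return centers;
    }

    // Dispatch: a supplied k (non-null) selects random seeding; otherwise fall back to t1/t2.
    static List<double[]> chooseSeeds(List<double[]> points, Integer k, double t1, double t2, Random rng) {
        return (k != null) ? randomSeeds(points, k, rng) : canopySeeds(points, t1, t2);
    }
}
```

With this shape, the existing t1/t2 path keeps working unchanged and -k only switches how the initial centers are produced; the k-means iterations that follow are the same either way.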

> Kmeans example with space delimited data
> ----------------------------------------
>
>                 Key: MAHOUT-551
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-551
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: Djellel Eddine Difallah
>            Priority: Minor
>         Attachments: MAHOUT-551.patch
>
>
> The provided example for k-means clustering on the synthetic control data asks for
> t1 and t2 thresholds because it runs the Canopy Driver to determine the initial clusters. In
> its original form, k-means instead takes a parameter k and picks k random centers from the
> input data. I propose adding another example to the package that works with any
> space-delimited numerical input and clusters it with plain k-means, without Canopy. The
> modification is quite simple and is mostly based on the synthetic control Job.
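
For illustration, "plain k-means on space-delimited input" can be sketched in memory as: parse each non-blank line into a vector by splitting on whitespace, pick k distinct input points at random as the initial centers, then run Lloyd's iterations (assign each point to its nearest center, recompute centers as means) until no center moves. The class `SpaceDelimitedKMeans` and its methods are hypothetical and not part of the attached patch or Mahout; the real example would run as a Hadoop job over files rather than over a `List<String>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory sketch of plain k-means on space-delimited rows (not Mahout code).
final class SpaceDelimitedKMeans {

    // Parses lines such as "1.0 2.5  3" into vectors; blank lines are skipped.
    static List<double[]> parse(List<String> lines) {
        List<double[]> points = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] tokens = trimmed.split("\\s+");
            double[] v = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) v[i] = Double.parseDouble(tokens[i]);
            points.add(v);
        }
        return points;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++) {
            if (squaredDistance(p, centers.get(c)) < squaredDistance(p, centers.get(best))) best = c;
        }
        return best;
    }

    // Lloyd's algorithm seeded with k distinct random input points; returns the final centers.
    static List<double[]> cluster(List<double[]> points, int k, int maxIterations, Random rng) {
        if (k < 1 || k > points.size()) throw new IllegalArgumentException("invalid k");
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        List<double[]> centers = new ArrayList<>();
        for (int i = 0; i < k; i++) centers.add(shuffled.get(i).clone());
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous center
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) updated[d] = sums[c][d] / counts[c];
                if (!Arrays.equals(updated, centers.get(c))) changed = true;
                centers.set(c, updated);
            }
            if (!changed) break; // converged: no center moved
        }
        return centers;
    }
}
```

Only k (plus an iteration cap) is needed here; no distance thresholds are involved, which is the difference from the canopy-seeded synthetic control example.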

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

