mahout-dev mailing list archives

From "Andrew Musselman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-551) Kmeans example with space delimited data
Date Sun, 28 Feb 2016 22:56:18 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171226#comment-15171226 ]

Andrew Musselman commented on MAHOUT-551:
-----------------------------------------

This k-means job does work with numerical inputs, but keep in mind that this ticket is quite
old, so things may have changed since the patch was integrated into the code base.

Are you having trouble running the job?

> Kmeans example with space delimited data
> ----------------------------------------
>
>                 Key: MAHOUT-551
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-551
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Integration
>    Affects Versions: 0.4
>            Reporter: Djellel Eddine Difallah
>            Assignee: Jeff Eastman
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-551.patch, MAHOUT-551.patch
>
>
> The provided example for Kmeans clustering using the synthetic control data asks for
> the t1 and t2 distance thresholds because it runs the Canopy Driver to determine the
> initial clusters. Kmeans in its original form only requires a K parameter and picks K
> random centers from the input data. I propose adding another example to the package that
> clusters any space-delimited numerical input with Kmeans in its original form, without
> using Canopy. The modification is quite simple and is mostly based on the synthetic
> control Job.
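
For readers unfamiliar with the distinction drawn in the description above, the following is a minimal sketch of plain k-means seeded with K randomly chosen input points, with no Canopy step and no t1/t2 thresholds. It is written in standalone Python for illustration only; it is not the code from MAHOUT-551.patch and does not use any Mahout API. The function names (parse_points, squared_distance, kmeans) and the max_iter and seed parameters are hypothetical.

```python
import random


def parse_points(text):
    """Parse one vector per non-empty line; values are separated by whitespace."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=10, seed=0):
    """Plain k-means: initial centers are k distinct input points chosen at random.

    Requires k <= len(points). Returns (centers, assignments), where
    assignments[i] is the index of the center nearest to points[i].
    """
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments


if __name__ == "__main__":
    sample = "1.0 1.0\n1.2 0.8\n8.0 8.0\n8.2 7.9\n"
    centers, assignments = kmeans(parse_points(sample), k=2)
    print(centers, assignments)
```

Because the initial centers are random, different seeds can converge to different clusterings; that sensitivity is one reason a Canopy pass is sometimes used to pick starting clusters instead.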



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
