mahout-dev mailing list archives

From "Smita Wadhwa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1080) Kmeans clustered output losses vectorId given in the input
Date Wed, 26 Sep 2012 08:18:08 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463610#comment-13463610 ]

Smita Wadhwa commented on MAHOUT-1080:
--------------------------------------

I have created a WeightedTextVectorWritable(vector, distance-from-the-centre, vectorId) to
hold the output, with the vectorId stored as text. I made it text for future use: whether the
id is an int, a double, or text, we can always output it as text.

PFA the patch for this fix, which makes the output carry the vectorId given in the input. I updated
the test cases and verified the output with the unit tests as well as on a Hadoop cluster.

The changes are made for both the sequential and the MR job.
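
For readers without the patch at hand, here is a minimal sketch of how a writable carrying
(vector, distance-from-the-centre, vectorId) could be laid out. It is not the code from the
attached diff: the class name WeightedTextVectorSketch, its field names, and the roundTrip
helper are hypothetical, a plain double[] stands in for Mahout's vector type, and java.io
streams stand in for Hadoop's serialization, so that the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration only, NOT the class from the patch.
final class WeightedTextVectorSketch {
    private double[] vector = new double[0];
    private double weight;        // distance from the cluster centre
    private String vectorId = ""; // kept as text so int, double, or string ids all fit

    WeightedTextVectorSketch() {}

    WeightedTextVectorSketch(double[] vector, double weight, String vectorId) {
        this.vector = vector.clone();
        this.weight = weight;
        this.vectorId = vectorId;
    }

    double[] getVector() { return vector.clone(); }
    double getWeight() { return weight; }
    String getVectorId() { return vectorId; }

    // Serialize the id first, then the weight, then the vector elements.
    void write(DataOutput out) throws IOException {
        out.writeUTF(vectorId);
        out.writeDouble(weight);
        out.writeInt(vector.length);
        for (double v : vector) {
            out.writeDouble(v);
        }
    }

    // Read the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        vectorId = in.readUTF();
        weight = in.readDouble();
        int n = in.readInt();
        vector = new double[n];
        for (int i = 0; i < n; i++) {
            vector[i] = in.readDouble();
        }
    }

    // Helper for the example: serialize to bytes and deserialize into a new object.
    static WeightedTextVectorSketch roundTrip(WeightedTextVectorSketch src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            WeightedTextVectorSketch copy = new WeightedTextVectorSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing the id as a string means a record built from an integer key, for example
new WeightedTextVectorSketch(new double[] {1.0, 2.5}, 0.75, "42"), survives a round trip
with its id intact, which is the information the current clustered output drops.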
                
> Kmeans clustered output losses vectorId given in the input
> ----------------------------------------------------------
>
>                 Key: MAHOUT-1080
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1080
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Smita Wadhwa
>         Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is IntWritable and VectorWritable,
> and the output of clustered points is clusterId and WeightedVectorWritable(vector, distance-from-the-centre).
> The id of the vector is lost in this processing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
