mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <>
Subject [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable
Date Tue, 29 Sep 2009 01:05:16 GMT


Jeff Eastman commented on MAHOUT-136:

I think this issue has been completed and should be closed, since Canopy now uses Vector
Writable to communicate the centroid vectors between the mapper and reducer. What it does
not do is transmit Writable Canopies between the map and reduce steps, as kmeans does. Canopy
does implement the Writable methods (IMHO incorrectly, since they set the point total and
count to nonzero values), but the mapper and reducer do not use them, so this is moot.
Converting the mapper and reducer to communicate Writable canopies could be done, but there
are a lot of annoying little complications in the driver, which currently goes to some
lengths to use the same vector form (dense or sparse) as the input data.

It works as implemented.
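For readers unfamiliar with the pattern, here is a minimal sketch of what "transmitting Writable Canopies" would involve. It is written in Python purely for illustration; Mahout's real code is Java and implements Hadoop's Writable interface (write(DataOutput) and readFields(DataInput)). The Canopy class below and its two fields, an integer id and a dense centroid, are hypothetical simplifications, and the sketch deliberately leaves out the point total and count mentioned above.

```python
import io
import struct
from dataclasses import dataclass, field
from typing import BinaryIO, List


@dataclass
class Canopy:
    """Hypothetical, simplified canopy: an id plus a dense centroid vector."""
    canopy_id: int
    centroid: List[float] = field(default_factory=list)

    def write(self, out: BinaryIO) -> None:
        # Same shape as Hadoop's Writable.write(DataOutput): a big-endian int id,
        # a big-endian int length, then one 8-byte big-endian double per element.
        out.write(struct.pack(">ii", self.canopy_id, len(self.centroid)))
        out.write(struct.pack(f">{len(self.centroid)}d", *self.centroid))

    def read_fields(self, inp: BinaryIO) -> None:
        # Like Writable.readFields(DataInput), this overwrites the fields of an
        # existing instance, so one object can be reused across many records.
        self.canopy_id, n = struct.unpack(">ii", inp.read(8))
        self.centroid = list(struct.unpack(f">{n}d", inp.read(8 * n)))
```

Usage: write Canopy(7, [1.0, 2.5, -3.0]) into an io.BytesIO (32 bytes: two 4-byte ints plus three 8-byte doubles), seek back to 0, and call read_fields on a fresh Canopy(0) to get an equal object back. Reading into an existing instance mirrors Hadoop's choice to avoid allocating a new object per record.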

Unless somebody strongly disagrees, I'm going to close this issue as resolved, since the real
intent was to replace the text representation of the centroid vector with the Writable version,
and that has been done for some time now.
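A rough illustration of why the text representation is storage inefficient: the sketch below compares a hypothetical bracketed text encoding (not Mahout's actual asFormatString output) with a length-prefixed binary encoding of 8-byte big-endian doubles, the same layout java.io.DataOutput.writeInt and writeDouble would produce. The function names are made up for this example. The saving depends on the data: values with many significant digits cost up to about 20 characters each as text, while short values such as 1.0 are cheap in either form.

```python
import struct
from typing import List


def text_encoding(vector: List[float]) -> bytes:
    # Hypothetical text form, e.g. b"[0.5, 1.0]"; NOT Mahout's asFormatString format.
    return ("[" + ", ".join(repr(x) for x in vector) + "]").encode("utf-8")


def binary_encoding(vector: List[float]) -> bytes:
    # 4-byte big-endian length, then one 8-byte big-endian IEEE 754 double per element.
    return struct.pack(f">i{len(vector)}d", len(vector), *vector)


def binary_decoding(data: bytes) -> List[float]:
    # Inverse of binary_encoding: read the length, then that many doubles.
    (n,) = struct.unpack_from(">i", data, 0)
    return list(struct.unpack_from(f">{n}d", data, 4))
```

For a 100-element centroid such as [i / 7 for i in range(1, 101)], the binary form is always 4 + 8 * 100 = 804 bytes and round-trips exactly, whereas the text form is noticeably larger because most of those values print with 17 or more significant characters.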

> Change Canopy MR Implementation to use Vector Writable
> ------------------------------------------------------
>                 Key: MAHOUT-136
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.2
>
> Internal serialization of Canopy currently uses asFormatString rather than just making
> the Canopy writable. This is storage inefficient.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
