mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable
Date Sat, 20 Jun 2009 00:54:07 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722105#action_12722105 ]

Jeff Eastman commented on MAHOUT-136:
-------------------------------------

r786738 committed the following changes.
- Modified CanopyMapper and CanopyReducer to produce and consume Canopy centroids as Writable
values instead of the previous format strings (asFormatString)
- Modified CanopyMapper to specify SparseVector output from mapper
- Fixed null name hash() bug in SparseVector
- Modified Canopy.emitPointToExistingCanopies to emit only canopy id and not full serialized
canopy. 
- This eliminates the need for the OutputDriver and OutputMapper in synthetic control example
so they are deleted.
- Updated unit tests; all tests run
- Synthetic control example runs
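
To illustrate the move from format strings to Writable values, here is a minimal sketch of binary serialization in the style of Hadoop's Writable interface (write(DataOutput) and readFields(DataInput)). It uses only java.io so it runs without Hadoop. CentroidRecord and its helpers are hypothetical names for illustration, not the classes changed in r786738.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical centroid holder; mirrors the shape of a Writable, not Mahout's Canopy.
final class CentroidRecord {
    private double[] center = new double[0];

    CentroidRecord() {}

    CentroidRecord(double... center) {
        this.center = center.clone();
    }

    double[] center() {
        return center.clone();
    }

    // Same signature as Writable.write: a length prefix followed by raw doubles.
    void write(DataOutput out) throws IOException {
        out.writeInt(center.length);
        for (double v : center) {
            out.writeDouble(v);
        }
    }

    // Same signature as Writable.readFields: overwrites this instance's state.
    void readFields(DataInput in) throws IOException {
        center = new double[in.readInt()];
        for (int i = 0; i < center.length; i++) {
            center[i] = in.readDouble();
        }
    }

    // Helper: serialize to a byte array (4 bytes for the length + 8 bytes per value).
    static byte[] toBytes(CentroidRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper: deserialize into a fresh instance.
    static CentroidRecord fromBytes(byte[] bytes) {
        CentroidRecord record = new CentroidRecord();
        try {
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

In this sketch a three-element centroid always serializes to 28 bytes, whereas a string form grows with the number of digits printed for each value and has to be parsed back.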

NOTE: When passing Vectors between Map and Reduce steps using Writable format, Hadoop uses
the *same instance* to do all of the deserializations. I had to change the Canopy constructors
to clone() their center arguments so that the same instance would not be reused for multiple
canopies.
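
The instance-reuse problem can be reproduced without Hadoop. The sketch below simulates a framework that deserializes every record into one shared object and shows what happens when a constructor keeps the reference instead of a clone. MutableVector, SimpleCanopy and WritableReuseDemo are hypothetical stand-ins, not the Mahout classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable vector; set() plays the role of readFields().
final class MutableVector {
    private double[] values = new double[0];

    // Overwrites this instance's state in place, as deserialization would.
    void set(double... newValues) {
        values = newValues.clone();
    }

    double get(int index) {
        return values[index];
    }

    @Override
    public MutableVector clone() {
        MutableVector copy = new MutableVector();
        copy.values = values.clone();
        return copy;
    }
}

// Hypothetical canopy that only stores its center.
final class SimpleCanopy {
    final MutableVector center;

    SimpleCanopy(MutableVector center, boolean defensiveCopy) {
        // With defensiveCopy the canopy owns its center; without it the
        // canopy aliases whatever object the caller passed in.
        this.center = defensiveCopy ? center.clone() : center;
    }
}

final class WritableReuseDemo {
    // Simulates a reduce loop in which one vector instance is refilled per record.
    static List<SimpleCanopy> buildCanopies(double[][] points, boolean defensiveCopy) {
        MutableVector reused = new MutableVector();
        List<SimpleCanopy> canopies = new ArrayList<>();
        for (double[] point : points) {
            reused.set(point);
            canopies.add(new SimpleCanopy(reused, defensiveCopy));
        }
        return canopies;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0}, {2.0}, {3.0}};
        for (SimpleCanopy c : buildCanopies(points, false)) {
            System.out.println("no clone: " + c.center.get(0)); // 3.0 each time
        }
        for (SimpleCanopy c : buildCanopies(points, true)) {
            System.out.println("clone:    " + c.center.get(0)); // 1.0, 2.0, 3.0
        }
    }
}
```

Without the clone, all three canopies end up sharing the last value read; with it, each canopy keeps the center it was created from.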

> Change Canopy MR Implementation to use Vector Writable
> ------------------------------------------------------
>
>                 Key: MAHOUT-136
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-136
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.1
>
>
> Internal serialization of Canopy currently uses asFormatString rather than just making
> the Canopy writable. This is storage inefficient.
