mahout-dev mailing list archives

From "Paritosh Ranjan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-940) Clusterdumper - Get rid of map based implementation
Date Tue, 03 Apr 2012 03:52:29 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244952#comment-13244952 ]

Paritosh Ranjan commented on MAHOUT-940:
----------------------------------------

1) yes
2) It would be a good idea to test before and after your code change, i.e. run all the
JUnit tests and do some manual testing with clusterdumper (dump a cluster with the new
implementation that was hitting OOM with the older implementation). This will confirm that
the code is working.

You can also compare the output before and after switching to the post processor, i.e. the
results should be the same whether you use the map based or the post processor based
implementation.

So, to test it, do not get rid of the older code; rather, provide an option to choose between
the map based and the post processor based implementation. This will help in testing. Later
it can be decided which version to keep, i.e. only the new one or both.
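
The testing approach above could be sketched roughly as follows. This is only an
illustration and not Mahout code: ClusteredPoint, DumpMode, dumpWithMap and dumpGrouped
are hypothetical names, and the "post processor" side is simulated with a stable sort
instead of the real post processing step, so it shows the equivalence check rather than
any memory saving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DumperComparison {

    /** Hypothetical stand-in for one clustered point; not a Mahout class. */
    static final class ClusteredPoint {
        final int clusterId;
        final String point;

        ClusteredPoint(int clusterId, String point) {
            this.clusterId = clusterId;
            this.point = point;
        }
    }

    /** Hypothetical option for choosing the implementation at run time. */
    enum DumpMode { MAP_BASED, POST_PROCESSOR_BASED }

    static List<String> dump(List<ClusteredPoint> points, DumpMode mode) {
        return mode == DumpMode.MAP_BASED ? dumpWithMap(points) : dumpGrouped(points);
    }

    /** Older style: hold every cluster and its points in one in-memory map, then emit. */
    static List<String> dumpWithMap(List<ClusteredPoint> points) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (ClusteredPoint p : points) {
            byCluster.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.point);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : byCluster.entrySet()) {
            for (String point : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + point);
            }
        }
        return lines;
    }

    /** Stand-in for the newer style: group by cluster first (a stable sort here), then emit record by record. */
    static List<String> dumpGrouped(List<ClusteredPoint> points) {
        List<ClusteredPoint> grouped = new ArrayList<>(points);
        grouped.sort(Comparator.comparingInt((ClusteredPoint p) -> p.clusterId));
        List<String> lines = new ArrayList<>();
        for (ClusteredPoint p : grouped) {
            lines.add(p.clusterId + "\t" + p.point);
        }
        return lines;
    }

    /** Order-insensitive comparison of two dumps. */
    static boolean sameOutput(List<String> a, List<String> b) {
        List<String> left = new ArrayList<>(a);
        List<String> right = new ArrayList<>(b);
        Collections.sort(left);
        Collections.sort(right);
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<ClusteredPoint> points = Arrays.asList(
            new ClusteredPoint(2, "p1"),
            new ClusteredPoint(1, "p2"),
            new ClusteredPoint(2, "p3"));
        List<String> oldDump = dump(points, DumpMode.MAP_BASED);
        List<String> newDump = dump(points, DumpMode.POST_PROCESSOR_BASED);
        System.out.println("outputs match: " + sameOutput(oldDump, newDump));
    }
}
```

Keeping both paths behind one switch means the same input can be run through each and
compared in a JUnit test before deciding which one to keep.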
                
> Clusterdumper - Get rid of map based implementation
> ---------------------------------------------------
>
>                 Key: MAHOUT-940
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-940
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> The current implementation of ClusterDumper puts clusters and their related vectors in a map.
> This generally results in OOM.
> Since ClusterOutputProcessor is available now, the ClusterDumper will first process
> the clusteredPoints, and then write the clusters to a local file.
> The inability to properly read the clustering output because ClusterDumper hits OOM
> is reported too often on the mailing list. This improvement will fix that problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
