From: "Paritosh Ranjan (Commented) (JIRA)"
To: dev@mahout.apache.org
Date: Tue, 3 Apr 2012 03:52:29 +0000 (UTC)
Message-ID: <962902808.4625.1333425149337.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <492421122.5355.1325697280186.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAHOUT-940) Clusterdumper - Get rid of map based implementation

[
https://issues.apache.org/jira/browse/MAHOUT-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244952#comment-13244952 ]

Paritosh Ranjan commented on MAHOUT-940:
----------------------------------------

1) Yes.

2) It would be a good idea to test before and after your code change: run all JUnit tests, and do some manual testing with clusterdumper (dump a cluster with the new implementation that was hitting OOM with the older implementation). This will confirm that the code works.

You can also compare output quality before and after switching to the post processor: the results should be the same whether you use the map-based or the post-processor-based implementation. To make that comparison possible, do not remove the older code yet; instead, provide an option to choose between the map-based and the post-processor-based implementation. It can be decided later which version to keep, i.e. only the new one or both.

> Clusterdumper - Get rid of map based implementation
> ---------------------------------------------------
>
>                 Key: MAHOUT-940
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-940
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> The current implementation of ClusterDumper puts clusters and related vectors in a map. This generally results in OOM.
> Since ClusterOutputProcessor is available now, the ClusterDumper will first process the clusteredPoints, and then write the clusters to a local file.
> The inability to properly read the clustering output because ClusterDumper hits OOM is seen too often on the mailing list. This improvement will fix that problem.

--
This message is automatically generated by JIRA.
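
The before/after check suggested in the comment above (keep both implementations behind an option and confirm they produce the same output) could be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use any Mahout classes, and DumpModeCheck, Mode, dumpMapBased and dumpPerCluster are hypothetical names. The per-cluster method is a simple stand-in for a post-processor-based dump, not the actual ClusterDumper change.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from Mahout.
class DumpModeCheck {

  /** Hypothetical option selecting which implementation to run. */
  enum Mode { MAP_BASED, POST_PROCESSOR }

  /** Old style: group every point by cluster id in one in-memory map. */
  static List<String> dumpMapBased(List<Map.Entry<Integer, String>> points) {
    Map<Integer, List<String>> byCluster = new TreeMap<>();
    for (Map.Entry<Integer, String> p : points) {
      byCluster.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    List<String> lines = new ArrayList<>();
    for (Map.Entry<Integer, List<String>> e : byCluster.entrySet()) {
      lines.add(e.getKey() + ": " + String.join(", ", e.getValue()));
    }
    return lines;
  }

  /** Stand-in for the new style: one pass per cluster, so only one
   *  cluster's points are collected at any time. */
  static List<String> dumpPerCluster(List<Map.Entry<Integer, String>> points) {
    TreeSet<Integer> ids = new TreeSet<>();
    for (Map.Entry<Integer, String> p : points) {
      ids.add(p.getKey());
    }
    List<String> lines = new ArrayList<>();
    for (Integer id : ids) {
      List<String> members = new ArrayList<>();
      for (Map.Entry<Integer, String> p : points) {
        if (p.getKey().equals(id)) {
          members.add(p.getValue());
        }
      }
      lines.add(id + ": " + String.join(", ", members));
    }
    return lines;
  }

  /** Dispatches on the option so both code paths stay available for testing. */
  static List<String> dump(Mode mode, List<Map.Entry<Integer, String>> points) {
    return mode == Mode.MAP_BASED ? dumpMapBased(points) : dumpPerCluster(points);
  }

  public static void main(String[] args) {
    // Made-up (clusterId, point) pairs, purely for illustration.
    List<Map.Entry<Integer, String>> points = new ArrayList<>();
    points.add(new SimpleEntry<>(2, "c"));
    points.add(new SimpleEntry<>(1, "a"));
    points.add(new SimpleEntry<>(2, "d"));
    points.add(new SimpleEntry<>(1, "b"));

    List<String> oldOut = dump(Mode.MAP_BASED, points);
    List<String> newOut = dump(Mode.POST_PROCESSOR, points);
    if (!oldOut.equals(newOut)) {
      throw new AssertionError("implementations disagree: " + oldOut + " vs " + newOut);
    }
    System.out.println("outputs match: " + newOut);
  }
}
```

In a real change, the same comparison would presumably run against actual clustering output, with the option exposed as a command-line flag; which flag name to use is left open here.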