Message-ID: <2056408224.1227526244476.JavaMail.jira@brutus>
Date: Mon, 24 Nov 2008 03:30:44 -0800 (PST)
From: "Chris Douglas (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces
In-Reply-To: <2132140.1201939988789.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650176#action_12650176 ]

Chris Douglas
commented on HADOOP-2774:
---------------------------------------

I haven't been through the rest of the code in detail, but the test case shouldn't need to read/write so much data to test the spill counters.

* Would it work if you used a smaller io.sort.mb and calibrated the size of your data to trigger a fixed number of spills? In the current version, spills should be triggered based on the number of records, which is a property the test isn't controlling strictly.
* Why run the combiner? Isn't each word coming out of each map unique?
* It might be necessary to set mapred.child.java.opts explicitly to make sure the memory limit stays fixed, even for different client configurations. Does it not work with mapred.job.shuffle.buffer.percent = 0?
* The test cannot create its scratch directory in the working dir. It should use the test.build.data property as the root for its temporary data. It should also clean up when the test completes.
* testCounters looks like a unit test and only emits log messages. It seems unnecessary and less readable than putting the asserts inline with the unit test.

> Add counters to show number of key/values that have been sorted and merged in the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of records. So for example, if the map output 100 records and they were sorted once, the counter would be 100. If it spilled twice and was merged together, it would be 200. Clearly in a multi-level merge, it may not be a multiple of the number of map output records.
> This would let the users easily see if they have values like io.sort.mb or io.sort.factor set too low.
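
The per-pass arithmetic in the issue description (100 records sorted once gives 100; two spills merged together give 200; a multi-level merge need not give a multiple) can be illustrated with a toy model. This is only a sketch, not Hadoop code: the class SpillCountSketch, the method recordsProcessed, and the parameters spillSizes and mergeFactor are hypothetical names, and the merge strategy used here (repeatedly merging the smallest segments until one final merge is possible) is an assumption made for illustration, not a description of the actual merge in the map or reduce.

```java
import java.util.PriorityQueue;

// Toy model of a per-pass record counter. Hypothetical names throughout;
// this does not use or reproduce any Hadoop API.
class SpillCountSketch {

    // Counts one unit per record for every sort or merge pass that touches it.
    // spillSizes:  number of records in each spill (assumed input)
    // mergeFactor: maximum segments merged at once (plays the role of io.sort.factor)
    static long recordsProcessed(long[] spillSizes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        long count = 0;
        PriorityQueue<Long> segments = new PriorityQueue<>();
        for (long size : spillSizes) {
            count += size;        // the sort pass touches every record once
            segments.add(size);
        }
        if (segments.size() <= 1) {
            return count;         // a single spill needs no merge
        }
        // Intermediate merges (assumed strategy): combine the smallest
        // segments until the remainder fits in one final merge.
        while (segments.size() > mergeFactor) {
            long merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += segments.poll();
            }
            count += merged;
            segments.add(merged);
        }
        // The final merge touches every remaining record once more.
        for (long size : segments) {
            count += size;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(recordsProcessed(new long[] {100}, 10));            // 100
        System.out.println(recordsProcessed(new long[] {50, 50}, 10));         // 200
        System.out.println(recordsProcessed(new long[] {25, 25, 25, 25}, 3));  // 275
    }
}
```

In the last call, four spills with a merge factor of 3 force one intermediate merge of 75 records before the final merge, so the total (275) is not a multiple of the 100 map output records. Under this model, a count well above twice the map output records would hint that the merge factor or sort buffer is too small for the data.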