Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <478290520.1227129884398.JavaMail.jira@brutus>
Date: Wed, 19 Nov 2008 13:24:44 -0800 (PST)
From: "Chris Douglas (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces
In-Reply-To: <2132140.1201939988789.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649167#action_12649167 ]

Chris Douglas
commented on HADOOP-2774:
---------------------------------------

Sorry, I was unclear. The issue is not the volatility of the aggregator, but that it is static. There are a number of contexts (JVM reuse, for one) where it simply will not work. Aggregating disparate counters in a shared variable only to push its _deltas_ to another aggregator is not an expected approach. A static counter 1) duplicates functionality that the Counter interface already implements, which is suspect when its only purpose is to feed a Counter, and 2) is almost certainly more difficult to get right for all our use cases than some of the alternatives.

Sharad's proposal seems very reasonable. I'd suggest one variant: adding a Counter formal parameter to the IFile.Reader and IFile.Writer constructors. In the map, creating each Writer with a counter that tracks each record hitting disk should be accurate. In the reduce, instead of incrementing the counter as each segment is written to disk by the fetch and the intermediate merges, updating it as the segment is *read* from disk will yield the correct value at the end of the job. So a counter will be provided for the final merge into the reduce and for the intermediate, on-disk merges. This makes it unnecessary to transfer the record count to the reduce, lets the IFile format remain exactly as it is, and should be fairly easy to implement.

Thoughts?

> Add counters to show number of key/values that have been sorted and merged in the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of records. So for example, if the map output 100 records and they were sorted once, the counter would be 100.
> If it spilled twice and was merged together, it would be 200. Clearly in a multi-level merge, it may not be a multiple of the number of map output records. This would let the users easily see if they have values like io.sort.mb or io.sort.factor set too low.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
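
For illustration only, here is a rough sketch of the counter-per-Writer/Reader idea from the comment above. It is not the HADOOP-2774 patch and does not use Hadoop's classes: RecordCounter, CountingWriter, CountingReader and both simulate methods are hypothetical stand-ins, and segments are plain in-memory lists rather than IFile data. It assumes the map side counts records as they are written and the reduce side counts them as they are read, and it reproduces the 100-record, two-spill example from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillCounterSketch {

    /** Hypothetical stand-in for a framework counter; not Hadoop's Counter. */
    public static final class RecordCounter {
        private long value;
        public void increment(long n) { value += n; }
        public long get() { return value; }
    }

    /** Hypothetical writer: the counter is an instance field passed in by the caller, not a static. */
    public static final class CountingWriter {
        private final List<String> segment = new ArrayList<>();
        private final RecordCounter counter;

        public CountingWriter(RecordCounter counter) { this.counter = counter; }

        /** Counts each record as it "hits disk" (here: an in-memory list). */
        public void append(String record) {
            segment.add(record);
            counter.increment(1);
        }

        public List<String> close() { return segment; }
    }

    /** Hypothetical reader: counts each record as it is read back from a segment. */
    public static final class CountingReader {
        private final List<String> segment;
        private final RecordCounter counter;
        private int pos;

        public CountingReader(List<String> segment, RecordCounter counter) {
            this.segment = segment;
            this.counter = counter;
        }

        /** Returns the next record, or null when the segment is exhausted. */
        public String next() {
            if (pos >= segment.size()) {
                return null;
            }
            counter.increment(1);
            return segment.get(pos++);
        }
    }

    /** Map side: 100 records spilled as two 50-record segments, then one merge pass. */
    public static long simulateMapSide() {
        RecordCounter spilled = new RecordCounter();
        List<List<String>> spills = new ArrayList<>();
        for (int s = 0; s < 2; s++) {
            CountingWriter w = new CountingWriter(spilled);
            for (int i = 0; i < 50; i++) {
                w.append("rec-" + s + "-" + i);
            }
            spills.add(w.close());
        }
        // The merge is a plain concatenation; only the counting matters here.
        CountingWriter merged = new CountingWriter(spilled);
        for (List<String> spill : spills) {
            for (String record : spill) {
                merged.append(record);
            }
        }
        merged.close();
        return spilled.get();
    }

    /** Reduce side: on-disk segments are counted as they are read, not as they are written. */
    public static long simulateReduceSide(List<List<String>> onDiskSegments) {
        RecordCounter read = new RecordCounter();
        for (List<String> segment : onDiskSegments) {
            CountingReader r = new CountingReader(segment, read);
            while (r.next() != null) {
                // A real merge would compare keys here; this sketch only drains the segment.
            }
        }
        return read.get();
    }

    public static void main(String[] args) {
        System.out.println(simulateMapSide()); // prints 200
        List<List<String>> segments = new ArrayList<>();
        segments.add(List.of("a", "b", "c"));
        segments.add(List.of("d", "e"));
        System.out.println(simulateReduceSide(segments)); // prints 5
    }
}
```

Because each Writer and Reader holds the counter it was constructed with, nothing is shared through a static field, which is the property the comment argues for in contexts such as JVM reuse.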