Message-ID: <248472177.1227251204526.JavaMail.jira@brutus>
Date: Thu, 20 Nov 2008 23:06:44 -0800 (PST)
From: "Chris Douglas (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces
In-Reply-To: <2132140.1201939988789.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649617#action_12649617 ]

Chris Douglas
commented on HADOOP-2774:
---------------------------------------

bq. The two counters are for counting the number of records read and the number of records written, not for determining whether the records came from disk/memory.

I'm confused. The map needs to count spilled records as it writes to disk; that's straightforward. The reduce, since some of its fetched segments are written directly to disk, either needs a count from the map, as in the original patch, or should count records as it reads them from disk. Why does the merger need two counters, particularly since the caller is the only one that knows whether the output is going to disk or to a reduce? When would the number of records read from its segments differ from the number of records it ultimately emits?

Passing an optional object to the merge that gets pinged every time it emits a record is both straightforward as an API change and sufficient for this particular use case. One could probably add the counter to the Merger without changing IFile.Reader and get the same semantics. I like the symmetry of adding counters to both, but either way is fine.

> Add counters to show number of key/values that have been sorted and merged in the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of records. So for example, if the map output 100 records and they were sorted once, the counter would be 100. If it spilled twice and was merged together, it would be 200. Clearly in a multi-level merge, it may not be a multiple of the number of map output records. This would let the users easily see if they have values like io.sort.mb or io.sort.factor set too low.
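
For illustration only, here is a minimal sketch of the idea from the comment above: an optional object is passed to the merge and pinged every time the merge emits a record. This is not Hadoop's Merger or IFile.Reader API. The names MergeCounterSketch, RecordListener, CountingListener, and merge are hypothetical, and plain integers stand in for key/value records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeCounterSketch {

    /** Hypothetical callback, pinged once per record the merge emits. */
    public interface RecordListener {
        void recordEmitted();
    }

    /** Hypothetical listener that simply counts the pings it receives. */
    public static final class CountingListener implements RecordListener {
        private long count;

        @Override
        public void recordEmitted() {
            count++;
        }

        public long getCount() {
            return count;
        }
    }

    /** Read position within one sorted segment. */
    private static final class Cursor {
        final List<Integer> segment;
        int pos;

        Cursor(List<Integer> segment) {
            this.segment = segment;
        }

        int head() {
            return segment.get(pos);
        }
    }

    /**
     * K-way merge of already-sorted segments. The listener is optional:
     * pass null when the caller does not need a count.
     */
    public static List<Integer> merge(List<List<Integer>> segments,
                                      RecordListener listener) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingInt(Cursor::head));
        for (List<Integer> segment : segments) {
            if (!segment.isEmpty()) {
                heap.add(new Cursor(segment));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out.add(smallest.head());
            if (listener != null) {
                listener.recordEmitted(); // one ping per emitted record
            }
            smallest.pos++;
            if (smallest.pos < smallest.segment.size()) {
                heap.add(smallest); // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The caller decides what the count means (e.g. spilled records),
        // since only the caller knows where the merged output is going.
        CountingListener emitted = new CountingListener();
        List<Integer> merged = merge(
                Arrays.asList(
                        Arrays.asList(1, 4, 7),
                        Arrays.asList(2, 5),
                        Arrays.asList(3, 6)),
                emitted);
        System.out.println(merged + " emitted=" + emitted.getCount());
    }
}
```

In this sketch the merge itself keeps a single count of emitted records, and the caller attaches the listener only for passes whose output it wants counted. Running the same records through a second merge pass with the same listener would double the count, which matches the per-pass counting asked for in the issue description.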