hadoop-common-dev mailing list archives

From "David Bowen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1146) "Reduce input records" counter name is misleading
Date Fri, 23 Mar 2007 22:18:32 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

David Bowen updated HADOOP-1146:

    Attachment: 1146.patch

This patch:

   1. Renames the counter Reduce Input Records to Reduce Input Groups, since that is what it counts.

   2. Adds a new counter called Reduce Input Records that does count the records.

   3. Adds two new counters, Combine Input Records and Combine Output Records.  When testing
on Wordcount, I noticed that Map Output Records and Reduce Input Records were not the same
because of the use of a Combiner, so these two counters show where the difference comes from.

I'm not sure if we really need these Combine Input/Output record counters.  At the end of
the job, they should be the same as Map Output Records and Reduce Input Records respectively,
but they are possibly interesting to watch as the job proceeds.
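
As a rough illustration of how these counters relate, here is a minimal Python simulation of a
word count job. It is not Hadoop code and not part of the patch: the function name
run_wordcount, the in-memory "splits", and the assumption that the combiner runs exactly once
per map task are all made up for this sketch.

```python
from collections import defaultdict


def run_wordcount(splits, use_combiner=True):
    """Simulate a word count job; return (counters, totals).

    Hypothetical model: each element of `splits` is a list of lines handled
    by one map task, and the combiner (if enabled) runs once per map task.
    """
    counters = defaultdict(int)
    shuffled = []  # (word, count) pairs that reach the reduce side

    for split in splits:
        # Map phase: emit one (word, 1) record per word.
        map_output = [(word, 1) for line in split for word in line.split()]
        counters["Map Output Records"] += len(map_output)

        if use_combiner:
            # Combine phase: pre-aggregate the output of this map task.
            counters["Combine Input Records"] += len(map_output)
            combined = defaultdict(int)
            for word, n in map_output:
                combined[word] += n
            map_output = sorted(combined.items())
            counters["Combine Output Records"] += len(map_output)

        shuffled.extend(map_output)

    # Shuffle: group the records by key for the reducer.
    groups = defaultdict(list)
    for word, n in shuffled:
        groups[word].append(n)

    # Groups counts distinct keys; Records counts individual values.
    counters["Reduce Input Groups"] = len(groups)
    counters["Reduce Input Records"] = sum(len(v) for v in groups.values())

    totals = {word: sum(values) for word, values in groups.items()}
    return dict(counters), totals


if __name__ == "__main__":
    counters, totals = run_wordcount([["a b a", "b c"], ["a c c"]])
    for name in sorted(counters):
        print(name, counters[name])
```

In this simplified model, Combine Input Records matches Map Output Records and Combine Output
Records matches Reduce Input Records, while Reduce Input Groups is the number of distinct keys.
With use_combiner=False, Map Output Records and Reduce Input Records come out equal.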

Comments welcome.

> "Reduce input records" counter name is misleading
> -------------------------------------------------
>                 Key: HADOOP-1146
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1146
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: David Bowen
>         Assigned To: David Bowen
>         Attachments: 1146.patch
> It has been pointed out that the counter name "reduce input records" is misleading; this
> number should be called "reduce input keys" or "reduce input groups".  It could also be useful
> to have the actual number of reduce input records, which should be the same as the number
> of map output records.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
