hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-434) local map-reduce job limited to single reducer
Date Tue, 09 Feb 2010 00:45:28 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-434:

    Attachment: MAPREDUCE-434.4.patch

Attaching a new patch that fixes TestJobCounters. TestJobCounters tracks the number of spilled
records; for jobs "A", "B", and "C", the previously expected values differed from the current
ones by 16K, 32K, and 24K respectively.

I believe the reason is that the spilled records counter increases whenever the reducer reads
records from a disk file. Previously, the LocalJobRunner copied map output files to the reducer
and then ran the merge, reading all of those records in on the "reduce side." The new logic
uses the LocalFetcher, which fetches all records from the "map side" into memory on the reduce
side. In jobs A and B, the difference in counter values is exactly the number of records
emitted by the combiner -- suggesting that those records were previously double-counted, but
are now counted only once (correctly). Job C is harder for me to understand because it involves
5 map tasks and thus has a multi-level merge (io.sort.factor=2), but I think the difference is
benign. If someone more familiar with the merge counters would take a look at this, I'd
appreciate it.
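
The double-counting hypothesis for jobs A and B can be written out as a toy model. This is only a sketch of the reasoning above, not Hadoop code: the class name, method names, and the record count used in main are hypothetical, and it assumes each combiner output record is counted once when spilled on the map side and, in the old path only, once more when the reducer's merge reads it back from disk.

```java
public class SpillCounterSketch {

    // Old LocalJobRunner path (assumed): the combiner output is spilled on the
    // map side, then the reducer's merge reads the same records from a disk
    // file, which bumps the spilled-records counter a second time.
    public static long oldPathSpilled(long combinerOutputRecords) {
        long mapSideSpill = combinerOutputRecords;
        long reduceSideDiskRead = combinerOutputRecords;
        return mapSideSpill + reduceSideDiskRead;
    }

    // New path (assumed): the records are fetched into memory on the reduce
    // side, so only the map-side spill is counted.
    public static long newPathSpilled(long combinerOutputRecords) {
        return combinerOutputRecords;
    }

    public static void main(String[] args) {
        long combinerOutputRecords = 1000; // hypothetical value, not taken from the test
        long difference = oldPathSpilled(combinerOutputRecords)
                - newPathSpilled(combinerOutputRecords);
        // Under this model the difference equals the combiner output count.
        System.out.println("difference = " + difference);
    }
}
```

The model covers only the single-level case; it says nothing about job C, where the multi-level merge may rewrite some records more than once.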

> local map-reduce job limited to single reducer
> ----------------------------------------------
>                 Key: MAPREDUCE-434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: local job tracker
>            Reporter: Yoram Arnon
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch,
> When mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and
> the number of reduce tasks is set at 1.
> This prevents me from locally debugging my partition function, which tries to partition
> based on the number of reduce tasks.
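
To illustrate why a single reducer makes a partition function impossible to debug, here is a minimal standalone sketch. It does not use the Hadoop API; it only borrows the arithmetic of Hadoop's default HashPartitioner (clear the sign bit of the hash, then take the remainder modulo the number of reduce tasks). The class name, helper methods, and sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: mask off the sign
    // bit so the value is non-negative, then take it modulo the reducer count.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many of the given keys land in each partition.
    public static int[] histogram(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys.
        List<String> keys = Arrays.asList(
                "apple", "banana", "cherry", "date", "elderberry", "fig");

        // With one reducer every key maps to partition 0, so any bug in the
        // partitioning logic is invisible.
        System.out.println("1 reducer:  " + Arrays.toString(histogram(keys, 1)));

        // With several reducers the keys can spread out, which is what a
        // local debugging run would need to exercise.
        System.out.println("4 reducers: " + Arrays.toString(histogram(keys, 4)));
    }
}
```

With numReduceTasks fixed at 1, every key falls into partition 0 regardless of how the partition function is written, which is the limitation the issue describes.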

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
