hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer
Date Sat, 16 Jan 2010 01:24:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801053#action_12801053 ]

Aaron Kimball commented on MAPREDUCE-434:
-----------------------------------------

I've attached a patch that adds support for multiple reducers, which can run in parallel. This
patch builds on the work done in MAPREDUCE-1367 (I've marked that issue as a dependency), so I
won't mark this patch-available until MAPREDUCE-1367 is committed.

I've added a subclass of the Fetcher class called LocalFetcher. It knows how to fetch inputs
from a locally provided mapping of map task attempt IDs to MapOutputFiles. When the job is
local, the Shuffle class now instantiates a single LocalFetcher instead of an array of Fetchers.
Currently you are restricted to one LocalFetcher instance per ReduceTask, because it uses a
thread-local work queue.
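To illustrate the idea of fetching from a local mapping rather than over HTTP, here is a minimal in-memory sketch. It is not the patch's LocalFetcher: the class name, the modelling of each map output as one record list per reduce partition, and the shortened attempt IDs are all hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory stand-in for a local fetcher: instead of copying map
// outputs over the network, it reads them directly from a local mapping of
// map attempt IDs to their already-partitioned outputs.
class LocalFetcherSketch {
    // Each map output is modelled as one list of records per reduce partition.
    private final Map<String, List<List<String>>> mapOutputs;
    private final int reducePartition;

    LocalFetcherSketch(Map<String, List<List<String>>> mapOutputs, int reducePartition) {
        this.mapOutputs = mapOutputs;
        this.reducePartition = reducePartition;
    }

    // Queue every map attempt, then drain the queue, keeping only this reducer's partition.
    List<String> fetchAll() {
        Deque<String> pending = new ArrayDeque<>(mapOutputs.keySet());
        List<String> fetched = new ArrayList<>();
        while (!pending.isEmpty()) {
            String attemptId = pending.poll();
            fetched.addAll(mapOutputs.get(attemptId).get(reducePartition));
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Made-up attempt IDs; real Hadoop attempt IDs are longer.
        Map<String, List<List<String>>> outputs = new LinkedHashMap<>();
        outputs.put("attempt_m_000000_0", List.of(List.of("a1"), List.of("b1")));
        outputs.put("attempt_m_000001_0", List.of(List.of("a2"), List.of("b2")));
        System.out.println(new LocalFetcherSketch(outputs, 1).fetchAll()); // [b1, b2]
    }
}
```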

The patch also executes multiple reducers in parallel, using an ExecutorService framework similar
to that of MAPREDUCE-1367. Reducer parallelism is configured through a new static method on
LocalJobRunner: {{LocalJobRunner.setLocalMaxRunningReduces(JobContext, int)}}.
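A cap on reducer parallelism can be pictured as a fixed-size thread pool: however many reduce tasks there are, only as many run at once as the pool has threads. This is a minimal stdlib sketch under that assumption, not the patch's code; the class name is hypothetical, and the sleep stands in for real reduce work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run numReduces reduce "tasks" on a fixed-size pool so
// that at most maxRunningReduces execute concurrently. Returns the peak
// concurrency that was actually observed.
class ParallelReduceSketch {
    static int runReduces(int numReduces, int maxRunningReduces) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxRunningReduces);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int r = 0; r < numReduces; r++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for the actual reduce work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and rethrow any failure
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent reduces: " + runReduces(8, 3));
    }
}
```

The pool size is the only knob here. A fixed pool keeps the cap simple and lets extra tasks wait in the executor's queue rather than being rejected.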

Currently there is a barrier: all map tasks must complete before the reducers can start
fetching. Future work could improve the LocalJobRunner to provide TaskCompletionEvents, so
that fetching can overlap with the remaining map tasks, but I think that's out of scope for
this issue.
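The map-then-reduce barrier can be expressed with ExecutorService.invokeAll, which returns only after every submitted task has completed. The following is a hypothetical stdlib sketch that records the order in which tasks ran, so the barrier is observable. It is not LocalJobRunner's actual structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the map/reduce barrier: invokeAll() blocks until every
// map task has finished, and only then are the reduce tasks submitted.
class MapReduceBarrierSketch {
    static List<String> run(int numMaps, int numReduces, int parallelism) throws Exception {
        // Records the order in which tasks ran.
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<Void>> maps = new ArrayList<>();
            for (int m = 0; m < numMaps; m++) {
                final int id = m;
                maps.add(() -> { events.add("map-" + id); return null; });
            }
            // Barrier: invokeAll waits for all maps; get() rethrows any map failure.
            for (Future<Void> f : pool.invokeAll(maps)) {
                f.get();
            }

            List<Callable<Void>> reduces = new ArrayList<>();
            for (int r = 0; r < numReduces; r++) {
                final int id = r;
                reduces.add(() -> { events.add("reduce-" + id); return null; });
            }
            for (Future<Void> f : pool.invokeAll(reduces)) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4, 2, 2));
    }
}
```

Overlapping fetches with still-running maps would replace this hard barrier with per-map completion notifications.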

The patch includes test cases in which varying numbers of map and reduce tasks run with differing
degrees of parallelism. The test checks that all of the expected output makes it through the
reducers and into the output files.
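The core of such a check is a comparison between the records the test expects and what the reducers actually wrote. This is a hypothetical helper sketching that comparison with a multiset count. It is not the patch's test code, and it models output files as in-memory lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares the records a test expects against what the
// reducers actually wrote, returning any expected record that never arrived.
class OutputCheckSketch {
    static List<String> missing(List<String> expected, List<List<String>> reducerOutputs) {
        // Count every record written by any reducer (duplicates counted separately).
        Map<String, Integer> written = new HashMap<>();
        for (List<String> output : reducerOutputs) {
            for (String rec : output) {
                written.merge(rec, 1, Integer::sum);
            }
        }
        // Consume one written occurrence per expected record; unmatched ones are missing.
        List<String> lost = new ArrayList<>();
        for (String rec : expected) {
            int n = written.getOrDefault(rec, 0);
            if (n == 0) {
                lost.add(rec);
            } else {
                written.put(rec, n - 1);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<String> expected = List.of("a", "b", "c", "d");
        List<List<String>> outputs = List.of(List.of("a", "c"), List.of("b"));
        System.out.println("missing: " + missing(expected, outputs)); // missing: [d]
    }
}
```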

> local map-reduce job limited to single reducer
> ----------------------------------------------
>
>                 Key: MAPREDUCE-434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: local job tracker
>            Reporter: Yoram Arnon
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: MAPREDUCE-434.patch
>
>
> When mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set to 1.
> This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
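The reporter's problem is easy to see with a partitioner whose result depends on the reduce-task count. This sketch uses the same formula as Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, but the class and method here are standalone illustrations rather than Hadoop's API. With the count forced to 1, every key lands in partition 0, so the partitioning logic is never exercised.

```java
// Standalone sketch of a reduce-count-dependent partitioner, using the same
// formula as Hadoop's default HashPartitioner. The class is hypothetical.
class PartitionSketch {
    static int getPartition(String key, int numReduceTasks) {
        // Mask the sign bit so negative hash codes still map to a valid partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + ": 1 reducer -> " + getPartition(key, 1)
                    + ", 4 reducers -> " + getPartition(key, 4));
        }
    }
}
```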

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

