hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-968) Reduce shuffle and merge should be done in a child JVM
Date Tue, 03 Apr 2007 18:14:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486453 ]

Devaraj Das commented on HADOOP-968:
------------------------------------

The salient points of the design:
On the TaskTracker:
1) The TaskTracker maintains a list of TaskCompletionEvents per *job*. Whenever a ReduceTask
is assigned to a TaskTracker, it extracts the JobId from that task.
2) For that JobId it keeps fetching MapTask completion events as long as any ReduceTask for
that job is in the SHUFFLE phase (this ensures that the TaskTracker sees all MapTask-lost
events and keeps an up-to-date cache of all events). When all the ReduceTasks for a given job
have gone past the SHUFFLE phase, the TaskTracker stops fetching MapTask completion
events until another ReduceTask gets assigned to it. If no other ReduceTask from the same
job gets assigned to it and the job completes, it clears the cache of TaskCompletionEvents.
3) The event-fetcher thread blocks on the runningJobs object. Whenever the addTaskToJob method
in TaskTracker adds a new Task to a job, it invokes runningJobs.notify(), so that the event-fetcher
thread can unblock and continue (see the sketch after this list).
4) The event-fetcher thread also goes through the runningJobs and immediately stops fetching
events for those jobs that have been killed/failed.
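
To make point 3) concrete, here is a minimal sketch, not the actual patch, of how the event-fetcher
thread could block on the runningJobs object and keep the per-job cache of map completion events.
Only runningJobs, addTaskToJob and the notify() call come from the description above; the CachedJob
class, its fields and the helper methods are assumptions made purely for illustration.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: CachedJob, its fields and the helper methods are invented for illustration.
class MapEventsFetcherSketch extends Thread {

  // Per-job state the TaskTracker would keep in this sketch.
  static class CachedJob {
    volatile boolean hasReduceInShuffle;  // is any ReduceTask of this job still in SHUFFLE?
    volatile boolean killedOrFailed;      // set when the job is killed or fails
    final List<Object> mapCompletionEvents = new ArrayList<Object>(); // TaskCompletionEvent in Hadoop
  }

  // jobId -> per-job state; addTaskToJob() adds entries here and calls runningJobs.notify().
  private final Map<String, CachedJob> runningJobs;

  MapEventsFetcherSketch(Map<String, CachedJob> runningJobs) {
    this.runningJobs = runningJobs;
  }

  public void run() {
    while (!isInterrupted()) {
      List<String> jobsToFetch = new ArrayList<String>();
      synchronized (runningJobs) {
        // Block until addTaskToJob() registers a ReduceTask and notifies us.
        while (collectJobsInShuffle(jobsToFetch) == 0) {
          try {
            runningJobs.wait();
          } catch (InterruptedException e) {
            return;
          }
          jobsToFetch.clear();
        }
      }
      for (String jobId : jobsToFetch) {
        CachedJob job;
        synchronized (runningJobs) {
          job = runningJobs.get(jobId);
        }
        if (job == null || job.killedOrFailed) {
          continue;  // stop fetching immediately for killed/failed jobs
        }
        // Append newly reported map completion events to this job's cache.
        job.mapCompletionEvents.addAll(
            fetchNewEventsFromJobTracker(jobId, job.mapCompletionEvents.size()));
      }
    }
  }

  // Adds the ids of jobs that still have a ReduceTask in SHUFFLE; returns how many were added.
  private int collectJobsInShuffle(List<String> out) {
    for (Map.Entry<String, CachedJob> e : runningJobs.entrySet()) {
      if (e.getValue().hasReduceInShuffle && !e.getValue().killedOrFailed) {
        out.add(e.getKey());
      }
    }
    return out.size();
  }

  // Placeholder for the RPC to the JobTracker (getTaskCompletionEvents in the real code).
  private List<Object> fetchNewEventsFromJobTracker(String jobId, int fromEventIndex) {
    return new ArrayList<Object>();
  }
}

In this sketch, once every ReduceTask of a job has gone past SHUFFLE the hasReduceInShuffle flag
goes false and the loop simply stops fetching for that job until another ReduceTask shows up,
matching point 2) above; clearing the cache on job completion amounts to removing the entry from
runningJobs.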

On the TaskUmbilicalProtocol, ReduceTaskRunner & ReduceTask:
1) A new method - TaskCompletionEvent[] getSuccessMapCompleteEvents(String taskId, int fromIndex,
int maxLocs) throws IOException; - has been added to enable the ReduceTask to fetch the TaskCompletionEvents
cached at the TaskTracker. The semantics of this method mirror those of getTaskCompletionEvents
in InterTrackerProtocol, except that in the umbilical protocol we are interested in just
the successful map events. Fetch failures are handled in the same way as they are today. Thus,
most of the fetcher code in ReduceTaskRunner remains the same (that code is now part of ReduceTask
in a new class called ReduceCopier, and ReduceTaskRunner now very closely matches MapTaskRunner
in terms of functionality/code). A sketch of the call and the polling loop follows below.
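
To illustrate how the ReduceTask side would use this, here is a sketch of the umbilical call and of
a polling loop a ReduceCopier-style class might run against it. Only the method name and signature
come from the description above; TaskCompletionEvent here is a minimal stand-in for the real Hadoop
class, and MAX_EVENTS_PER_FETCH / scheduleCopy() are assumptions, not the actual patch.

import java.io.IOException;

// Sketch of the new umbilical call and a ReduceCopier-style polling loop.
interface TaskUmbilicalSketch {
  // Successful map completion events cached at the TaskTracker, starting at fromIndex.
  TaskCompletionEvent[] getSuccessMapCompleteEvents(String taskId, int fromIndex, int maxLocs)
      throws IOException;
}

class TaskCompletionEvent {  // minimal stand-in, not org.apache.hadoop.mapred.TaskCompletionEvent
  String mapTaskId;
  String mapOutputLocation;
}

class ReduceCopierSketch {
  private static final int MAX_EVENTS_PER_FETCH = 100;  // assumed batch size

  private final TaskUmbilicalSketch umbilical;
  private final String reduceTaskId;
  private int fromEventIndex = 0;  // how many events have already been consumed

  ReduceCopierSketch(TaskUmbilicalSketch umbilical, String reduceTaskId) {
    this.umbilical = umbilical;
    this.reduceTaskId = reduceTaskId;
  }

  // One polling step: pull the newly known successful map outputs and queue them for copying.
  void pollForNewMapOutputs() throws IOException {
    TaskCompletionEvent[] events =
        umbilical.getSuccessMapCompleteEvents(reduceTaskId, fromEventIndex, MAX_EVENTS_PER_FETCH);
    fromEventIndex += events.length;
    for (TaskCompletionEvent event : events) {
      scheduleCopy(event);  // hand off to the copier threads; fetch failures handled as today
    }
  }

  private void scheduleCopy(TaskCompletionEvent event) {
    // placeholder: the real ReduceCopier would enqueue the map output location here
  }
}

The fromEventIndex bookkeeping is meant to follow the same pattern used today against
getTaskCompletionEvents in InterTrackerProtocol; since the call returns only successful map events,
the copier would not need to filter by status.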

Comments?


> Reduce shuffle and merge should be done in a child JVM
> ------------------------------------------------------
>
>                 Key: HADOOP-968
>                 URL: https://issues.apache.org/jira/browse/HADOOP-968
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>
> The Reduce's shuffle and initial merge is done in the TaskTracker's JVM. It would be
> better to have it run in the Task's child JVM. The advantages are:
>   1. The class path and environment would be set up correctly.
>   2. User code doesn't need to be loaded into the TaskTracker.
>   3. Lower memory usage and contention in the TaskTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

