hadoop-mapreduce-issues mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
Date Fri, 01 Jul 2011 19:02:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058707#comment-13058707 ]

Scott Carey commented on MAPREDUCE-2634:

The fundamental problem being addressed in #1 is the ping-response protocol.  A (big) change
to a register-notify protocol would reduce latencies and total RPCs without introducing the
complicated state and feedback issues that proposal #1 would.  As Owen says, it is very hard
to ensure you don't have deadlocks or corrupted state once one side asks the other to ping it.
Race conditions galore arise once both sides initiate communication.
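
To make the contrast concrete, here is a rough, single-threaded sketch of the two styles.
It is only a toy model: every class and function name below is hypothetical (none of them
are Hadoop APIs), time is counted in abstract ticks, and it deliberately ignores the liveness
role that heartbeats also play and the concurrency issues mentioned above.

```python
# Toy model only: all names are hypothetical and are not Hadoop APIs.

class PollingMaster:
    """Ping-response: work is handed out only in reply to a worker's heartbeat."""

    def __init__(self):
        self.pending = []      # tasks waiting for the next heartbeat
        self.rpc_count = 0

    def submit(self, task):
        self.pending.append(task)

    def heartbeat(self, worker_id):
        self.rpc_count += 1    # every heartbeat is an RPC, even when idle
        tasks, self.pending = self.pending, []
        return tasks


class NotifyingMaster:
    """Register-notify: a worker registers once; the master pushes new work."""

    def __init__(self):
        self.listeners = []
        self.rpc_count = 0

    def register(self, callback):
        self.rpc_count += 1    # one registration RPC per worker
        self.listeners.append(callback)

    def submit(self, task):
        # Simplification: always hand the task to the first registered worker.
        self.rpc_count += 1    # one notification RPC per task
        self.listeners[0](task)


def simulate_polling(submit_time, interval, horizon):
    """Return (assignment latency, RPC count) for one task under polling."""
    master = PollingMaster()
    assigned_at = None
    for now in range(horizon + 1):
        if now == submit_time:
            master.submit("task-1")
        if now % interval == 0:            # worker heartbeats on a fixed schedule
            if master.heartbeat("worker-1") and assigned_at is None:
                assigned_at = now
    return assigned_at - submit_time, master.rpc_count


def simulate_notify(submit_time, horizon):
    """Return (assignment latency, RPC count) for one task under register-notify."""
    master = NotifyingMaster()
    deliveries = []
    clock = {"now": 0}
    master.register(lambda task: deliveries.append((task, clock["now"])))
    for now in range(horizon + 1):
        clock["now"] = now
        if now == submit_time:
            master.submit("task-1")        # pushed to the worker in the same tick
    return deliveries[0][1] - submit_time, master.rpc_count


print(simulate_polling(submit_time=4, interval=3, horizon=12))  # (2, 5)
print(simulate_notify(submit_time=4, horizon=12))               # (0, 2)
```

In this toy run the polling worker makes five RPCs and picks the task up two ticks late,
while the registered worker costs two RPCs and gets the task in the tick it was submitted.
In both variants only one side ever initiates a given kind of call, which is the property
proposal #1 gives up.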

I think the way to go is to work towards MAPREDUCE-279 being in an Apache release ASAP, and
then work on incremental improvements to it to reduce latencies for smaller clusters/jobs.
Many of the above problems are now partially mitigated.

> MapReduce Performance Improvements using forced heartbeat 
> ----------------------------------------------------------
>                 Key: MAPREDUCE-2634
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Abhijit Suresh Shingate
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> The following proposals could improve MapReduce performance:
> *1. Notify TaskTracker to send heartbeat when a new Job is submitted*
>   a) Presently, when a new Job is submitted to the JobTracker, its tasks are assigned to a
>      TaskTracker only when that TaskTracker sends a heartbeat to the JobTracker.
>   b) Proposal:
>         - The JobTracker will notify all TaskTrackers to send a heartbeat whenever a new
>           Job is submitted, so that the tasks of the new Job can be assigned to the
>           TaskTrackers immediately.
> *2. Execute Job Setup and Cleanup on JobTracker JVM*
>   a) Presently, Job Setup and Cleanup are each carried out as a separate task on a TaskTracker.
>   b) Launching a new JVM for Setup and Cleanup of the Job introduces overhead, generally
>      about 0.7 - 1.5 seconds.
>   c) Proposal:
>         - The JobTracker will execute the Job Setup and Cleanup tasks in the JobTracker JVM.
> *3. Request TaskTracker to send heartbeat when the Map Task is completed.*
>   a) Presently, the TaskTracker reports the status of completed Map Tasks as part of its
>      heartbeat, at a regular interval.
>   b) Proposal:
>         - The Map Task requests the TaskTracker to send a heartbeat to the JobTracker when
>           the Map Task is completed, so that Reduce Tasks quickly learn which Map Task has
>           finished and can copy its map output locally.
> *4. Request JobTracker to trigger committing of Reduce output when Reduce Task has finished.*
>   a) Presently, the JobTracker asks the Reduce Task to commit its output to HDFS through
>      the heartbeat response.
>   b) Proposal:
>         - The Reduce Task requests the TaskTracker to send a heartbeat to the JobTracker
>           whenever the Reduce Task is completed.
> These optimizations might work on small clusters, but on big clusters they may add overhead.
> Please let us know your views.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira