hadoop-mapreduce-issues mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-2118) optimize getJobSetupAndCleanupTasks
Date Thu, 04 Nov 2010 22:52:44 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Joydeep Sen Sarma updated MAPREDUCE-2118:

    Attachment: mapreduce-2118.1.patch

Do not hold the JT lock around getSetupCleanupTasks.

This required a change so that TIP.addRunningTask no longer calls back into the JT (.createTaskEntry),
which forced the caller to hold the JT lock. Now we need only the JIP lock to get the task
from the job. The changes to the JT data structures (made in JT.createTaskEntry) are made separately,
holding the JT lock.
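
A minimal Java sketch of the two-phase locking described above. It assumes heavily simplified stand-ins: the classes Job and JobTrackerState, the method assignSetupAttempt, and the use of a CopyOnWriteArrayList for the job list are hypothetical and are not the code in mapreduce-2118.1.patch. The scan over jobs takes only each job's own monitor; the JT table update happens afterwards under a separate JT lock.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, heavily simplified stand-ins for JobInProgress (JIP) and
// JobTracker (JT). None of these classes or methods are the real Hadoop API.
public class SetupCleanupSketch {

    /** Stand-in for a JIP: its state is guarded by its own monitor only. */
    static final class Job {
        private final String id;
        private boolean setupPending = true;
        private int attemptCounter = 0;

        Job(String id) {
            this.id = id;
        }

        /** Hands out the pending setup attempt, if any; touches JIP state only. */
        synchronized String obtainSetupAttempt() {
            if (!setupPending) {
                return null;
            }
            setupPending = false;
            return id + "_setup_" + attemptCounter++;
        }
    }

    /** Stand-in for the JT and its global tables. */
    static final class JobTrackerState {
        private final Object jtLock = new Object();
        // Assumption: the job list is a concurrent collection, so a heartbeat
        // can scan it without taking the JT lock.
        final List<Job> jobs = new CopyOnWriteArrayList<>();
        // attempt id -> tasktracker name; guarded by jtLock.
        private final Map<String, String> attemptToTracker = new HashMap<>();

        /** Returns at most one setup attempt for this heartbeat, or null. */
        String assignSetupAttempt(String trackerName) {
            String attempt = null;
            // Phase 1: scan jobs holding only each job's own lock.
            for (Job job : jobs) {
                attempt = job.obtainSetupAttempt();
                if (attempt != null) {
                    break;
                }
            }
            if (attempt == null) {
                return null;
            }
            // Window: the job considers the attempt scheduled, but the JT
            // tables do not list it yet.
            // Phase 2: update the JT tables separately, holding the JT lock.
            synchronized (jtLock) {
                attemptToTracker.put(attempt, trackerName);
            }
            return attempt;
        }

        String trackerFor(String attempt) {
            synchronized (jtLock) {
                return attemptToTracker.get(attempt);
            }
        }
    }

    public static void main(String[] args) {
        JobTrackerState jt = new JobTrackerState();
        jt.jobs.add(new Job("job_1"));
        jt.jobs.add(new Job("job_2"));
        System.out.println(jt.assignSetupAttempt("tt1")); // job_1_setup_0
        System.out.println(jt.assignSetupAttempt("tt2")); // job_2_setup_0
        System.out.println(jt.assignSetupAttempt("tt1")); // null
        System.out.println(jt.trackerFor("job_2_setup_0")); // tt2
    }
}
```

Between the two phases the job already considers the attempt scheduled while the JT map does not yet list it; that is the divergence discussed in the next paragraph.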

We looked carefully at the implications of the JT data structures (task/tracker maps) being
potentially out of sync with the state of the JIP itself (the JIP thinks a particular TIP/attempt
has been scheduled, but the JT will not find it in its tables). We were not able to find
code paths that were sensitive to this. It helps that there's only one heartbeat from one
tasktracker at a time. Most lookups for an attempt can only be made in the context
of a heartbeat call from the tasktracker where the attempt is scheduled. By definition, we
are already processing the heartbeat from this tracker at the time the state of the job
and the JT diverge.
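
The ordering argument can be illustrated with a second hypothetical Java sketch. HeartbeatOrderingSketch, its heartbeat method, the single jobLock standing in for a JIP lock, and the per-tracker monitor map are illustrative assumptions, not JobTracker code. Heartbeats from one tracker are processed one at a time, and an attempt is only looked up while handling a heartbeat from the tracker it was scheduled on, so the gap between the JIP step and the JT registration is never observed by a lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the ordering argument; names are illustrative only.
public class HeartbeatOrderingSketch {
    private final Object jtLock = new Object();
    private final Object jobLock = new Object(); // stand-in for a JIP lock
    private final Map<String, String> attemptToTracker = new HashMap<>(); // guarded by jtLock
    private final ConcurrentHashMap<String, Object> perTracker = new ConcurrentHashMap<>();
    private int nextAttempt = 0; // JIP-side state, guarded by jobLock

    /**
     * Handles one heartbeat: checks the attempt the tracker reports (may be
     * null), then schedules and returns a new attempt for that tracker.
     */
    String heartbeat(String tracker, String reportedAttempt) {
        Object trackerMonitor = perTracker.computeIfAbsent(tracker, t -> new Object());
        synchronized (trackerMonitor) { // at most one heartbeat per tracker at a time
            if (reportedAttempt != null) {
                String owner;
                synchronized (jtLock) {
                    owner = attemptToTracker.get(reportedAttempt);
                }
                if (!tracker.equals(owner)) {
                    throw new IllegalStateException("unknown attempt " + reportedAttempt);
                }
            }
            String attempt;
            synchronized (jobLock) { // JIP step: no JT lock held
                attempt = "attempt_" + nextAttempt++;
            }
            // The JT tables lag here, but only this tracker reports this
            // attempt, and its next heartbeat waits on trackerMonitor.
            synchronized (jtLock) {
                attemptToTracker.put(attempt, tracker);
            }
            return attempt;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatOrderingSketch jt = new HeartbeatOrderingSketch();
        AtomicInteger failures = new AtomicInteger();
        Thread[] trackers = new Thread[4];
        for (int i = 0; i < trackers.length; i++) {
            String name = "tt" + i;
            trackers[i] = new Thread(() -> {
                String last = null;
                try {
                    for (int n = 0; n < 1000; n++) {
                        last = jt.heartbeat(name, last); // report the previous attempt
                    }
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            });
            trackers[i].start();
        }
        for (Thread t : trackers) {
            t.join();
        }
        System.out.println("failures: " + failures.get()); // failures: 0
    }
}
```

Each tracker thread only ever reports an attempt it received from its own previous heartbeat, so every lookup finds an entry that was registered before that heartbeat returned.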

It should be possible to extend this strategy to remove JT lock requirements around other
code paths.

> optimize getJobSetupAndCleanupTasks 
> ------------------------------------
>                 Key: MAPREDUCE-2118
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2118
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: mapreduce-2118.1.patch
> In every heartbeat, while holding the JobTracker global lock, all jobs are scanned for
> job setup/cleanup and task setup/cleanup. On a large system with many trackers (and heartbeats)
> and many jobs, this becomes the bottleneck for JT throughput.
> One possible route may be to rework the code to not require the JT lock while asking
> the JIP whether it has a setup/cleanup task.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
