hadoop-mapreduce-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler
Date Mon, 15 Feb 2010 03:43:28 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated MAPREDUCE-1436:
-------------------------------------

    Attachment: mapreduce-1436-v2.patch

Here's a new patch that always locks the JobTracker before locking the FairScheduler in update().
This should resolve both of the deadlocks reported above. I've also increased the default
update interval from 0.5 seconds to 2.5 seconds in this patch. The only negative impact of
this should be that preemption and speculation take slightly longer to kick in. Those two
are really the only reasons we need to call update() other than when jobs are added and removed:
speculative tasks are counted in updateDemand, and preemption is checked regularly in updatePreemptionVariables().
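
To illustrate the lock ordering described above, here is a minimal sketch and not the code from the patch. The class, fields, and method bodies are hypothetical stand-ins: two plain objects play the role of the JobTracker and FairScheduler monitors, and the heartbeat path is assumed to hold the JobTracker lock when it calls into the scheduler. Because both paths acquire the locks in the same order (JobTracker first, then FairScheduler), neither thread can hold one lock while waiting for the other in the opposite order.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the JobTracker and FairScheduler monitors.
    static final Object jobTracker = new Object();
    static final Object fairScheduler = new Object();
    static int updates = 0;
    static int assignments = 0;

    // Periodic update path: JobTracker lock first, then the scheduler lock.
    static void update() {
        synchronized (jobTracker) {
            synchronized (fairScheduler) {
                updates++;
            }
        }
    }

    // Heartbeat path (assumed): the JobTracker lock is already held
    // when the scheduler is asked to assign tasks.
    static void heartbeat() {
        synchronized (jobTracker) {
            synchronized (fairScheduler) {
                assignments++;
            }
        }
    }

    // Runs both paths concurrently and returns {updates, assignments}.
    static int[] runBoth(int iterations) {
        updates = 0;
        assignments = 0;
        Thread u = new Thread(() -> { for (int i = 0; i < iterations; i++) update(); });
        Thread h = new Thread(() -> { for (int i = 0; i < iterations; i++) heartbeat(); });
        u.start();
        h.start();
        try {
            u.join();
            h.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {updates, assignments};
    }

    public static void main(String[] args) {
        int[] r = runBoth(100000);
        System.out.println(r[0] + " updates, " + r[1] + " assignments");
    }
}
```

If update() instead took the scheduler lock first and then the JobTracker lock, the two threads could each hold one lock and wait forever for the other, which is the general shape of a lock-order deadlock.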

I've also thought a bit about the impact of coarser locking on performance of the JobTracker,
and I think it's actually small. First of all, since assignTasks already locks the
FairScheduler, we wouldn't gain much by locking only the FS in update() and not the
JT, because the JT calls assignTasks on every heartbeat anyway. Second, I timed update() on
a simulated cluster with 2500 nodes, 4 slots per node, 100 jobs and 20 pools, and one call
to update() took about 50 ms. With the new default update interval of 2500 ms, only about 2%
of the time in the JobTracker (50 ms out of every 2500 ms) should be spent on this, and for
such a large cluster, the update interval can be raised through the config file anyway.
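
The overhead estimate above is simple arithmetic, sketched here for reference. The class and method names are hypothetical. The 50 ms figure is the measurement quoted for the simulated 2500-node cluster, and applying it to the old 500 ms interval is an extrapolation for comparison only.

```java
public class UpdateOverhead {
    // Fraction of wall-clock time spent inside update(), assuming one call
    // of updateMillis duration per interval of intervalMillis.
    static double fractionBusy(double updateMillis, double intervalMillis) {
        return updateMillis / intervalMillis;
    }

    public static void main(String[] args) {
        // Old default interval (500 ms) versus new default (2500 ms).
        System.out.printf("old interval: %.0f%%%n", 100 * fractionBusy(50, 500));
        System.out.printf("new interval: %.0f%%%n", 100 * fractionBusy(50, 2500));
    }
}
```

Under these assumptions the sketch prints 10% for the old interval and 2% for the new one.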

> Deadlock in preemption code in fair scheduler
> ---------------------------------------------
>
>                 Key: MAPREDUCE-1436
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1436
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Blocker
>         Attachments: deadlock.png, mapreduce-1436-v2.patch, mapreduce-1436.patch
>
>
> In testing the fair scheduler with preemption, I found a deadlock between updatePreemptionVariables
> and some code in the JobTracker. This was found while testing a backport of the fair scheduler
> to Hadoop 0.20, but it looks like it could also happen in trunk and 0.21. Details are in a
> comment below.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

