hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler
Date Mon, 08 Feb 2010 19:27:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831071#action_12831071 ]

Todd Lipcon commented on MAPREDUCE-1436:

Hey Matei, sorry for the slow response - I forgot to watch this ticket.

bq. JobTracker only calls listener.jobAdded/jobRemoved when it is already holding a lock on
itself (e.g. in JobTracker.addJob).

I think it's best to still synchronize on the TaskTrackerManager here from within the fair
scheduler. Java monitors are reentrant, so a synchronized block on a monitor the thread
already holds has essentially no cost, and it will reduce the jcarder output so we can notice
if we accidentally introduce "real" bugs later. Do you agree?
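
To illustrate the reentrancy point, here is a minimal sketch. The class, field, and method names are hypothetical stand-ins and not the actual JobTracker or fair scheduler code; the only assumption is that the listener callback runs on a thread that already holds the TaskTrackerManager monitor.

```java
public class ReentrantMonitorSketch {
    // Hypothetical stand-in for the JobTracker / TaskTrackerManager monitor.
    static final Object taskTrackerManager = new Object();

    // Hypothetical listener callback in the scheduler. It takes the monitor
    // explicitly, even though its caller is expected to hold it already.
    static boolean jobAdded() {
        synchronized (taskTrackerManager) {
            // Re-entering a monitor the current thread owns does not block.
            return Thread.holdsLock(taskTrackerManager);
        }
    }

    // Hypothetical stand-in for JobTracker.addJob, which calls the listener
    // while holding its own lock.
    static boolean addJob() {
        synchronized (taskTrackerManager) {
            boolean heldInListener = jobAdded();
            // Leaving the inner block must not release the outer hold.
            return heldInListener && Thread.holdsLock(taskTrackerManager);
        }
    }

    public static void main(String[] args) {
        System.out.println(addJob());                             // true
        System.out.println(Thread.holdsLock(taskTrackerManager)); // false
    }
}
```

Making the acquisition explicit in the callback means a lock-order analysis sees the JT lock taken first on every path into the scheduler, rather than depending on what each caller happens to hold.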

bq. , I always locked the JT before locking the scheduler.

I can see why the coarse locking isn't a great idea for scalability. In this case, though,
we're just adding a lock in a place where we already assume the lock is taken, yeah? (So it
isn't any more coarse than before.)
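
As a rough sketch of the ordering rule quoted above (JT lock first, then the scheduler lock), the following uses hypothetical names and is not the patch itself. Both threads take the two monitors in the same order, so neither can hold the scheduler lock while waiting for the JT lock.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the JobTracker and fair scheduler monitors.
    static final Object jobTracker = new Object();
    static final Object scheduler = new Object();
    static int updates = 0; // guarded by the scheduler monitor

    // Every path takes the JT lock before the scheduler lock.
    static void update() {
        synchronized (jobTracker) {
            synchronized (scheduler) {
                updates++;
            }
        }
    }

    // Runs two threads that both follow the JT-then-scheduler order and
    // returns the total number of updates they performed.
    static int runConcurrently(int perThread) {
        updates = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                update();
            }
        };
        Thread a = new Thread(work, "heartbeat-path");
        Thread b = new Thread(work, "update-path");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(1000));
    }
}
```

If one of the two paths instead took the scheduler lock first and the JT lock second, the two threads could each end up holding one monitor and waiting for the other.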

> Deadlock in preemption code in fair scheduler
> ---------------------------------------------
>                 Key: MAPREDUCE-1436
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1436
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Blocker
>         Attachments: deadlock.png, mapreduce-1436.patch
>
> In testing the fair scheduler with preemption, I found a deadlock between updatePreemptionVariables
> and some code in the JobTracker. This was found while testing a backport of the fair scheduler
> to Hadoop 0.20, but it looks like it could also happen in trunk and 0.21. Details are in a
> comment below.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
