hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
Date Thu, 07 Jan 2010 08:43:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797560#action_12797560 ]

Arun C Murthy commented on MAPREDUCE-1342:

Hmm... First, I think it's clear that this is a critical bug that should be fixed ASAP.

Given the extremely fragile nature of the locking structure in the JobTracker, I'm very, very
scared to make many changes here...

How about a simpler proposal: let's make JobTracker.activeTaskTrackers() and JobTracker.blacklistedTaskTrackers()
synchronized methods. First, these are called only from the JSPs. Second, this sweeps the
inverted lock ordering under the carpet: these methods would have to lock the JobTracker itself
before taking taskTrackers, which serializes them with the fault-tracking path and thus
resolves the deadlock.
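
To make the proposal concrete, here is a minimal Java sketch. It is a toy model, not the real JobTracker: the class MiniJobTracker, the methods addTracker, recordFault and stressTest, the MAX_FAULTS threshold and the lostSlots counter are all hypothetical. It also assumes, as the proposal implies, that the fault-counting path already runs with the JobTracker lock held. The only change being modeled is marking the reader method synchronized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the JobTracker locks named in
// MAPREDUCE-1342. Not the real Hadoop classes or method bodies.
public class MiniJobTracker {
  private static final int MAX_FAULTS = 2; // hypothetical blacklisting threshold
  private final Map<String, String> taskTrackers = new HashMap<>(); // tracker name -> host
  private final FaultyTrackersInfo faultyTrackers = new FaultyTrackersInfo();
  private int lostSlots = 0; // hypothetical capacity bookkeeping

  class FaultyTrackersInfo {
    private final Map<String, Integer> potentiallyFaultyTrackers = new HashMap<>();
    private final Set<String> blacklistedHosts = new HashSet<>();

    // Inner lock order: potentiallyFaultyTrackers -> taskTrackers (via removeHostCapacity).
    void incrementFaults(String host) {
      synchronized (potentiallyFaultyTrackers) {
        int faults = potentiallyFaultyTrackers.merge(host, 1, Integer::sum);
        if (faults >= MAX_FAULTS && blacklistedHosts.add(host)) {
          removeHostCapacity(host);
        }
      }
    }

    boolean isBlacklisted(String host) {
      synchronized (potentiallyFaultyTrackers) {
        return blacklistedHosts.contains(host);
      }
    }
  }

  // Stand-in for removeHostCapacity: takes the taskTrackers lock.
  private void removeHostCapacity(String host) {
    synchronized (taskTrackers) {
      for (String h : taskTrackers.values()) {
        if (h.equals(host)) lostSlots++;
      }
    }
  }

  public synchronized void addTracker(String name, String host) {
    synchronized (taskTrackers) {
      taskTrackers.put(name, host);
    }
  }

  // ASSUMPTION: the fault-counting path runs with the JobTracker lock held.
  public synchronized void recordFault(String host) {
    faultyTrackers.incrementFaults(host);
  }

  // The proposed change: synchronized on the JobTracker, so the reader takes
  // JT -> taskTrackers -> potentiallyFaultyTrackers and can no longer
  // interleave with recordFault, even though the inner order is inverted.
  public synchronized List<String> blacklistedTaskTrackers() {
    List<String> result = new ArrayList<>();
    synchronized (taskTrackers) {
      for (Map.Entry<String, String> e : taskTrackers.entrySet()) {
        if (faultyTrackers.isBlacklisted(e.getValue())) result.add(e.getKey());
      }
    }
    return result;
  }

  // Runs both paths from two threads; true if both finish before the timeout.
  public static boolean stressTest(int iterations, long timeoutMillis) {
    MiniJobTracker jt = new MiniJobTracker();
    for (int i = 0; i < 10; i++) jt.addTracker("tt" + i, "h" + i);
    Thread writer = new Thread(() -> {
      for (int i = 0; i < iterations; i++) jt.recordFault("h" + (i % 10));
    });
    Thread reader = new Thread(() -> {
      for (int i = 0; i < iterations; i++) jt.blacklistedTaskTrackers();
    });
    writer.setDaemon(true);
    reader.setDaemon(true);
    writer.start();
    reader.start();
    try {
      writer.join(timeoutMillis);
      reader.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !writer.isAlive() && !reader.isAlive();
  }
}
```

In this model the inner lock order is still inverted. The outer JobTracker monitor only guarantees that the two paths never hold their first inner lock at the same time, which is why the proposal describes it as sweeping the problem under the carpet: any caller that takes taskTrackers and then reaches isBlacklisted() without the JobTracker lock would reintroduce the hazard.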

I realize this is very ugly, and I'm cringing as I suggest it, but I do value the fact that
it does away with the need for more significant changes, changes I'm very leery of!


> Potential JT deadlock in faulty TT tracking
> -------------------------------------------
>                 Key: MAPREDUCE-1342
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>         Attachments: cycle0.png, mapreduce-1342-1.patch, mapreduce-1342-2.patch
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, and then calls blackListTracker, which calls removeHostCapacity, which locks JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then calls faultyTrackers.isBlacklisted(), which goes on to lock potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
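
The report notes that the deadlock was never actually produced. The following Java sketch, a hypothetical illustration rather than Hadoop code, shows why the inverted ordering can nonetheless deadlock: two threads take the same two monitors in opposite orders, a CountDownLatch forces the unlucky interleaving, and ThreadMXBean.findMonitorDeadlockedThreads() confirms the cycle. The two lock objects only stand in for potentiallyFaultyTrackers and taskTrackers; the class InvertedLockOrder and its methods are invented for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of the inverted lock ordering in MAPREDUCE-1342.
public class InvertedLockOrder {

  // Locks `first`, waits until the other thread holds its own first lock,
  // then tries to lock `second`.
  static Thread lockInOrder(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
    Thread t = new Thread(() -> {
      synchronized (first) {
        bothHoldFirst.countDown();
        try {
          bothHoldFirst.await(); // force the bad interleaving
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        synchronized (second) {
          // never reached once the two threads deadlock
        }
      }
    }, name);
    t.setDaemon(true); // let the JVM exit even though these threads stay blocked
    return t;
  }

  // Returns the number of monitor-deadlocked threads the JVM reports (0 if none).
  static int reproduceAndDetect() {
    Object potentiallyFaultyTrackers = new Object(); // stand-in lock
    Object taskTrackers = new Object();              // stand-in lock
    CountDownLatch latch = new CountDownLatch(2);
    // incrementFaults path: potentiallyFaultyTrackers -> taskTrackers
    lockInOrder("incrementFaults", potentiallyFaultyTrackers, taskTrackers, latch).start();
    // blacklistedTaskTrackers path: taskTrackers -> potentiallyFaultyTrackers
    lockInOrder("blacklistedTaskTrackers", taskTrackers, potentiallyFaultyTrackers, latch).start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    for (int attempt = 0; attempt < 100; attempt++) {
      long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
      if (ids != null) return ids.length;
      try {
        Thread.sleep(20);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    System.out.println("deadlocked threads: " + reproduceAndDetect());
  }
}
```

Without the latch, the same two threads would usually finish, because the window between taking the first and second lock is tiny. That matches the report's observation that the bug is latent rather than easy to reproduce.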

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
