hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking
Date Mon, 04 Jan 2010 06:08:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796092#action_12796092 ]

Amar Kamat commented on MAPREDUCE-1342:

Simply making _potentiallyFaultyTrackers_ a ConcurrentHashMap and removing the *synchronized*
keyword might introduce more issues. I think the reason for synchronizing on _potentiallyFaultyTrackers_
was to perform some operations atomically. Have you checked whether the semantics remain the
same after removing the synchronized keyword? I think making _potentiallyFaultyTrackers_ a
ConcurrentHashMap is the better option, but it might be dangerous if those operations lose their atomicity.
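
To illustrate the atomicity concern, here is a minimal sketch, not the JobTracker code. The class, its methods, and the host name are hypothetical, and it uses the Java 8 ConcurrentMap.merge call, which postdates this thread. Each individual map call is thread-safe, but a read followed by a write is not atomic as a pair once the enclosing synchronized block is gone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a per-host fault counter; not the JobTracker code.
class FaultCountSketch {
    private final ConcurrentMap<String, Integer> faults = new ConcurrentHashMap<>();

    // Not atomic as a whole: two threads can read the same old value,
    // and one of the two increments is then lost.
    void incrementRacy(String host) {
        Integer old = faults.get(host);
        faults.put(host, old == null ? 1 : old + 1);
    }

    // Atomic: merge performs the read-modify-write as a single operation.
    void incrementAtomic(String host) {
        faults.merge(host, 1, Integer::sum);
    }

    int count(String host) {
        return faults.getOrDefault(host, 0);
    }

    // Runs `threads` threads, each recording `perThread` faults atomically,
    // and returns the final count for the single test host.
    static int runAtomic(int threads, int perThread) {
        FaultCountSketch sketch = new FaultCountSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sketch.incrementAtomic("host1");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.count("host1");
    }

    public static void main(String[] args) {
        System.out.println(runAtomic(4, 10_000));
    }
}
```

Any operation that touches more than one entry, or that reads the map and then acts on other state, would still need an explicit lock even with the atomic variant.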

Another way to avoid the deadlock would be to mark a few non-private APIs in JobTracker.FaultyTrackersInfo
as synchronized, mainly:
JobTracker.FaultyTrackersInfo.incrementFaults // called via Heartbeat and testcases
JobTracker.FaultyTrackersInfo.markTrackerHealthy // called via Heartbeat
JobTracker.FaultyTrackersInfo.shouldAssignTasksToTracker // called via Heartbeat and testcases
JobTracker.FaultyTrackersInfo.isBlacklisted // called in multiple cases .. need to check
JobTracker.FaultyTrackersInfo.getFaultCount // called via Heartbeat and testcases
JobTracker.FaultyTrackersInfo.getReasonForBlackListing // never used!
JobTracker.FaultyTrackersInfo.setNodeHealthStatus // called via Heartbeat and testcases

Except for JobTracker.FaultyTrackersInfo.isBlacklisted(), all of these calls are already made while
holding the JobTracker lock. Hence adding the synchronized keyword to the method signatures wouldn't
introduce any overhead. I still need to check JobTracker.FaultyTrackersInfo.isBlacklisted().
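
A rough sketch of what the synchronized-method approach could look like, under stated assumptions: both class names, the blacklist threshold, and the simplified method bodies are hypothetical and only mirror the shape of the methods listed above. When every caller already holds the outer lock, the inner monitor is never contended.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the fault-tracking class; names and threshold are assumptions.
class FaultTrackingSketch {
    private static final int BLACKLIST_THRESHOLD = 3; // assumed value, for illustration only
    private final Map<String, Integer> faults = new HashMap<>();

    // Synchronized on this object, so the read and the write happen atomically.
    synchronized void incrementFaults(String host) {
        faults.put(host, getFaultCount(host) + 1); // re-entrant call on the same monitor
    }

    synchronized int getFaultCount(String host) {
        Integer n = faults.get(host);
        return n == null ? 0 : n;
    }

    synchronized boolean isBlacklisted(String host) {
        return getFaultCount(host) >= BLACKLIST_THRESHOLD;
    }
}

// Hypothetical stand-in for the central lock holder on the heartbeat path.
class CoordinatorSketch {
    final FaultTrackingSketch faultyTrackers = new FaultTrackingSketch();

    // The outer lock is taken first, then the fault tracker's monitor.
    synchronized void heartbeat(String host, boolean taskFailed) {
        if (taskFailed) {
            faultyTrackers.incrementFaults(host);
        }
    }

    public static void main(String[] args) {
        CoordinatorSketch jt = new CoordinatorSketch();
        for (int i = 0; i < 3; i++) {
            jt.heartbeat("host1", true);
        }
        System.out.println(jt.faultyTrackers.isBlacklisted("host1"));
    }
}
```

A caller such as isBlacklisted that can arrive without the outer lock is the case that still needs checking, since it may take the two locks in a different order.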

> Potential JT deadlock in faulty TT tracking
> -------------------------------------------
>                 Key: MAPREDUCE-1342
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>         Attachments: cycle0.png, mapreduce-1342-1.patch
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, and then
> calls blackListTracker, which calls removeHostCapacity, which locks JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then calls faultyTrackers.isBlacklisted()
> which goes on to lock potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted and therefore
> could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
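
The inverted ordering described in the issue can be reproduced safely in a small sketch. This is an illustration only: the class and method names are hypothetical, and explicit ReentrantLock objects with tryLock stand in for the synchronized blocks so that the program reports the conflict instead of hanging. Each thread takes its first lock, waits until the other thread holds the opposite lock, and then probes the second one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demonstration of a lock-order inversion; not the JobTracker code.
class LockOrderSketch {

    // Holds `first`, waits until both threads hold their first lock, then
    // probes `second` without blocking. Returns whether `second` was free.
    private static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst,
                                       CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock();
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep `first` held until the other thread has probed it
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    // Returns true when each path found its second lock held by the other path,
    // which is the state in which blocking lock acquisition would deadlock.
    static boolean invertedOrderBlocks() {
        ReentrantLock potentiallyFaultyTrackers = new ReentrantLock();
        ReentrantLock taskTrackers = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean faultPathAcquired = new AtomicBoolean(true);
        AtomicBoolean listPathAcquired = new AtomicBoolean(true);

        // Path 1: fault map first, then tracker map (the incrementFaults ordering).
        Thread faultPath = new Thread(() -> faultPathAcquired.set(
                holdThenTry(potentiallyFaultyTrackers, taskTrackers, bothHoldFirst, bothTried)));
        // Path 2: tracker map first, then fault map (the blacklistedTaskTrackers ordering).
        Thread listPath = new Thread(() -> listPathAcquired.set(
                holdThenTry(taskTrackers, potentiallyFaultyTrackers, bothHoldFirst, bothTried)));

        faultPath.start();
        listPath.start();
        try {
            faultPath.join();
            listPath.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !faultPathAcquired.get() && !listPathAcquired.get();
    }

    public static void main(String[] args) {
        System.out.println("both paths blocked: " + invertedOrderBlocks());
    }
}
```

The usual remedy is to make every path acquire the two locks in the same order, or to avoid holding one while taking the other.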

