hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1461) Corner-case deadlock in TaskTracker
Date Tue, 05 Jun 2007 10:35:28 GMT
Corner-case deadlock in TaskTracker

                 Key: HADOOP-1461
                 URL: https://issues.apache.org/jira/browse/HADOOP-1461
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.3
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy
            Priority: Critical
             Fix For: 0.14.0
         Attachments: main_taskcleanup_deadlock.txt

Thanks to Koji for the attached stack-trace...


  -> offerService()
    -> markUnresponsiveTasks (locks the TaskTracker here)
      -> purgeTask() 
        -> removeTaskFromJob (waiting to lock the RunningJob object)

  -> purgeJob (locks the RunningJob object)
    -> TIP.jobHasFinished()
      -> TIP.cleanup (waiting to lock the TaskTracker)


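This is a clear case of inconsistent lock ordering: one thread locks the TaskTracker and then
waits for the RunningJob, while the other locks the RunningJob and then waits for the
TaskTracker. It's a corner case since it depends on the child VM becoming unresponsive _and_
the cleanup thread kicking in at the same time, which is why I'm marking this for 0.14.0
rather than 0.13.0. What do others think about this?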


Two possible solutions to break the deadlock cycle:

a) Make TaskTracker.purgeJob a synchronized method, so that it locks the TaskTracker before
locking the RunningJob object.
b) Make the TaskTracker.tasks map a *Collections.synchronizedMap*, thus doing away with the
need to lock the TaskTracker in TIP.cleanup.

I'd prefer a): TaskTracker.tasks is referenced in multiple places inside synchronized
methods, so changing only purgeJob is the less intrusive change.
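
For illustration only, here is a minimal sketch of the lock ordering that option a) would
give. It is not the actual TaskTracker code: the classes Tracker and RunningJob, the
addTask helper and the runBothPaths driver are hypothetical stand-ins, and only the method
names markUnresponsiveTasks, purgeTask, purgeJob and cleanup mirror the traces above. The
point is that both paths take the tracker lock first and the job lock second, so the cycle
cannot form.

```java
import java.util.HashMap;
import java.util.Map;

public class LockOrderSketch {
    // Hypothetical stand-in for the RunningJob object.
    static class RunningJob {
        final Map<String, String> tasks = new HashMap<>();
    }

    // Hypothetical stand-in for the TaskTracker.
    static class Tracker {
        final Map<String, String> tasks = new HashMap<>();
        final RunningJob job = new RunningJob();

        // Helper for the demo: tracker lock first, then job lock.
        synchronized void addTask(String id) {
            tasks.put(id, "running");
            synchronized (job) {
                job.tasks.put(id, "running");
            }
        }

        // Path 1: holds the tracker lock, then takes the job lock in purgeTask.
        synchronized void markUnresponsiveTasks(String id) {
            purgeTask(id);
        }

        private void purgeTask(String id) {
            synchronized (job) {
                job.tasks.remove(id);
            }
            tasks.remove(id);
        }

        // Path 2 with option a): 'synchronized' takes the tracker lock
        // before the job lock, matching the order used by path 1.
        synchronized void purgeJob() {
            synchronized (job) {
                for (String id : job.tasks.keySet()) {
                    cleanup(id);
                }
                job.tasks.clear();
            }
        }

        // Re-acquires the tracker lock; Java monitors are reentrant,
        // so this does not block when called from purgeJob.
        private synchronized void cleanup(String id) {
            tasks.remove(id);
        }
    }

    // Runs both paths concurrently; returns true if both threads finish
    // within the timeout, i.e. no deadlock was observed in this run.
    static boolean runBothPaths(int iterations) {
        Tracker t = new Tracker();
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                t.addTask("task-" + i);
                t.markUnresponsiveTasks("task-" + i);
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                t.purgeJob();
            }
        });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runBothPaths(10000));
    }
}
```

If the synchronized keyword were dropped from purgeJob in this sketch, path 2 would take the
job lock first and the tracker lock second (inside cleanup), which is the ordering shown in
the second trace and can deadlock against path 1.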



This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
