hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
Date Wed, 03 Dec 2014 16:59:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233192#comment-14233192 ]

Naganarasimha G R commented on YARN-2874:
-----------------------------------------

Hi [~ozawa] & [~kasha],
Thanks for the review and feedback. I put some effort into writing test code to reproduce
this issue, but it needed more and more sleeps and wait/notify calls and still did not go
into deadlock consistently, so I thought it was not worth the effort, as the deadlock
scenario is easily detectable.
bq. RenewalTimerTask is a method which has a side effect, so the state can be invalid after
the patch. We need to update the long error handling before merging it.
I was not clear about this statement, as I could not work out which state gets invalidated
by the fix. Further, you ([~ozawa]) had mentioned ??Rethinking of this, this is
not related to this JIRA.??, so please let me know if anything more needs to be updated
for this issue.
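
A reproduction that does not rely on sleeps can be sketched with a latch that forces both threads to hold their first lock before either asks for the second, followed by a check through the JDK's {{ThreadMXBean}}. This is only an illustrative sketch, not test code from the patch: the class {{DeadlockProbe}}, its methods and the thread names are hypothetical, and the two plain lock objects merely stand in for the RenewalTimerTask monitor and the synchronized token set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    /** Starts two daemon threads that take the two locks in opposite order. */
    static void startOpposingThreads(Object lockA, Object lockB) {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(
            () -> lockInOrder(lockA, lockB, bothHoldFirstLock), "probe-timer");
        Thread t2 = new Thread(
            () -> lockInOrder(lockB, lockA, bothHoldFirstLock), "probe-canceller");
        // Daemon threads, so the permanently blocked pair does not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                // Wait until the other thread also holds its first lock.
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: each thread blocks on the lock the other one holds.
            }
        }
    }

    /** Polls the JVM until it reports a monitor deadlock or the timeout expires. */
    static boolean deadlockDetectedWithin(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when none found
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The latch replaces timing guesses with an explicit ordering, so the cycle forms on every run. The same check could be pointed at the real renewer threads, under the assumption that the test can drive them into the failing-renewal path.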

Regarding Sid's comment in MAPREDUCE-5384: if it needs to be handled, IIUC I need to revert
my patch and redo it as below (correct me if I am wrong, and also let me know whether it is
required to be fixed this way).
{quote}
{noformat}
    @Override
    public void run() {
      if (cancelled) {
        return;
      }
      Token<?> token = dttr.token;
      try {
        synchronized (this) {
          if (cancelled) {
            return;
          }
          requestNewHdfsDelegationTokenIfNeeded(dttr);
          // if the token is not replaced by a new token, renew the token
          if (appTokens.get(dttr.applicationId).contains(dttr)) {
            renewToken(dttr);
            setTimerForTokenRenewal(dttr); // set the next one
          } else {
            LOG.info("The token was removed already. Token = [" + dttr + "]");
          }
        }
      } catch (Exception e) {
        LOG.error("Exception renewing token" + token + ". Not rescheduled", e);
        removeFailedDelegationToken(dttr);
      }
    }
{noformat}
{quote}
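
The locking idea in the snippet above can be shown in a self-contained sketch: the renewal work runs while holding the task's own monitor, but the failure cleanup that needs the synchronized set's lock runs only after that monitor is released. Everything here is a hypothetical stand-in ({{RenewalSketch}}, {{Task}}, the {{tokens}} set and the simulated failure), not the ResourceManager's actual classes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RenewalSketch {

    // Stand-in for the per-application token set (hypothetical, not the RM field).
    static final Set<String> tokens = Collections.synchronizedSet(new HashSet<>());

    static class Task implements Runnable {
        private final String token;
        private boolean cancelled;

        Task(String token) {
            this.token = token;
        }

        synchronized void cancel() {
            cancelled = true;
        }

        @Override
        public void run() {
            try {
                synchronized (this) {
                    if (cancelled) {
                        return;
                    }
                    // Simulated renewal failure; real renewal work would go here.
                    throw new IllegalStateException("simulated renewal failure");
                }
            } catch (IllegalStateException e) {
                // The task monitor is already released at this point, so taking
                // the set's lock cannot close a cycle with a thread in cancel().
                tokens.remove(token);
            }
        }
    }

    /** Returns true if a failing task and a concurrent canceller both finish. */
    static boolean runsWithoutDeadlock() {
        tokens.add("t1");
        Task task = new Task("t1");
        Thread timer = new Thread(task, "sketch-timer");
        Thread canceller = new Thread(() -> {
            // Lock order of the canceller in the dump: set lock first, then task lock.
            synchronized (tokens) {
                task.cancel();
                tokens.remove("t1");
            }
        }, "sketch-canceller");
        timer.setDaemon(true);
        canceller.setDaemon(true);
        timer.start();
        canceller.start();
        try {
            timer.join(2000);
            canceller.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !timer.isAlive() && !canceller.isAlive();
    }
}
```

Because the timer thread never holds the task monitor while it waits for the set's lock, the only nested acquisition left is set lock then task lock, and a single consistent order cannot deadlock.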


> Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-2874
>                 URL: https://issues.apache.org/jira/browse/YARN-2874
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0, 2.5.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Blocker
>         Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch
>
>
> When token renewal fails and the application finishes this dead lock can occur
> Jstack dump :
> {quote}
> Found one Java-level deadlock:
> =============================
> "DelegationTokenRenewer #181865":
>   waiting to lock monitor 0x0000000000900918 (object 0x00000000c18a9998, a java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
> "DelayedTokenCanceller":
>   waiting to lock monitor 0x0000000004141718 (object 0x00000000c7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
>   which is held by "Timer-4"
> "Timer-4":
>   waiting to lock monitor 0x0000000000900918 (object 0x00000000c18a9998, a java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
>  
> Java stack information for the threads listed above:
> ===================================================
> "DelegationTokenRenewer #181865":
> at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
> - waiting to lock <0x00000000c18a9998> (a java.util.Collections$SynchronizedSet)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "DelayedTokenCanceller":
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
> - waiting to lock <0x00000000c7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
> - locked <0x00000000c18a9998> (a java.util.Collections$SynchronizedSet)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
> at java.lang.Thread.run(Thread.java:745)
> "Timer-4":
> at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
> - waiting to lock <0x00000000c18a9998> (a java.util.Collections$SynchronizedSet)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
> - locked <0x00000000c7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
>  
> Found 1 deadlock.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
