hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2917) Potential deadlock in AsyncDispatcher when System.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
Date Tue, 09 Dec 2014 08:01:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239107#comment-14239107 ]

Wangda Tan commented on YARN-2917:
----------------------------------

[~rohithsharma],
Good catch! Thanks for thinking about this. 

As I understand it, this happens when:
Step 1: Thread #1 (the event dispatcher thread) hits an exception while dispatching and calls System.exit.
Step 2: Thread #2 (RM main thread) has registered a shutdown hook, which eventually calls AsyncDispatcher.serviceStop.
Step 3: Thread #1 is waiting for System.exit(-1) to return, while Thread #2 is waiting for Thread #1 to exit. Each is waiting on the other, so the two threads deadlock.
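
The three steps form a wait cycle. The Java sketch below is a hypothetical model of that cycle, not Hadoop code: it never calls System.exit. Instead, Thread #1 blocks until the shutdown-hook thread finishes (which is what System.exit does while hooks run), and the hook thread waits for Thread #1 to exit. The class and method names are illustrative, and the timed Thread.join is there only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model of the wait cycle described in YARN-2917.
 * It never calls System.exit: "exiting" is modelled as blocking until the
 * shutdown-hook thread has finished, which is what System.exit does while
 * shutdown hooks run.
 */
public class DeadlockModel {

    /** Returns true if the hook gave up while the dispatcher was still blocked. */
    static boolean simulate(long joinTimeoutMs) throws InterruptedException {
        CountDownLatch hooksDone = new CountDownLatch(1);
        AtomicBoolean cycleDetected = new AtomicBoolean(false);

        // Thread #1: the event dispatcher. After a dispatch failure it
        // "calls System.exit" and waits for all shutdown hooks to finish.
        Thread dispatcher = new Thread(() -> {
            try {
                hooksDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");

        // Thread #2: the shutdown hook. Stopping the dispatcher means waiting
        // for Thread #1 to exit. The timeout exists only so the demo ends;
        // an untimed join() here would never return.
        Thread hook = new Thread(() -> {
            try {
                dispatcher.join(joinTimeoutMs);
                cycleDetected.set(dispatcher.isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hooksDone.countDown(); // "hooks finished": lets Thread #1 proceed
            }
        }, "shutdown-hook");

        dispatcher.start();
        hook.start();
        hook.join();
        dispatcher.join();
        return cycleDetected.get();
    }

    public static void main(String[] args) throws InterruptedException {
        if (simulate(500)) {
            System.out.println("Each thread is waiting for the other: deadlock");
        }
    }
}
```

Calling DeadlockModel.simulate(500) should return true, because the hook's join times out while the dispatcher is still alive. Replacing the timed join with a plain join() would hang forever, which is the shape of the hang reported in this issue.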

My question, though: is it correct to set drainEventsOnStop to false when such a fatal error happens? Shouldn't we still wait for the queue to drain, even after a fatal error?
Any thoughts?
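
To make the trade-off in the question concrete, here is a simplified Java model of a drain-on-stop wait, loosely based on the "Waiting for AsyncDispatcher to drain." message in the report. It assumes the only thread that can drain the queue is the one blocked in System.exit. DrainOnStopModel, waitForDrain, its parameters, and the maxPolls cap are all hypothetical and are not AsyncDispatcher's actual API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical, simplified model of a stop routine that waits for the event
 * queue to drain when a drainEventsOnStop flag is set. Not Hadoop's code.
 */
public class DrainOnStopModel {

    /**
     * Returns how many times the "waiting to drain" message would be logged.
     * maxPolls exists only so this sketch terminates: without it, the loop
     * spins for as long as the event thread is alive but not draining.
     */
    static int waitForDrain(boolean drainEventsOnStop, Thread eventThread,
                            BooleanSupplier isDrained, int maxPolls)
            throws InterruptedException {
        if (!drainEventsOnStop) {
            return 0; // flag off: skip the wait and go straight to stopping
        }
        int polls = 0;
        while (!isDrained.getAsBoolean() && eventThread.isAlive() && polls < maxPolls) {
            polls++; // stand-in for logging "Waiting for AsyncDispatcher to drain."
            Thread.sleep(10);
        }
        return polls;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        // Event thread stuck mid-dispatch (standing in for a thread blocked
        // inside System.exit), so it stays alive and never drains the queue.
        Thread eventThread = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        eventThread.start();
        BooleanSupplier neverDrained = () -> false;

        System.out.println("drainEventsOnStop=true:  " + waitForDrain(true, eventThread, neverDrained, 5));  // 5 (cap hit)
        System.out.println("drainEventsOnStop=false: " + waitForDrain(false, eventThread, neverDrained, 5)); // 0

        release.countDown();
        eventThread.join();
    }
}
```

With the flag on, the wait only ends when the queue drains or the event thread dies, and in this model neither happens, so it runs until the artificial cap. With the flag off, it returns immediately, presumably leaving any still-queued events unprocessed.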

> Potential deadlock in AsyncDispatcher when System.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2917
>                 URL: https://issues.apache.org/jira/browse/YARN-2917
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: 0001-YARN-2917.patch
>
>
> I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
