hadoop-yarn-issues mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events
Date Sat, 10 Oct 2015 20:09:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952032#comment-14952032 ]

Anubhav Dhoot commented on YARN-4247:
-------------------------------------

Tested this on a cluster. Before this fix, the cluster would fall over after around 3 to 4 hours.
After this fix, the cluster is still going strong beyond 24 hours.

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4247
>                 URL: https://issues.apache.org/jira/browse/YARN-4247
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Blocker
>         Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we see this
> in the logs before the RM dies of OOM
> {noformat}
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> 2015-10-08 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000
> {noformat}
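
The message does not describe the locking pattern behind this deadlock, and the patch is not reproduced here. Purely as an illustration, the sketch below assumes a classic two-lock ordering inversion and shows how the JVM can report such a cycle. The class name, lock names, and thread names are hypothetical and are not taken from FSAppAttempt or RMAppAttemptImpl.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {
    // Hypothetical stand-ins for two objects guarded by their own monitors.
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    // Starts a daemon thread that takes `first`, waits until both workers hold
    // their first lock, then blocks trying to take `second`.
    private static void startWorker(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Not reached: the other worker already holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though the workers stay stuck
        t.start();
    }

    // Returns the number of threads the JVM reports as deadlocked, polling for
    // up to roughly five seconds.
    public static int countDeadlockedThreads() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startWorker("a-then-b", LOCK_A, LOCK_B, bothHoldFirst);
        startWorker("b-then-a", LOCK_B, LOCK_A, bothHoldFirst);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(100);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

If one of the stuck threads were the only consumer of an event queue, producers would keep enqueuing while nothing drained it, which is consistent with the growing queue size in the log above. A thread dump of the stuck process (for example with jstack) is the usual way to see which threads and locks form the cycle.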



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
