Date: Mon, 14 Apr 2014 14:44:19 +0000 (UTC)
From: "Karthik Kambatla (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-1929) DeadLock in RM when automatic failover is enabled.

     [ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1929:
-----------------------------------

    Attachment: yarn-1929-1.patch

Here is a first-cut patch that removes unnecessary synchronization from EmbeddedElectorService, AdminService and CompositeService.

I am still thinking about the best way to write a unit test for this to avoid regressions in the future. We could perhaps override becomeActive to sleep for some time and then try to shut the RM down; if it doesn't shut down within a particular amount of time, the test fails. Any other ideas?
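A minimal sketch of the timeout-based test idea above, in plain Java with no Hadoop or JUnit dependency. Everything in it is hypothetical: ToyElectorService is a stand-in for the real service, its synchronizedStop flag only imitates "before the patch" versus "after the patch", and the timings are arbitrary. It models a stop() call that waits on a monitor held by a slow becomeActive(), not the two-monitor cycle from the thread dump, so treat it as an illustration of the test shape rather than a reproduction of the bug.

```java
import java.util.concurrent.CountDownLatch;

/** Toy stand-in for the elector service; all names here are hypothetical. */
class ToyElectorService {
    private final long becomeActiveMillis;
    private final boolean synchronizedStop;
    private final CountDownLatch transitionStarted = new CountDownLatch(1);

    ToyElectorService(long becomeActiveMillis, boolean synchronizedStop) {
        this.becomeActiveMillis = becomeActiveMillis;
        this.synchronizedStop = synchronizedStop;
    }

    /** Simulates a slow transition to active while holding this object's monitor. */
    synchronized void becomeActive() {
        transitionStarted.countDown(); // signal that the monitor is now held
        try {
            Thread.sleep(becomeActiveMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the test cut the transition short
        }
    }

    /** Shuts the service down; optionally takes the same monitor as becomeActive(). */
    void stop() {
        if (synchronizedStop) {
            synchronized (this) {
                // imitates the synchronized variant: blocks until becomeActive() returns
            }
        }
        // imitates the unsynchronized variant: returns without taking the monitor
    }

    void awaitTransitionStarted() throws InterruptedException {
        transitionStarted.await();
    }
}

class ShutdownTimeoutSketch {
    /**
     * Starts becomeActive(), then calls stop() on another thread and reports
     * whether stop() returned within timeoutMillis (which must be positive).
     */
    static boolean stopsWithin(ToyElectorService service, long timeoutMillis) {
        Thread transition = new Thread(service::becomeActive, "toy-transition");
        Thread stopper = new Thread(service::stop, "toy-stopper");
        transition.setDaemon(true);
        stopper.setDaemon(true);
        try {
            transition.start();
            service.awaitTransitionStarted(); // becomeActive() holds the monitor from here on
            stopper.start();
            stopper.join(timeoutMillis);      // wait at most timeoutMillis for stop()
            return !stopper.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            transition.interrupt();           // end the simulated transition early
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized stop finished in time:   "
                + stopsWithin(new ToyElectorService(5000, true), 500));
        System.out.println("unsynchronized stop finished in time: "
                + stopsWithin(new ToyElectorService(5000, false), 500));
    }
}
```

In a real unit test the boolean would feed an assertion, and the timeout would need enough slack to avoid flaky failures on a loaded build machine.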
> DeadLock in RM when automatic failover is enabled.
> --------------------------------------------------
>
>                 Key: YARN-1929
>                 URL: https://issues.apache.org/jira/browse/YARN-1929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Yarn HA cluster
>            Reporter: Rohith
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1929-1.patch
>
>
> Deadlock detected in RM when automatic failover is enabled.
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Thread-2":
>   waiting to lock monitor 0x00007fb514303cf0 (object 0x00000000ef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector),
>   which is held by "main-EventThread"
> "main-EventThread":
>   waiting to lock monitor 0x00007fb514750a48 (object 0x00000000ef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
>   which is held by "Thread-2"
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)