Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AEF00188B4 for ; Tue, 8 Dec 2015 05:16:11 +0000 (UTC) Received: (qmail 6049 invoked by uid 500); 8 Dec 2015 05:16:11 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 5998 invoked by uid 500); 8 Dec 2015 05:16:11 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 5962 invoked by uid 99); 8 Dec 2015 05:16:11 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 08 Dec 2015 05:16:11 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 23A612C1F62 for ; Tue, 8 Dec 2015 05:16:11 +0000 (UTC) Date: Tue, 8 Dec 2015 05:16:11 +0000 (UTC) From: "Rohith Sharma K S (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-4424) YARN CLI command hangs MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046374#comment-15046374 ] Rohith Sharma K S commented on YARN-4424: ----------------------------------------- Sorry if I am not clear in previous comment. Correct me if my understanding is wrong. >From the 3 Threads trace, I see that all 3 threads are BLOCKED for holding RMAppImpl lock. {{Thread-1}} and {{Thread-2}} is for *ReadLock* & {{Thread-3}} is for *WriteLock*. Thread-1 {quote} Thread 53785: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame) - java.util.concurrent.locks.*ReentrantReadWriteLock$ReadLock.lock()* @bci=5, line=731 (Interpreted frame) - org.apache.hadoop.yarn.server.resourcemanager.rmapp.*RMAppImpl.getFinalApplicationStatus()* @bci=4, line=478 (Interpreted frame) - org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptFinished {quote} Thread-2 {quote} Thread 25723: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Compiled frame) - java.util.concurrent.locks.*ReentrantReadWriteLock$ReadLock.lock()* @bci=5, line=731 (Compiled frame) - org.apache.hadoop.yarn.server.resourcemanager.rmapp.*RMAppImpl.createAndGetApplicationReport*(java.lang.String, boolean) @bci=4, line=598 (Interpreted frame) {quote} Thread-3 {quote} Thread 53696: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Interpreted frame) - java.util.concurrent.locks.*ReentrantReadWriteLock$WriteLock.lock()* @bci=5, line=945 (Interpreted frame) - org.apache.hadoop.yarn.server.resourcemanager.rmapp.*RMAppImpl.pullRMNodeUpdates*(java.util.Collection) @bci=4, line=584 (Interpreted frame) {quote} Thread-2 is NOT holding read lock of RMAppImpl, but it is blocked for read lock. So my doubt is which thread flow is holding lock of RMAppImpl? > YARN CLI command hangs > ---------------------- > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Yesha Vora > Assignee: Jian He > Priority: Blocker > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 240000 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 2015-12-04 22:09:58,786 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)