hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5131) Distributed shell AM fails because of InterruptedException
Date Mon, 23 May 2016 22:28:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297244#comment-15297244 ]

Wangda Tan commented on YARN-5131:
----------------------------------

[~hitesh], yes, you're correct: the InterruptedException does not cause the AM failure. Updating
the title and description.

The root cause of this issue is YARN-1902: the YARN scheduler can allocate more containers to the
AM than it requested. If an extra container arrives while the AM is finishing, the container launch
fails because the NMClient thread is interrupted. The failed launch is counted as a failed
container, so the following success check no longer passes:
{code}
    if (numFailedContainers.get() == 0 &&
        numCompletedContainers.get() == numTotalContainers) {
        // SUCCESSFUL
    }
{code}

Instead, we should deduct failed containers from completed containers. Uploading a patch.
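
To illustrate the difference, here is a minimal stand-alone sketch, not the actual patch. The class
and method names are hypothetical, the counters are modeled as plain ints, and it assumes that a
failed launch increments both the completed and the failed counter. Keeping the == comparison is
also an assumption; only the deduction comes from the description above.

```java
/** Hypothetical model of the AM's final status check; not the real ApplicationMaster code. */
final class DShellStatusSketch {

    /** The check quoted above: any failed container makes the run unsuccessful. */
    static boolean originalCheck(int failed, int completed, int total) {
        return failed == 0 && completed == total;
    }

    /** One reading of the proposal: deduct failed containers from completed ones. */
    static boolean proposedCheck(int failed, int completed, int total) {
        return completed - failed == total;
    }

    public static void main(String[] args) {
        // Assumed scenario: 2 containers requested, 3 allocated (YARN-1902);
        // the extra one fails to launch while the AM is shutting down.
        int total = 2;
        int completed = 3; // 2 successful + 1 failed launch
        int failed = 1;

        System.out.println("original: " + originalCheck(failed, completed, total));
        System.out.println("proposed: " + proposedCheck(failed, completed, total));
    }
}
```

Under these assumptions the original check reports failure and the adjusted check reports success.
If one of the requested containers fails instead, the adjusted check still reports failure, since
completed minus failed is then smaller than the requested total.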


> Distributed shell AM fails because of InterruptedException
> ----------------------------------------------------------
>
>                 Key: YARN-5131
>                 URL: https://issues.apache.org/jira/browse/YARN-5131
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>
> DShell AM fails with the following exception
> {code}
> INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
> java.lang.InterruptedException
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {code}




