hadoop-yarn-issues mailing list archives

From "Dustin Cote (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
Date Fri, 28 Oct 2016 02:10:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614013#comment-15614013 ]

Dustin Cote commented on YARN-3934:

[~templedf] feel free to take it over if you'd like.  I'm not going to have a chance to address
this for the foreseeable future.

> Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
> ----------------------------------------------------------------------------------------------
>                 Key: YARN-3934
>                 URL: https://issues.apache.org/jira/browse/YARN-3934
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Ming Ma
>            Assignee: Dustin Cote
>              Labels: oct16-easy
>         Attachments: YARN-3934-1.patch
> Use the following steps to test.
> 1. Set up ZK as the RM HA store.
> 2. Submit a job that refers to many distributed cache files with long HDFS paths, which causes the app state size to exceed ZK's max object size limit.
> 3. RM can't write to ZK and exits with the following exception.
> {noformat}
> 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
> {noformat}
> In this case, RM could have rejected the app during submitApplication RPC if the size of ApplicationSubmissionContext is too large.
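
The rejection suggested above could look roughly like the sketch below. It is not the attached patch and not actual ResourceManager code: the class name, method names, and the headroom value are hypothetical, and the serialized ApplicationSubmissionContext is stood in for by a plain byte array. The only external fact it relies on is ZooKeeper's documented default for jute.maxbuffer (0xfffff bytes, just under 1 MB).

```java
// Minimal sketch, assuming the serialized submission context is available as bytes
// at submitApplication time. All names here are hypothetical.
class SubmissionSizeCheck {
    // ZooKeeper's documented default for jute.maxbuffer: 0xfffff bytes, just under 1 MB.
    static final int DEFAULT_ZK_MAX_BYTES = 0xfffff;

    // Assumed headroom: the stored app state holds more than the submission context,
    // so the check leaves some room below the hard limit. The value is illustrative.
    static final int HEADROOM_BYTES = 64 * 1024;

    // Returns true when a payload of this size would not fit under the limit
    // once the headroom is reserved.
    static boolean exceedsLimit(int serializedSize, int maxBytes) {
        return serializedSize > maxBytes - HEADROOM_BYTES;
    }

    // Rejects an oversized submission up front instead of letting the
    // state-store write fail later.
    static void checkSubmission(byte[] serializedContext, int maxBytes) {
        if (exceedsLimit(serializedContext.length, maxBytes)) {
            throw new IllegalArgumentException(
                "Submission context is " + serializedContext.length
                + " bytes, which exceeds the usable state-store limit of "
                + (maxBytes - HEADROOM_BYTES) + " bytes");
        }
    }

    public static void main(String[] args) {
        byte[] small = new byte[10 * 1024];        // typical submission
        byte[] large = new byte[2 * 1024 * 1024];  // many long cache-file paths

        checkSubmission(small, DEFAULT_ZK_MAX_BYTES);
        System.out.println("small submission accepted");

        try {
            checkSubmission(large, DEFAULT_ZK_MAX_BYTES);
        } catch (IllegalArgumentException e) {
            System.out.println("large submission rejected: " + e.getMessage());
        }
    }
}
```

Failing the RPC this way would return an error to the submitting client only, rather than raising a fatal STATE_STORE_OP_FAILED event in the RM. In a real deployment the limit would have to match whatever jute.maxbuffer value the ZK servers and the RM's ZK client are actually configured with.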

