hadoop-yarn-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
Date Fri, 17 Jul 2015 06:28:04 GMT
Ming Ma created YARN-3934:

             Summary: Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
                 Key: YARN-3934
                 URL: https://issues.apache.org/jira/browse/YARN-3934
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Ming Ma

Use the following steps to reproduce the problem.

1. Configure ZK (ZKRMStateStore) as the RM state store for HA.
2. Submit a job that references many distributed cache files with long HDFS paths, so that
the application state exceeds ZooKeeper's maximum znode size (jute.maxbuffer, just under 1 MB by default).
3. The RM fails to write to ZK and exits with the following exception.

2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)

In this case, the RM could have rejected the application during the submitApplication RPC because
its ApplicationSubmissionContext was too large to persist, instead of crashing later when writing it to ZK.
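A minimal sketch of the kind of up-front check suggested above, assuming the RM can obtain the serialized size of the submission (for example from its protobuf form). The class, method, and exception names are hypothetical, not part of YARN; the 0xfffff limit is ZooKeeper's default jute.maxbuffer, and a real fix would more likely read a configurable limit. The path-based estimate only illustrates how distributed cache entries add up; it ignores every other field of the context.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: reject a submission whose serialized context exceeds a
 * limit before it reaches the ZooKeeper state store. Names are illustrative only.
 */
public class SubmissionSizeCheck {

    // ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MB.
    public static final long DEFAULT_ZK_MAX_BYTES = 0xfffff;

    /** Hypothetical exception signalling that a submission is too large to store. */
    public static class SubmissionTooLargeException extends Exception {
        public SubmissionTooLargeException(String msg) {
            super(msg);
        }
    }

    /**
     * Throws if serializedBytes exceeds maxBytes. In the RM the byte count is
     * assumed to come from the serialized ApplicationSubmissionContext; here it
     * is passed in directly so the sketch runs without Hadoop on the classpath.
     */
    public static void checkSubmissionSize(String appId, long serializedBytes, long maxBytes)
            throws SubmissionTooLargeException {
        if (serializedBytes > maxBytes) {
            throw new SubmissionTooLargeException("Application " + appId
                    + " submission context is " + serializedBytes
                    + " bytes, exceeding the limit of " + maxBytes + " bytes");
        }
    }

    /** Rough lower bound: the UTF-8 bytes of the distributed cache paths alone. */
    public static long estimatePathBytes(List<String> paths) {
        long total = 0;
        for (String p : paths) {
            total += p.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // 20,000 hypothetical cache files, each with an 86-byte HDFS path (~1.7 MB total).
        String prefix = "hdfs://nn/user/alice/very/long/staging/directory/for/distributed/cache/";
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            paths.add(prefix + String.format("file-%06d.jar", i));
        }
        long size = estimatePathBytes(paths);
        try {
            checkSubmissionSize("application_0001", size, DEFAULT_ZK_MAX_BYTES);
            System.out.println("accepted: " + size + " bytes");
        } catch (SubmissionTooLargeException e) {
            // The submitter gets an error from the RPC instead of the RM exiting.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting at submission time keeps the failure local to the offending client; the size check itself is cheap compared with a failed ZK multi-op that takes down the RM.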

This message was sent by Atlassian JIRA
