hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6847) NPE in RM while starting timeline collector on recovery after explicit failover
Date Wed, 19 Jul 2017 22:25:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093898#comment-16093898 ]

Varun Saxena commented on YARN-6847:
------------------------------------

Found this issue while writing tests for YARN-6130.
The NPE occurs because the RMTimelineCollectorManager in RMContext is null.

This happens because the RMTimelineCollectorManager instance is stored in the active service
context inside RMContextImpl, but the object itself is created inside ResourceManager#serviceInit.
As a result, if the RM is made to transition to standby, the active service context is reset
(created again) and the RMTimelineCollectorManager object is never set in the new context.

Consequently, when the RM subsequently becomes active and recovery has to start a timeline
collector for a recovered app, that attempt fails with an NPE.
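
The sequence above can be reproduced with a small standalone model. This is only a sketch
under stated assumptions: every class and method name below (CollectorManager,
ActiveServiceContext, ResourceManagerModel, failoverThrowsNpe) is a hypothetical stand-in and
not the real ResourceManager code. The recreateOnActive flag shows one possible remedy for
comparison; it is not necessarily the fix adopted for this issue.

```java
public class ActiveContextResetSketch {

    // Hypothetical stand-in for RMTimelineCollectorManager.
    static class CollectorManager {
        String startCollector(String appId) {
            return "collector-for-" + appId;
        }
    }

    // Hypothetical stand-in for the active service context held by RMContextImpl.
    static class ActiveServiceContext {
        CollectorManager collectorManager; // stays null until someone sets it
    }

    // Hypothetical, heavily simplified stand-in for ResourceManager.
    static class ResourceManagerModel {
        private final boolean recreateOnActive;
        private ActiveServiceContext activeContext = new ActiveServiceContext();

        ResourceManagerModel(boolean recreateOnActive) {
            this.recreateOnActive = recreateOnActive;
        }

        // Runs once: the manager is created here and stored in the current context.
        void serviceInit() {
            activeContext.collectorManager = new CollectorManager();
        }

        // The active service context is reset (created again); the manager is lost.
        void transitionToStandby() {
            activeContext = new ActiveServiceContext();
        }

        // Recovery starts a collector for a recovered app via the current context.
        String transitionToActiveAndRecover(String appId) {
            if (recreateOnActive) {
                // Assumed remedy: set the manager again whenever the context is rebuilt.
                activeContext.collectorManager = new CollectorManager();
            }
            return activeContext.collectorManager.startCollector(appId);
        }
    }

    // Returns true if init -> standby -> active (with recovery) ends in an NPE.
    static boolean failoverThrowsNpe(boolean recreateOnActive) {
        ResourceManagerModel rm = new ResourceManagerModel(recreateOnActive);
        rm.serviceInit();
        rm.transitionToStandby();
        try {
            rm.transitionToActiveAndRecover("app_1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE without re-creation: " + failoverThrowsNpe(false));
        System.out.println("NPE with re-creation:    " + failoverThrowsNpe(true));
    }
}
```

In this model the first call reports an NPE and the second does not, because the manager is
set only in the context that existed at init time unless it is explicitly set again.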

> NPE in RM while starting timeline collector on recovery after explicit failover
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6847
>                 URL: https://issues.apache.org/jira/browse/YARN-6847
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Varun Saxena
>
> {noformat}
> 2017-07-20 03:20:50,742 ERROR [Thread-449] resourcemanager.ResourceManager (ResourceManager.java:serviceStart(763)) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.startTimelineCollector(RMAppImpl.java:535)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:467)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:336)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:576)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1419)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:758)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1178)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1218)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1214)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1214)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:319)
>         at org.apache.hadoop.yarn.client.ProtocolHATestBase.explicitFailover(ProtocolHATestBase.java:205)
>         at org.apache.hadoop.yarn.client.ProtocolHATestBase$1.run(ProtocolHATestBase.java:250)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


