hadoop-yarn-issues mailing list archives

From "gu-chi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover
Date Thu, 23 Apr 2015 11:26:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508855#comment-14508855 ]

gu-chi commented on YARN-3536:
------------------------------

2015-04-21 04:22:33,923 | INFO  | main-EventThread | Recovering app: application_1429597538411_0001 with 2 attempts and final state = FINISHED | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:700)
2015-04-21 04:22:33,923 | INFO  | main-EventThread | Recovering attempt: appattempt_1429597538411_0001_000001 with final state: FAILED | org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:734)
2015-04-21 04:22:33,924 | INFO  | main-EventThread | Recovering attempt: appattempt_1429597538411_0001_000002 with final state: null | org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:734)
2015-04-21 04:22:33,924 | INFO  | main-EventThread | Create AMRMToken for ApplicationAttempt: appattempt_1429597538411_0001_000002 | org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.createAndGetAMRMToken(AMRMTokenSecretManager.java:195)
2015-04-21 04:22:33,924 | INFO  | main-EventThread | Creating password for appattempt_1429597538411_0001_000002 | org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.createPassword(AMRMTokenSecretManager.java:307)
2015-04-21 04:22:33,924 | INFO  | main-EventThread | appattempt_1429597538411_0001_000001 State change from NEW to FAILED | org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:704)
2015-04-21 04:22:33,925 | INFO  | main-EventThread | Registering app attempt : appattempt_1429597538411_0001_000002 | org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerAppAttempt(ApplicationMasterService.java:656)
2015-04-21 04:22:33,925 | ERROR | main-EventThread | Failed to load/recover state | org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:533)
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:607)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:941)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:97)

> ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3536
>                 URL: https://issues.apache.org/jira/browse/YARN-3536
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.4.1
>            Reporter: gu-chi
>
> In this scenario the Application status is FAILED/FINISHED but the AppAttempt status is
> null, which causes an NPE during recovery when yarn.resourcemanager.work-preserving-recovery.enabled
> is set to true.
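
The failure mode described here can be illustrated with a small, self-contained Java sketch. It is not the ResourceManager or CapacityScheduler source; the class RecoverySketch and its methods are hypothetical stand-ins, and the sketch assumes (one plausible reading of the log above) that an application recovered with a final state is not re-added to the scheduler, while an attempt whose stored final state is null is treated as still running and handed to the scheduler anyway.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the recovery path; none of these names come from Hadoop.
public class RecoverySketch {

    // Stand-in for the scheduler's per-application record.
    static final class SchedulerApp {
        String currentAttemptId;
    }

    // Stand-in for a scheduler that only knows the applications it was given.
    static final class Scheduler {
        private final Map<String, SchedulerApp> applications = new HashMap<>();

        void addApplication(String appId) {
            applications.put(appId, new SchedulerApp());
        }

        // Unguarded variant: dereferences the lookup result directly,
        // so it throws NullPointerException for an unknown application.
        void addAttemptUnguarded(String appId, String attemptId) {
            SchedulerApp app = applications.get(appId);
            app.currentAttemptId = attemptId;
        }

        // Guarded variant: reports an unknown application instead of throwing.
        boolean addAttemptGuarded(String appId, String attemptId) {
            SchedulerApp app = applications.get(appId);
            if (app == null) {
                return false;
            }
            app.currentAttemptId = attemptId;
            return true;
        }
    }

    // Returns true if the attempt was registered with the scheduler.
    static boolean recoverAttempt(Scheduler scheduler, String appId, String appFinalState,
                                  String attemptId, String attemptFinalState, boolean guarded) {
        // Assumption: only applications without a stored final state are re-added.
        if (appFinalState == null) {
            scheduler.addApplication(appId);
        }
        // An attempt with a stored final state needs no scheduler registration.
        if (attemptFinalState != null) {
            return false;
        }
        // A null final state is treated as "still running".
        if (guarded) {
            return scheduler.addAttemptGuarded(appId, attemptId);
        }
        scheduler.addAttemptUnguarded(appId, attemptId);
        return true;
    }

    public static void main(String[] args) {
        // Same shape as the reported state: app FINISHED, second attempt null.
        boolean registered = recoverAttempt(new Scheduler(), "application_0001", "FINISHED",
                "appattempt_0001_000002", null, true);
        System.out.println("guarded recovery registered attempt: " + registered);
        try {
            recoverAttempt(new Scheduler(), "application_0001", "FINISHED",
                    "appattempt_0001_000002", null, false);
        } catch (NullPointerException e) {
            System.out.println("unguarded recovery threw NullPointerException");
        }
    }
}
```

The guarded variant only shows where a null check would stop the exception; whether skipping the attempt, or repairing the inconsistent stored state, is the right fix for YARN-3536 is not something this sketch decides.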



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
