hadoop-yarn-issues mailing list archives

From "Dapeng Sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9198) Corrupted state from a previous version can still cause RM to fail with NPE on FairScheduler
Date Mon, 14 Jan 2019 13:16:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742049#comment-16742049 ]

Dapeng Sun commented on YARN-9198:
----------------------------------

Hi [~wilfreds], thank you for your comments :)
The exception reported here is also thrown by [FairScheduler.java#L494|https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494]
 as mentioned in YARN-7913. It happened on RM failover: the RM could not transition from
standby to active because of the NPE until I reformatted the state. As a quick fix, I simply
followed how CapacityScheduler
([CapacityScheduler.java#L875|https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L875])
handles this kind of exception.
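
To illustrate the kind of guard described above, here is a minimal, self-contained sketch. It is not the attached patch and not the real FairScheduler or CapacityScheduler code: the class SchedulerSketch, the AppInfo type, and the String/int identifiers are all hypothetical stand-ins. The only idea it shows is that an attempt recovered for an application the scheduler does not know about is logged and skipped instead of being dereferenced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a scheduler. Not the real FairScheduler.
class SchedulerSketch {

    // Minimal record of a registered application (illustrative only).
    static final class AppInfo {
        final String user;
        final String queue;
        int attempts = 0;

        AppInfo(String user, String queue) {
            this.user = user;
            this.queue = queue;
        }
    }

    private final Map<String, AppInfo> applications = new HashMap<>();

    void addApplication(String appId, String user, String queue) {
        applications.put(appId, new AppInfo(user, queue));
    }

    // Returns true if the attempt was registered, false if it was skipped
    // because the application is unknown (e.g. missing from recovered state).
    boolean addApplicationAttempt(String appId, int attemptId) {
        AppInfo app = applications.get(appId);
        if (app == null) {
            // Guard: warn and return instead of dereferencing a null entry.
            System.err.println("Application " + appId
                + " cannot be found in scheduler; skipping attempt " + attemptId);
            return false;
        }
        app.attempts++;
        return true;
    }

    int attemptCount(String appId) {
        AppInfo app = applications.get(appId);
        return app == null ? 0 : app.attempts;
    }

    public static void main(String[] args) {
        SchedulerSketch scheduler = new SchedulerSketch();
        scheduler.addApplication("application_1", "alice", "root.default");
        // Known application: the attempt is registered.
        System.out.println(scheduler.addApplicationAttempt("application_1", 1));
        // Unknown application: the attempt is skipped without an exception.
        System.out.println(scheduler.addApplicationAttempt("application_missing", 1));
    }
}
```

In this sketch, skipping the unknown application lets the rest of recovery continue. Whether skipping is the right policy for a real RM is a separate design question that the sketch does not settle.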

> Corrupted state from a previous version can still cause RM to fail with NPE on FairScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-9198
>                 URL: https://issues.apache.org/jira/browse/YARN-9198
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 3.1.0, 2.8.5
>            Reporter: Dapeng Sun
>            Assignee: Dapeng Sun
>            Priority: Major
>         Attachments: YARN-9198.001.patch
>
>
> Previously, the RM could fail with an NPE because of YARN-4347 and YARN-4000. Even after those fixes, FairScheduler still has the same potential issue.
>  
> 201x-xx-xx xx:xx:xx,xxx ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


