hadoop-yarn-issues mailing list archives

From "Dapeng Sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9198) Corrupted state from a previous version can still cause RM to fail with NPE on FairScheduler
Date Mon, 14 Jan 2019 14:54:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742132#comment-16742132 ]

Dapeng Sun commented on YARN-9198:
----------------------------------

Most of your options seem reasonable to me. It is better to find and fix the underlying issue
around FairScheduler, such as the queue issue, configuration, or other causes that break the
restoring of app state.

But in production, recovering the RM is more important most of the time. If the RM can't work
properly, all work is blocked, which is much worse than one application failing to be restored.
Users who care about why the application was not restored can still check the reason
in the log and dig into it. Do you have any ideas?
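
To make the trade-off concrete, here is a minimal sketch of the "skip and log instead of
failing" behaviour. It is not the YARN-9198 patch and it is not FairScheduler code: the
class RecoverySketch, its methods, and the idea that the failure comes from a missing
application record are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scheduler's recovery path; none of these are Hadoop YARN types.
public class RecoverySketch {
    // Application id -> queue name, as known to the scheduler after loading state (assumed model).
    private final Map<String, String> applications = new HashMap<>();
    // Attempts that could not be restored and were skipped instead of aborting recovery.
    private final List<String> skipped = new ArrayList<>();

    public void addApplication(String appId, String queue) {
        applications.put(appId, queue);
    }

    // Returns true if the attempt was attached to its application, false if it was skipped.
    public boolean recoverAttempt(String appId, String attemptId) {
        String queue = applications.get(appId);
        if (queue == null) {
            // Corrupted or incomplete state: record the reason and keep going,
            // rather than dereferencing null and failing the whole recovery.
            System.err.println("Skipping attempt " + attemptId
                + ": application " + appId + " not found during recovery");
            skipped.add(attemptId);
            return false;
        }
        return true;
    }

    public List<String> skippedAttempts() {
        return skipped;
    }
}
```

With this shape, one unrecoverable application leaves a log line and an entry in the
skipped list, while the remaining applications continue to recover.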

> Corrupted state from a previous version can still cause RM to fail with NPE on FairScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-9198
>                 URL: https://issues.apache.org/jira/browse/YARN-9198
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 3.1.0, 2.8.5
>            Reporter: Dapeng Sun
>            Assignee: Dapeng Sun
>            Priority: Major
>         Attachments: YARN-9198.001.patch
>
>
> Previously, RM may fail with NPE due to YARN-4347, YARN-4000. After these fixes, FairScheduler still has the same potential issue.
>  
> 201x-xx-xx xx:xx:xx,xxx ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


