hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart
Date Tue, 22 Sep 2015 20:37:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903387#comment-14903387 ]

Varun Saxena commented on YARN-4000:


bq. actually, I think this will be a problem in regular case. Application is being killed
by user right on RM restart. This is an existing problem though. Do you think so ?
You mean the user kills the application at the same time as we kill it? The RM first does
the recovery and only then opens its ports, while transitioning to active. So neither
ClientRMService nor ResourceTrackerService will even start until recovery is done.
By the time a kill from the user arrives, all the recovery-related events should most
probably have been processed. Even if they have not, they will be ahead of it in the
dispatcher queue. A KILL event for an app that is already in the KILLING state would be
ignored by RMAppImpl.
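
To illustrate the ordering argument above, here is a minimal, self-contained sketch. It is not the actual RMAppImpl or dispatcher code: the class KillEventSketch, its State and EventType enums, and the "recovery"/"user" labels are all hypothetical names made up for this example. It only assumes what the comment states, namely that events are handled in FIFO order and that a KILL arriving while the app is already KILLING is ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; not Hadoop code.
public class KillEventSketch {
    enum State { RUNNING, KILLING, KILLED }
    enum EventType { KILL, ATTEMPT_KILLED }

    static final class Event {
        final EventType type;
        final String source; // who produced the event, e.g. "recovery" or "user"

        Event(EventType type, String source) {
            this.type = type;
            this.source = source;
        }
    }

    static final class App {
        State state = State.RUNNING;
        // Sources of the events that actually caused a state change.
        final List<String> applied = new ArrayList<>();

        void handle(Event e) {
            if (e.type == EventType.KILL && state == State.RUNNING) {
                state = State.KILLING;
                applied.add(e.source);
            } else if (e.type == EventType.ATTEMPT_KILLED && state == State.KILLING) {
                state = State.KILLED;
                applied.add(e.source);
            }
            // Any other combination (for example KILL while KILLING) is ignored.
        }
    }

    // Drains the queue in FIFO order, as a single-threaded dispatcher would.
    static App drain(Queue<Event> queue) {
        App app = new App();
        Event e;
        while ((e = queue.poll()) != null) {
            app.handle(e);
        }
        return app;
    }

    public static void main(String[] args) {
        Queue<Event> queue = new ArrayDeque<>();
        // The recovery-time kill is enqueued before the client port opens...
        queue.add(new Event(EventType.KILL, "recovery"));
        // ...so a user kill can only land behind it.
        queue.add(new Event(EventType.KILL, "user"));

        App app = drain(queue);
        System.out.println(app.state + " " + app.applied); // prints "KILLING [recovery]"
    }
}
```

In this sketch the second KILL changes nothing, which is the behaviour the comment relies on; whether the real state machine has further transitions that matter here is not covered.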

> RM crashes with NPE if leaf queue becomes parent queue during restart
> ---------------------------------------------------------------------
>                 Key: YARN-4000
>                 URL: https://issues.apache.org/jira/browse/YARN-4000
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-4000.01.patch, YARN-4000.02.patch, YARN-4000.03.patch, YARN-4000.04.patch,
> This is a similar situation to YARN-2308.  If an application is active in queue A and
> the RM then restarts with a changed capacity scheduler configuration in which queue A
> becomes a parent queue to other subqueues, the RM will crash with a NullPointerException.

This message was sent by Atlassian JIRA
