hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
Date Thu, 09 Jul 2015 21:17:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621281#comment-14621281 ]

Eric Payne commented on YARN-3905:

{{org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable}} constructs what
it believes should be the AM container ID when creating a new {{GetContainerReportRequest}}.
        // AM container is always the first container of the attempt
        final GetContainerReportRequest request =
            GetContainerReportRequest.newInstance(ContainerId.newContainerId(
              appAttemptReport.getApplicationAttemptId(), 1));
- After the RM is restarted, container IDs contain an {{e##}} string (the RM epoch), which
the above code doesn't take into consideration.
- The AM container is not always {{_000001}}, because of the way reservations work. We have
seen "non-first" AM containers in practice.

As a result of the above code, the container ID in the {{GetContainerReportRequest}} may not
match the actual AM container ID for jobs run before an RM restart, and will not match it for
jobs run after the RM is restarted.
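
To make the mismatch concrete, here is a rough, self-contained model of how such IDs are
formed. It is only a sketch and does not use the Hadoop classes: {{ContainerIdSketch}}, {{pack}}
and {{format}} are hypothetical names, and the 40-bit sequence / epoch layout and the two-digit
{{e##}} formatting are assumptions about the ID scheme. The timestamp and application number are
taken from the log in this issue.

```java
// Simplified model of YARN-style container IDs; NOT the real Hadoop classes.
final class ContainerIdSketch {
    // Assumption: the low 40 bits hold the per-attempt sequence number and
    // the bits above them hold the RM epoch, which grows on each RM restart.
    static final int SEQUENCE_BITS = 40;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    // Combine an epoch and a sequence number into one long value.
    static long pack(int epoch, long sequence) {
        return ((long) epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
    }

    // Render the ID as text; the "e##_" part appears only when the epoch is non-zero.
    static String format(long clusterTimestamp, int appId, int attemptId, long packed) {
        long epoch = packed >> SEQUENCE_BITS;
        long sequence = packed & SEQUENCE_MASK;
        String epochPart = epoch > 0 ? "e" + String.format("%02d", epoch) + "_" : "";
        return String.format("container_%s%d_%04d_%02d_%06d",
                epochPart, clusterTimestamp, appId, attemptId, sequence);
    }

    public static void main(String[] args) {
        long ts = 1436472584878L;
        // What "always the first container" effectively assumes: epoch 0, sequence 1.
        String guessed = format(ts, 1, 1, pack(0, 1));
        // AM container allocated after one RM restart: epoch 1.
        String afterRestart = format(ts, 1, 1, pack(1, 1));
        // AM container that was not the first one allocated: sequence 2.
        String nonFirst = format(ts, 1, 1, pack(0, 2));
        System.out.println(guessed);
        System.out.println(afterRestart + " equals guess? " + afterRestart.equals(guessed));
        System.out.println(nonFirst + " equals guess? " + nonFirst.equals(guessed));
    }
}
```

In this model neither the post-restart ID nor the "non-first" ID equals the guessed one, which
is the situation described in the two bullets above.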

So, when {{ApplicationHistoryManagerImpl}} compares the ID of the passed container with its
cache from the history store, it can't find a match and throws the NPE.
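
The general failure mode can be sketched as follows. This is not the code in
{{ApplicationHistoryManagerImpl}}, which is not shown in this comment; {{HistoryLookupSketch}}
and its methods are hypothetical names, and the map stands in for whatever the history store
returns. It only illustrates how a lookup miss on a guessed ID turns into an NPE when the
result is dereferenced without a check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a history-store lookup; NOT the Hadoop implementation.
final class HistoryLookupSketch {
    private final Map<String, String> nodeByContainerId = new HashMap<>();

    // Remember which node a container ran on.
    void record(String containerId, String nodeAddress) {
        nodeByContainerId.put(containerId, nodeAddress);
    }

    // Unguarded: a miss yields null, and dereferencing it throws NullPointerException.
    int unguardedAddressLength(String containerId) {
        String address = nodeByContainerId.get(containerId);
        return address.length();
    }

    // Guarded: a miss is reported as an empty Optional instead of a null.
    Optional<String> findAddress(String containerId) {
        return Optional.ofNullable(nodeByContainerId.get(containerId));
    }

    public static void main(String[] args) {
        HistoryLookupSketch store = new HistoryLookupSketch();
        // The store only knows the real ID, which carries an epoch marker.
        store.record("container_e01_1436472584878_0001_01_000001", "host:1234");
        String guessed = "container_1436472584878_0001_01_000001";
        System.out.println("guarded lookup found: " + store.findAddress(guessed).isPresent());
        try {
            store.unguardedAddressLength(guessed);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

A null check of this kind would only turn the 500 into a cleaner "not found"; the request would
still be asking for the wrong container.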

In {{AppBlock#generateApplicationTable}}, rather than constructing the AM's container ID, I
suggest using {{appAttemptReport#getAMContainerId}}:
        final GetContainerReportRequest request =
            GetContainerReportRequest.newInstance(appAttemptReport.getAMContainerId());

> Application History Server UI NPEs when accessing apps run after RM restart
> ---------------------------------------------------------------------------
>                 Key: YARN-3905
>                 URL: https://issues.apache.org/jira/browse/YARN-3905
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1
>            Reporter: Eric Payne
>            Assignee: Eric Payne
> From the Application History URL (http://RmHostName:8188/applicationhistory), clicking
on the application ID of an app that was run after the RM daemon has been restarted results
in a 500 error:
> {noformat}
> Sorry, got error 500
> Please consult RFC 2616 for meanings of the error code.
> {noformat}
> The stack trace is as follows:
> {code}
> 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore:
Completed reading history information of all application attempts of application application_1436472584878_0001
> 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to
read the AM container of the application attempt appattempt_1436472584878_0001_000001.
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
>         at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266)
> ...
> {code}