hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4747) AHS error 500 due to NPE when container start event is missing
Date Mon, 29 Feb 2016 21:48:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172671#comment-15172671 ]

Jason Lowe commented on YARN-4747:

I believe this was triggered by a missing container start event for a given container finish
event.  When an application runs for a long time there will be a corresponding long window
between the container start event and container finish event for the AM container.  The timelineserver
performs retention based on entity timestamp, so there will be a long window where the container
start event has been deleted but the container finish event is still present.  The application
history code is not prepared to handle that, as only the container start event has the node
hostname and port number information.  It blindly assumes that if a container entity is present
in the database then we know both the host and the port.
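
To illustrate the window described above, here is a minimal Java sketch of timestamp-based
retention. It is a simplified model, not the timelineserver code: the class
RetentionWindowSketch, the StoredEvent type, and the event type strings are all hypothetical
names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of timestamp-based retention; not the actual timelineserver code. */
final class RetentionWindowSketch {

    /** Hypothetical stand-in for a stored container event. */
    static final class StoredEvent {
        final String type;
        final long timestampMs;

        StoredEvent(String type, long timestampMs) {
            this.type = type;
            this.timestampMs = timestampMs;
        }
    }

    /** Returns the types of the events whose timestamp is at or after (now - ttl). */
    static List<String> survivingTypes(List<StoredEvent> events, long nowMs, long ttlMs) {
        long cutoffMs = nowMs - ttlMs;
        List<String> kept = new ArrayList<>();
        for (StoredEvent event : events) {
            if (event.timestampMs >= cutoffMs) {
                kept.add(event.type);
            }
        }
        return kept;
    }
}
```

With a start event far older than the finish event, any cutoff that falls between the two
timestamps leaves only the finish event behind, which is the state the history code does
not expect.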

Minimally the application history server needs to be hardened to avoid the NPE, but we may
want to add the host and port information to the finish event as well to allow the history
page to continue to provide logs as long as there is either a container start or container
finish event in the database.
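
A rough Java sketch of both ideas follows: a null-safe lookup that reports a missing address
instead of throwing, and a fallback to the finish event if it were to carry host and port.
This is an assumption-laden illustration, not the eventual patch; NodeAddressSketch,
ContainerEvent, and nodeAddress are hypothetical names, not YARN APIs.

```java
import java.util.Optional;

/** Illustrative only; the names here are hypothetical and not part of YARN. */
final class NodeAddressSketch {

    /** Hypothetical container event; host and port may be absent (null). */
    static final class ContainerEvent {
        final String host;
        final Integer port;

        ContainerEvent(String host, Integer port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Returns "host:port" from the start event if it is usable, otherwise from the
     * finish event, otherwise an empty Optional. Never throws on missing data.
     */
    static Optional<String> nodeAddress(ContainerEvent start, ContainerEvent finish) {
        for (ContainerEvent event : new ContainerEvent[] {start, finish}) {
            if (event != null && event.host != null && event.port != null) {
                return Optional.of(event.host + ":" + event.port);
            }
        }
        return Optional.empty();
    }
}
```

A caller could then render the log link only when the Optional is present and show a
"node unknown" placeholder otherwise, rather than returning an error 500.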

> AHS error 500 due to NPE when container start event is missing
> --------------------------------------------------------------
>                 Key: YARN-4747
>                 URL: https://issues.apache.org/jira/browse/YARN-4747
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
> Saw an error 500 due to a NullPointerException caused by a missing host for an AM container.
> Stacktrace to follow.

This message was sent by Atlassian JIRA