hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
Date Thu, 05 Feb 2015 15:24:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307381#comment-14307381 ]

Jason Lowe commented on YARN-3143:

bq. Could you provide the RM logs, please ? That will help debug.

Here's what the RM log says when the NPE occurs with a finalStatus query:

2015-02-05 15:18:09,744 [1124535424@qtp-165859665-85345] WARN webapp.GenericExceptionHandler:

That's all there is.  No stacktrace or anything else.  Nothing else in the logs looks out
of place around the time of the call.  We also saw nothing of note in the logs when the web
services returned apps with missing fields, which aligns with what I'm pretty confident is
happening.  The RM is removing applications from the RMApps map just as the web services are
trying to walk it.  Given how expensive it is to grab the scheduler lock for each of those
applications on this busy cluster, I'm not surprised that by the time the web services receive
the full list of application reports at least one of the apps has retired from the RMApps
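
To illustrate the suspected race, here is a minimal Java sketch. It is not the RM code and not the attached patch; `AppWalkSketch`, `App`, `AppInfo` and `buildReports` are hypothetical stand-ins. It assumes the web services collect a list of application ids first and look each one up in the live map afterwards, so a lookup can come back null once the app has been removed. The sketch skips such entries rather than dereferencing null or emitting an entry with no id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins; these are not the real ResourceManager classes.
public class AppWalkSketch {

    /** Simplified application record (assumption: only id and name). */
    static final class App {
        final String id;
        final String name;
        App(String id, String name) { this.id = id; this.name = name; }
    }

    /** Simplified report handed back to the REST caller. */
    static final class AppInfo {
        final String id;
        final String name;
        AppInfo(App app) { this.id = app.id; this.name = app.name; }
    }

    /**
     * Builds reports for the given ids, skipping any application that was
     * removed from the live map after the id list was collected.
     */
    static List<AppInfo> buildReports(List<String> ids, Map<String, App> liveApps) {
        List<AppInfo> out = new ArrayList<>();
        for (String id : ids) {
            App app = liveApps.get(id); // null if the app retired in the meantime
            if (app == null) {
                continue; // skip rather than hit an NPE or emit an empty entry
            }
            out.add(new AppInfo(app));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, App> liveApps = new ConcurrentHashMap<>();
        liveApps.put("app_1", new App("app_1", "first"));
        liveApps.put("app_2", new App("app_2", "second"));

        // Ids collected before the walk starts.
        List<String> ids = new ArrayList<>(liveApps.keySet());

        // Simulate an app being retired between the snapshot and the walk.
        liveApps.remove("app_1");

        List<AppInfo> reports = buildReports(ids, liveApps);
        System.out.println(reports.size() + " report(s) built from " + ids.size() + " id(s)");
    }
}
```

Whether silently skipping a retired app is the right behavior for the REST API is a separate question; the sketch only shows where the null can appear.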

> RM Apps REST API can return NPE or entries missing id and other fields
> ----------------------------------------------------------------------
>                 Key: YARN-3143
>                 URL: https://issues.apache.org/jira/browse/YARN-3143
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 2.5.2
>            Reporter: Kendall Thrapp
>            Assignee: Jason Lowe
>         Attachments: YARN-3143.001.patch
> I'm seeing intermittent null pointer exceptions being returned by
> the YARN Apps REST API.
> For example:
> {code}
> http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED
> {code}
> JSON Response was:
> {code}
> {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}
> {code}
> At a glance, this appears to happen only when we query for unfinished apps (i.e. finalStatus=UNDEFINED).
> Possibly related, when I do get back a list of apps, sometimes one or more of the apps
> will be missing most of the fields, like id, name, user, etc., and the fields that are present
> all have zero for the value.
> For example:
> {code}
> {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}
> {code}
> Let me know if there's any other information I can provide to help debug.

This message was sent by Atlassian JIRA