hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client
Date Fri, 26 Feb 2016 06:31:18 GMT

     [ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Naganarasimha G R updated YARN-4711:
------------------------------------
    Description: 
After YARN-3367, while testing the latest 2928 branch came across few NPEs due to which NM
is shutting down.
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher
thread
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
{code}

{code}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
{code}

On analysis found that the there was delay in processing of events, as after YARN-3367 all
the events were getting processed by a single thread inside the timeline client. 

Additionally found one scenario where there is possibility of NPE:
* TimelineEntity.toString() when {{real}} is not null


  was:
While testing the latest 2928 branch came across a NPE which is shutting down the NM
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher
thread
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
{code}
Seems to be a race condition ...

Few other NPE's faced :
{code}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
{code}

* Also there is possibility of NPE in TimelineEntity.toString() when real is not null



> NM is going down with NPE's due to single thread processing of events by Timeline client
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4711
>                 URL: https://issues.apache.org/jira/browse/YARN-4711
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> After YARN-3367, while testing the latest 2928 branch came across few NPEs due to which
NM is shutting down.
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in
dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> On analysis found that the there was delay in processing of events, as after YARN-3367
all the events were getting processed by a single thread inside the timeline client. 
> Additionally found one scenario where there is possibility of NPE:
> * TimelineEntity.toString() when {{real}} is not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message