hadoop-yarn-issues mailing list archives

From "Mit Desai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization
Date Wed, 06 Aug 2014 19:08:12 GMT
Mit Desai created YARN-2387:
-------------------------------

             Summary: Resource Manager crashes with NPE due to lack of synchronization
                 Key: YARN-2387
                 URL: https://issues.apache.org/jira/browse/YARN-2387
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.0.0, 2.5.0
            Reporter: Mit Desai
            Assignee: Mit Desai


We recently came across a 0.23 RM crashing with an NPE. Here is the stacktrace for it.

{noformat}
2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
        at org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
        at java.lang.String.valueOf(String.java:2854)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
        at java.lang.String.valueOf(String.java:2854)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
        at java.lang.Thread.run(Thread.java:722)
2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{noformat}
{noformat}

On investigating the issue, we found that ContainerStatusPBImpl has methods that are
called by different threads and are not synchronized. The 2.x code looks the same.

We need to make these methods synchronized so that we do not hit this problem in the future.
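
As a rough illustration of the proposed change, here is a minimal sketch of a record with a lazily created builder whose public methods are synchronized. StatusRecord, its String-based "proto" and the runConcurrently helper are hypothetical stand-ins written for this example. They are not the actual ContainerStatusPBImpl code, and the sketch does not claim to reproduce the exact cause of the NPE above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedRecordSketch {

    /** Hypothetical stand-in for a PB-backed record; NOT the Hadoop class. */
    static final class StatusRecord {
        private String proto = "";        // last built snapshot
        private StringBuilder builder;    // mutable builder, created lazily
        private boolean viaProto = true;  // true while 'proto' is up to date
        private String containerId;       // locally cached field

        // Every public method takes the same lock, so a reader can never
        // observe the builder/viaProto/containerId fields mid-update.
        public synchronized void setContainerId(String id) {
            maybeInitBuilder();
            this.containerId = id;
        }

        public synchronized String getProto() {
            if (!viaProto) {
                mergeLocalToBuilder();
                proto = builder.toString();
                viaProto = true;
            }
            return proto;
        }

        @Override
        public synchronized String toString() {
            return getProto();
        }

        // Private helpers are only called while the lock is held.
        private void maybeInitBuilder() {
            if (viaProto) {
                builder = new StringBuilder();
                viaProto = false;
            }
        }

        private void mergeLocalToBuilder() {
            if (containerId != null) {
                builder.append("container_id=").append(containerId);
            }
        }
    }

    /** Runs writers and readers against one record; returns the number of runtime failures. */
    static int runConcurrently(int threads, int iterations) {
        StatusRecord record = new StatusRecord();
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (id % 2 == 0) {
                            record.setContainerId("container_" + i); // writer
                        } else {
                            record.toString();                       // reader
                        }
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + runConcurrently(4, 100_000));
    }
}
```

In this sketch, removing the synchronized keywords would let a reader thread run mergeLocalToBuilder() while a writer is between updating viaProto and assigning the builder, which is the general kind of interleaving that synchronization is meant to rule out.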



--
This message was sent by Atlassian JIRA
(v6.2#6252)
