hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2823) NullPointerException in RM HA enabled 3-node cluster
Date Sat, 08 Nov 2014 14:01:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203434#comment-14203434 ]

Hudson commented on YARN-2823:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1927 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1927/])
YARN-2823. Fixed ResourceManager app-attempt state machine to inform schedulers about previous
finished attempts of a running application to avoid expectation mismatch w.r.t. transferred
containers. Contributed by Jian He. (vinodkv: rev a5657182a7accebe08cd86e46b4cdeb163d4d1f2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
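
The commit summary above can be illustrated with a small, self-contained sketch. This is not the Hadoop source: the class AttemptTransferSketch and the names ToyScheduler, AttemptState, and transferThrowsNpe are hypothetical, and the sketch only assumes that a scheduler asked to transfer state from a previous attempt it was never told about ends up dereferencing a null reference, which matches the NPE reported in transferStateFromPreviousAttempt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Hadoop.
public class AttemptTransferSketch {

    /** Minimal stand-in for the state a scheduler keeps per app attempt. */
    static final class AttemptState {
        final int liveContainers;

        AttemptState(int liveContainers) {
            this.liveContainers = liveContainers;
        }
    }

    /** Toy scheduler that only knows the attempts it has been informed about. */
    static final class ToyScheduler {
        private final Map<Integer, AttemptState> attempts = new HashMap<>();

        void addAttempt(int attemptId, AttemptState state) {
            attempts.put(attemptId, state);
        }

        /** Reads containers from the previous attempt without a null check. */
        int transferFromPrevious(int previousAttemptId) {
            AttemptState previous = attempts.get(previousAttemptId);
            // Throws NullPointerException if the previous attempt was never added.
            return previous.liveContainers;
        }
    }

    /**
     * Returns true if the transfer fails with an NPE.
     * When informAboutPrevious is true, the scheduler learns about the
     * finished attempt 1 before attempt 2 asks for its containers.
     */
    static boolean transferThrowsNpe(boolean informAboutPrevious) {
        ToyScheduler scheduler = new ToyScheduler();
        if (informAboutPrevious) {
            scheduler.addAttempt(1, new AttemptState(0));
        }
        try {
            scheduler.transferFromPrevious(1);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE without informing: " + transferThrowsNpe(false));
        System.out.println("NPE after informing:   " + transferThrowsNpe(true));
    }
}
```

In this sketch the remedy mirrors the wording of the commit message: the scheduler is informed about the previous finished attempt before the transfer, rather than the transfer path being made tolerant of a missing attempt. Whether the actual patch also adds null checks is not stated here.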


> NullPointerException in RM HA enabled 3-node cluster
> ----------------------------------------------------
>
>                 Key: YARN-2823
>                 URL: https://issues.apache.org/jira/browse/YARN-2823
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Gour Saha
>            Assignee: Jian He
>            Priority: Critical
>             Fix For: 2.6.0
>
>         Attachments: YARN-2823.1.patch, logs_with_NPE_in_RM.zip
>
>
> Branch:
> 2.6.0
> Environment: 
> A 3-node cluster with RM HA enabled. The HA setup went smoothly (using Ambari), and we then installed HBase using Slider. After some time the RMs went down and would not come back up. The following NPE appears in both RM logs.
> {noformat}
> 2014-09-16 01:36:28,037 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(612)) - Error in handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:530)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:678)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1015)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:603)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-09-16 01:36:28,042 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(616)) - Exiting, bbye..
> {noformat}
> All the logs for this 3-node cluster have been uploaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
