hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node
Date Tue, 06 Oct 2015 07:04:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944602#comment-14944602
] 

Hadoop QA commented on YARN-4227:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 20s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags.
|
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any
new or modified tests.  Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc warning messages.
|
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 1 release
audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 new checkstyle
issues (total was 71, now 72). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace.
|
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse.
|
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce any new Findbugs
(version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  55m 56s | Tests passed in hadoop-yarn-server-resourcemanager.
|
| | |  96m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12765128/YARN-4227.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5f6edb3 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/patchReleaseAuditProblems.txt
|
| checkstyle |  https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
|
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9357/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
|
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9357/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep
3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9357/console |


This message was automatically generated.

> FairScheduler: RM quits processing expired container from a removed node
> ------------------------------------------------------------------------
>
>                 Key: YARN-4227
>                 URL: https://issues.apache.org/jira/browse/YARN-4227
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.3.0, 2.5.0, 2.7.1
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Critical
>         Attachments: YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container event is processed
causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1436927988321_1307950_01_000012
Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1436927988321_1307950_01_000012 Container Transitioned from ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp:
Completed container: container_1436927988321_1307950_01_000012 in state: EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=system_op	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1436927988321_1307950
CONTAINERID=container_1436927988321_1307950_01_000012
> 2015-10-04 21:14:01,063 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Error in handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 and 2.6.0
by different customers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message