hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3809) Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
Date Tue, 16 Jun 2015 16:34:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588334#comment-14588334
] 

Hadoop QA commented on YARN-3809:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 31s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags.
|
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any
new or modified tests.  Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc warning messages.
|
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase
the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 21s | The applied patch generated  1 new checkstyle
issues (total was 213, now 213). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace.
|
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse.
|
| {color:green}+1{color} | findbugs |   2m 54s | The patch does not introduce any new Findbugs
(version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  50m 49s | Tests failed in hadoop-yarn-server-resourcemanager.
|
| | |  93m  4s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12739875/YARN-3809.01.patch
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b039e69 |
| checkstyle |  https://builds.apache.org/job/PreCommit-YARN-Build/8261/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
|
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8261/artifact/patchprocess/testrun_hadoop-yarn-api.txt
|
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8261/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
|
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8261/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep
3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8261/console |


This message was automatically generated.

> Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-3809
>                 URL: https://issues.apache.org/jira/browse/YARN-3809
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3809.01.patch
>
>
> ApplicationMasterLauncher create a thread pool whose size is 10 to deal with AMLauncherEventType(LAUNCH
and CLEANUP).
> In our cluster, there was many NM with 10+ AM running on it, and one shut down for some
reason. After RM found the NM LOST, it cleaned up AMs running on it. Then ApplicationMasterLauncher
need handle these 10+ CLEANUP event. ApplicationMasterLauncher's thread pool would be filled
up, and they all hang in the code containerMgrProxy.stopContainers(stopRequest) because NM
was down, the default RPC time out is 15 mins. It means that in 15 mins ApplicationMasterLauncher
could not handle new event such as LAUNCH, then new attempts will fails to launch because
of time out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message