hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures
Date Wed, 10 May 2017 21:47:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005507#comment-16005507
] 

Hadoop QA commented on YARN-6409:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 25s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 28s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 37s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 19s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  4s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 34s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 33s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 33s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 26s{color}
| {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
The patch generated 3 new + 156 unchanged - 0 fixed = 159 total (was 156) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 35s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 15s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  7s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} |
{color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 20s{color} | {color:red}
hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m  1s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6409 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12867418/YARN-6409.02.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux b154299c946a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ad1e3e4 |
| Default Java | 1.8.0_121 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15898/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
|
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/15898/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15898/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15898/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RM does not blacklist node for AM launch failures
> -------------------------------------------------
>
>                 Key: YARN-6409
>                 URL: https://issues.apache.org/jira/browse/YARN-6409
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: YARN-6409.00.patch, YARN-6409.01.patch, YARN-6409.02.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that happen after
AM container is launched (see RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However,
AM launch can also fail if the NM, where the AM container is allocated, goes unresponsive.
 Because it is not handled, scheduler may continue to allocate AM containers on that same
NM for the following app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error launching appattempt_1478721503753_0870_000002.
Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host Details : local host
is: "*.me.com/17.111.179.113"; destination host is: "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)

> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)

> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

> at java.lang.reflect.Method.invoke(Method.java:497) 
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)

> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)

> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)

> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)

> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout
while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)

> at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)

> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702
remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 
> at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367) 
> at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560) 
> at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375) 
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730) 
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)

> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725) 
> ... 18 more 
> . Failing the application.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message