hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
Date Tue, 19 May 2015 09:15:02 GMT

    [ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550092#comment-14550092
] 

Hadoop QA commented on YARN-3646:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags.
|
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 3 new
or modified test files. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc warning messages.
|
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase
the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  1s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace.
|
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse.
|
| {color:green}+1{color} | findbugs |   3m  2s | The patch does not introduce any new Findbugs
(version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 17s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in hadoop-yarn-common. |
| | |  63m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733743/YARN-3646.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 93972a3 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-common.txt
|
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-yarn-common.txt
|
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7994/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep
3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7994/console |


This message was automatically generated.

> Applications are getting stuck some times in case of retry policy forever
> -------------------------------------------------------------------------
>
>                 Key: YARN-3646
>                 URL: https://issues.apache.org/jira/browse/YARN-3646
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Raju Bairishetti
>         Attachments: YARN-3646.patch
>
>
> We have set  *yarn.resourcemanager.connect.wait-ms* to -1 to use  FOREVER retry policy.
> Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying
policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException),
even though it is not a connection failure. Due to this my application is not progressing
further.
> *Yarn client should not retry infinitely in case of non connection failures.*
> We have written a simple yarn-client which is trying to get an application report for
an invalid  or older appId. ResourceManager is throwing an ApplicationNotFoundException as
this is an invalid or older appId.  But because of retry policy FOREVER, client is keep on
retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException
continuously.
> {code}
> private void testYarnClientRetryPolicy() throws  Exception{
>         YarnConfiguration conf = new YarnConfiguration();
>         conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
>         ApplicationReport report = yarnClient.getApplicationReport(appId);
>     }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645'
doesn't exist in RM.
> 	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> ....
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.14.120.231:61621 Call#875163 Retry#0
> ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message