hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12622) RetryPolicies (other than FailoverOnNetworkExceptionRetry) should put on retry failed reason or the log from RMProxy's retry could be very misleading.
Date Fri, 29 Jan 2016 00:23:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122600#comment-15122600
] 

Hadoop QA commented on HADOOP-12622:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 9s {color} | {color:green}
trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s {color} |
{color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s {color} | {color:green}
trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s {color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s {color} |
{color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green}
trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 58s {color} |
{color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 58s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} |
{color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color}
| {color:green} hadoop-common-project/hadoop-common: patch generated 0 new + 44 unchanged
- 2 fixed = 44 total (was 46) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s {color} |
{color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green}
the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 23s {color} | {color:red}
hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 41s {color} | {color:green}
hadoop-common in the patch passed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color}
| {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 44s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12781600/HADOOP-12622-v4.patch
|
| JIRA Issue | HADOOP-12622 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 174702c6e90c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7f46636 |
| Default Java | 1.7.0_91 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
|
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/8491/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HADOOP-Build/8491/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt
|
| JDK v1.7.0_91  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/8491/testReport/
|
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
|
| Max memory used | 76MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/8491/console |


This message was automatically generated.



> RetryPolicies (other than FailoverOnNetworkExceptionRetry) should put on retry failed
reason or the log from RMProxy's retry could be very misleading.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12622
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auto-failover
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-12622-v2.patch, HADOOP-12622-v3.1.patch, HADOOP-12622-v3.patch,
HADOOP-12622-v4.patch, HADOOP-12622.patch
>
>
> In debugging a NM retry connection to RM (non-HA), the NM log during RM down time is
very misleading:
> {noformat}
> 2015-12-07 11:37:14,098 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:15,099 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:16,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:17,103 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:18,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:19,107 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:20,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:21,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:22,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:23,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:54,120 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:55,121 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:56,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:57,125 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:58,126 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:59,128 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:38:00,130 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
> {noformat}
> It actually only log client side retry on NetworkConnection failure but not include any
info on RetryInvocationHandler where the real retry policy works. From the code below in RetryInvocationHandler.java,
even the retry ends, we don't put warn messages to include how much/many time/ counts we spent
on retry logic that make it harder to debug.
> {code}
>         if (failAction != null) {
>           if (failAction.reason != null) {
>             LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
>                 + "." + method.getName() + " over " + currentProxy.proxyInfo
>                 + ". Not retrying because " + failAction.reason, ex);
>           }
>           throw ex;
>         }
> {code}
> We should add failAction.reason as much as we can in multiple retry policies. In addition,
we should keep consistent in log level for message during the retry attempts: now the ipc.client
is INFO, but RetryInvocationHandler is DEBUG (if not fail_over). We should keep them consistent
or it could be very confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message