hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working
Date Wed, 08 Mar 2017 02:42:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900598#comment-15900598
] 

Hadoop QA commented on HBASE-17718:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} branch-1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 43s {color} | {color:red} hbase-server in branch-1 has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} branch-1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 26s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 38s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 113m 27s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:e01ee2f |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12856710/HBASE-17718.branch-1.002.patch |
| JIRA Issue | HBASE-17718 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux afa629beddc0 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1 / 5f63093 |
| Default Java | 1.7.0_80 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-oracle:1.7.0_80 |
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/5993/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/5993/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5993/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Difference between RS's servername and its ephemeral node cause SSH stop working
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-17718
>                 URL: https://issues.apache.org/jira/browse/HBASE-17718
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.2.4, 1.1.8
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>         Attachments: HBASE-17718.branch-1.001.patch, HBASE-17718.branch-1.002.patch,
HBASE-17718.master.001.patch, HBASE-17718.master.002.patch, HBASE-17718.master.003.patch
>
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for duty. But if
the hosts config (/etc/hosts) differs between the master and the RS, the RS's serverName can be different
from the one stored in the ephemeral zk node. The email mentioned in HBASE-13753 (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g@mail.gmail.com%3E)
describes exactly what happened in our production env.
> But what the email didn't point out is that the difference between the serverName in the RS and
the zk node can cause SSH to stop working, as we can see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
>     if (path.startsWith(watcher.rsZNode)) {
>       String serverName = ZKUtil.getNodeName(path);
>       LOG.info("RegionServer ephemeral node deleted, processing expiration [" +
>         serverName + "]");
>       ServerName sn = ServerName.parseServerName(serverName);
>       if (!serverManager.isServerOnline(sn)) {
>         LOG.warn(serverName.toString() + " is not online or isn't known to the master."+
>          "The latter could be caused by a DNS misconfiguration.");
>         return;
>       }
>       remove(sn);
>       this.serverManager.expireServer(sn);
>     }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on this server
will not be assigned again until the master restarts or fails over.
> I know HBASE-9593 was meant to fix the case where an RS reports for duty and crashes before it can
put up a zk node. That is a very rare case (and controllable: just fix the bug that makes the RS
crash). But the issue I mentioned can happen more often (and is uncontrollable: it can't be fixed
in HBase, due to DNS, hosts config, etc.) and has more severe consequences.
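
The skipped expiration can be reproduced with a minimal, self-contained sketch. It is a simplified stand-in and not HBase's actual classes: `ExpirationSketch`, `registerOnline`, `onEphemeralNodeDeleted`, and the host names are all hypothetical, and a plain set of strings replaces the master's view of online servers.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified stand-in for the master-side check; NOT HBase's real classes. */
class ExpirationSketch {
    // Server names the master believes are online (hypothetical bookkeeping).
    private final Set<String> onlineServers = new HashSet<>();

    void registerOnline(String serverName) {
        onlineServers.add(serverName);
    }

    /**
     * Mirrors the shape of nodeDeleted(): returns true if crash handling
     * would run for this name, false if the deletion is ignored.
     */
    boolean onEphemeralNodeDeleted(String znodeServerName) {
        if (!onlineServers.contains(znodeServerName)) {
            return false; // unknown name: crash handling is skipped
        }
        onlineServers.remove(znodeServerName);
        return true;
    }

    public static void main(String[] args) {
        ExpirationSketch master = new ExpirationSketch();
        // The master registers the RS under the host name it resolved itself...
        master.registerOnline("rs1.example.com,16020,1488940957000");
        // ...but the RS created its znode under the name it resolved locally.
        boolean expired =
            master.onEphemeralNodeDeleted("rs1-internal,16020,1488940957000");
        System.out.println("expired = " + expired); // prints "expired = false"
    }
}
```

Because the znode name never matches an online server, the deletion event is dropped and nothing triggers region reassignment.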
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has reverted it in branch-0.98
> 2. Abort the RS if the master returns a different name, otherwise SSH can't work properly
> 3. The master accepts whatever servername is reported by the RS and doesn't change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu)
>  
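
As a rough illustration of option 2 only, the check could look something like the sketch below. It is not taken from any of the attached patches: `ServerNameCheckSketch`, `hostOf`, and `shouldAbort` are hypothetical names, and the sketch assumes server names are plain strings in the `host,port,startcode` form and that comparing host names case-insensitively is acceptable.

```java
/** Hypothetical sketch of option 2; not the code from the attached patches. */
class ServerNameCheckSketch {

    /** Extracts the host part of a "host,port,startcode" server name string. */
    static String hostOf(String serverName) {
        int comma = serverName.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Not a server name: " + serverName);
        }
        return serverName.substring(0, comma);
    }

    /**
     * Returns true when the name used for the ephemeral znode and the name
     * handed back by the master disagree on the host, i.e. when the RS
     * should abort instead of running under a name the master cannot expire.
     */
    static boolean shouldAbort(String znodeServerName, String masterServerName) {
        return !hostOf(znodeServerName).equalsIgnoreCase(hostOf(masterServerName));
    }

    public static void main(String[] args) {
        String znodeName = "rs1-internal,16020,1488940957000";
        String fromMaster = "rs1.example.com,16020,1488940957000";
        if (shouldAbort(znodeName, fromMaster)) {
            System.out.println("Server name mismatch, aborting region server");
        }
    }
}
```

Aborting early makes the misconfiguration visible at startup instead of leaving regions unassigned after a later crash; options 3 and 4 would instead reconcile the two names.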



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
