Date: Mon, 30 Apr 2018 21:50:00 +0000 (UTC)
From: "genericqa (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-15250) Split-DNS MultiHomed Server Network Cluster Network IPC Client Bind Addr Wrong

    [ https://issues.apache.org/jira/browse/HADOOP-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459127#comment-16459127 ]

genericqa commented on HADOOP-15250:
------------------------------------

| +1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 59s | trunk passed |
| +1 | compile | 27m 43s | trunk passed |
| +1 | checkstyle | 0m 44s | trunk passed |
| +1 | mvnsite | 1m 5s | trunk passed |
| +1 | shadedclient | 11m 18s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 34s | trunk passed |
| +1 | javadoc | 1m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 26m 52s | the patch passed |
| +1 | javac | 26m 52s | the patch passed |
| -0 | checkstyle | 0m 53s | hadoop-common-project/hadoop-common: The patch generated 4 new + 229 unchanged - 0 fixed = 233 total (was 229) |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 9m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 39s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 10s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 117m 41s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15250 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921290/HADOOP-15250.00.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux f36bc011d84f 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9b09555 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/testReport/ |
| Max. process+thread count | 1371 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Split-DNS MultiHomed Server Network Cluster Network IPC Client Bind Addr Wrong
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15250
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15250
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc, net
>    Affects Versions: 2.7.3, 2.9.0, 3.0.0
>         Environment: Multihome cluster with split DNS and rDNS lookup of localhost returning non-routable IPAddr
>            Reporter: Greg Senia
>            Priority: Critical
>         Attachments: HADOOP-15250.00.patch, HADOOP-15250.patch
>
>
> We run our Hadoop clusters with two networks attached to each node. The first is a server network that is firewalled with firewalld and allows only specific inbound traffic: SSH and services such as Knox, HiveServer2, the HTTP YARN RM/ATS, and the MR History Server. The second is the cluster network on the second network interface; it uses jumbo frames, has no restrictions, and allows all cluster traffic to flow between nodes.
>
> To resolve DNS within the Hadoop cluster we use DNS views via BIND, so if the traffic originates from nodes on the cluster network we return the internal DNS record for the nodes. This all works fine with all the multi-homing features added to Hadoop 2.x.
>
> Some logic around views:
> a. The internal view is used by cluster machines when performing lookups. So hosts on the cluster network should get answers from the internal view in DNS.
> b. The external view is used by non-local-cluster machines when performing lookups. So hosts not on the cluster network should get answers from the external view in DNS.
>
> So this brings me to our problem. We created some firewall rules to allow inbound traffic from each cluster's server network so that distcp can run. But we noticed a problem almost immediately: when YARN attempted to talk to the remote cluster, it was binding outgoing traffic to the cluster network interface, which IS NOT routable. After researching the code we noticed the following in NetUtils.java and Client.java.
>
> Basically, Client.java appears to take whatever the hostname is and attempt to bind to whatever address that hostname resolves to. This is not valid in a multi-homed network with one routable interface and one non-routable interface. According to the java.net.Socket documentation, it is valid to call socket.bind(null), which allows the OS routing table and DNS to send the traffic out the correct interface. I will also attach the network traces and a test patch for the 2.7.x and 3.x code bases. I have the test fix below running in my Hadoop test cluster.
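The difference between the two bind strategies described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class `IpcBindSketch` and the helpers `chooseBindAddr` and `connect` are hypothetical names that do not appear in the patch, and the `noBindLocalAddr` flag merely mirrors the boolean the patch introduces. As a simplification, the sketch also passes null to bind() when no local address was resolved, whereas the Hadoop client skips the bind call in that case.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class IpcBindSketch {

    /**
     * Hypothetical helper: decide which local address the client socket
     * is bound to. Returning null means "let the OS choose", because
     * Socket.bind(null) picks an ephemeral port and a valid local address.
     */
    static InetSocketAddress chooseBindAddr(InetAddress localAddr,
                                            boolean noBindLocalAddr) {
        if (noBindLocalAddr || localAddr == null) {
            return null;
        }
        // Pin the source address; port 0 asks for an ephemeral port.
        return new InetSocketAddress(localAddr, 0);
    }

    /** Hypothetical helper: bind according to the flag, then connect. */
    static Socket connect(InetSocketAddress remote, InetAddress localAddr,
                          boolean noBindLocalAddr, int timeoutMs)
            throws IOException {
        Socket socket = new Socket();
        socket.setReuseAddress(true);
        socket.bind(chooseBindAddr(localAddr, noBindLocalAddr));
        socket.connect(remote, timeoutMs);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A throwaway listener on loopback stands in for the remote IPC server.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connect(
                 new InetSocketAddress(loopback, server.getLocalPort()),
                 loopback, true, 2000)) {
            System.out.println("source address: "
                + client.getLocalSocketAddress());
        }
    }
}
```

With the flag set, the source address is whatever the routing table selects for the destination, which is the behaviour the report asks for on a host with one routable and one non-routable interface.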
> Client.java:
>
>     /*
>      * Bind the socket to the host specified in the principal name of the
>      * client, to ensure Server matching address of the client connection
>      * to host name in principal passed.
>      */
>     InetSocketAddress bindAddr = null;
>     if (ticket != null && ticket.hasKerberosCredentials()) {
>       KerberosInfo krbInfo =
>           remoteId.getProtocol().getAnnotation(KerberosInfo.class);
>       if (krbInfo != null) {
>         String principal = ticket.getUserName();
>         String host = SecurityUtil.getHostFromPrincipal(principal);
>         // If host name is a valid local address then bind socket to it
>         InetAddress localAddr = NetUtils.getLocalInetAddress(host);  // highlighted by the reporter
>         if (localAddr != null) {
>           this.socket.setReuseAddress(true);
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Binding " + principal + " to " + localAddr);
>           }
>           bindAddr = new InetSocketAddress(localAddr, 0);  // highlighted by the reporter
>         }
>       }
>     }
>
> So in my Hadoop 2.7.x cluster I made the following changes, and traffic now flows out the correct interfaces:
>
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> index e1be271..c5b4a42 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> @@ -305,6 +305,9 @@
>    public static final String  IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_KEY = "ipc.client.fallback-to-simple-auth-allowed";
>    public static final boolean IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_DEFAULT = false;
>  
> +  public static final String  IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = "ipc.client.nobind.local.addr";
> +  public static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;
> +
>    public static final String IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_KEY =
>      "ipc.client.connect.max.retries.on.sasl";
>    public static final int    IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_DEFAULT = 5;
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> index a6f4eb6..7bfddb7 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> @@ -129,7 +129,9 @@ public static void setCallIdAndRetryCount(int cid, int rc) {
>  
>    private final int connectionTimeout;
>  
> +
>    private final boolean fallbackAllowed;
> +  private final boolean noBindLocalAddr;
>    private final byte[] clientId;
>  
>    final static int CONNECTION_CONTEXT_CALL_ID = -3;
> @@ -642,7 +644,11 @@ private synchronized void setupConnection() throws IOException {
>                InetAddress localAddr = NetUtils.getLocalInetAddress(host);
>                if (localAddr != null) {
>                  this.socket.setReuseAddress(true);
> -                this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                if (noBindLocalAddr) {
> +                  this.socket.bind(null);
> +                } else {
> +                  this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                }
>                }
>              }
>            }

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org