hadoop-common-issues mailing list archives

From "genericqa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15250) Split-DNS MultiHomed Server Network Cluster Network IPC Client Bind Addr Wrong
Date Mon, 30 Apr 2018 21:50:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459127#comment-16459127 ]

genericqa commented on HADOOP-15250:
------------------------------------

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 59s | trunk passed |
| +1 | compile | 27m 43s | trunk passed |
| +1 | checkstyle | 0m 44s | trunk passed |
| +1 | mvnsite | 1m 5s | trunk passed |
| +1 | shadedclient | 11m 18s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 34s | trunk passed |
| +1 | javadoc | 1m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 26m 52s | the patch passed |
| +1 | javac | 26m 52s | the patch passed |
| -0 | checkstyle | 0m 53s | hadoop-common-project/hadoop-common: The patch generated 4 new + 229 unchanged - 0 fixed = 233 total (was 229) |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 9m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 39s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 10s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 117m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15250 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921290/HADOOP-15250.00.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux f36bc011d84f 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9b09555 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/testReport/ |
| Max. process+thread count | 1371 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14539/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Split-DNS MultiHomed Server Network Cluster Network IPC Client Bind Addr Wrong
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15250
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15250
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc, net
>    Affects Versions: 2.7.3, 2.9.0, 3.0.0
>         Environment: Multihome cluster with split DNS and rDNS lookup of localhost returning non-routable IPAddr
>            Reporter: Greg Senia
>            Priority: Critical
>         Attachments: HADOOP-15250.00.patch, HADOOP-15250.patch
>
>
> We run our Hadoop clusters with two networks attached to each node. The first is a server network, firewalled with firewalld, that allows inbound traffic only for SSH and services such as Knox, HiveServer2, the YARN RM/ATS HTTP endpoints, and the MR History Server. The second is the cluster network on the second network interface. It uses jumbo frames, is open with no restrictions, and lets all cluster traffic flow between nodes.
>  
> To resolve DNS within the Hadoop cluster we use DNS views in BIND: if a query originates from a node on the cluster network, we return the internal DNS record for that node. This all works fine with the multi-homing features added in Hadoop 2.x.
>  Some logic around the views:
> a. The internal view is used by cluster machines when performing lookups, so hosts on the cluster network get answers from the internal view.
> b. The external view is used by machines outside the local cluster when performing lookups, so hosts not on the cluster network get answers from the external view.
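To make the view logic concrete, here is a small Java model of how a split-horizon server picks its answer based on the source address of the query. It is not BIND configuration. The class `SplitHorizonDemo`, the host `node1`, and all subnets and addresses are hypothetical: 10.10.0.0/16 stands in for the cluster network and the documentation range 198.51.100.0/24 for the routable server network.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

/** Toy model of split-horizon DNS views; every subnet and record here is hypothetical. */
public class SplitHorizonDemo {
    static final int CLUSTER_NET = (10 << 24) | (10 << 16); // 10.10.0.0
    static final int CLUSTER_MASK = 0xFFFF0000;             // /16

    /** Picks the view from the querying client's source address (IPv4 only). */
    static String viewFor(String clientIp) {
        byte[] b;
        try {
            // IP literals are parsed without a DNS lookup
            b = InetAddress.getByName(clientIp).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + clientIp, e);
        }
        if (b.length != 4) {
            return "external"; // this sketch treats anything non-IPv4 as outside the cluster
        }
        int ip = ByteBuffer.wrap(b).getInt();
        return (ip & CLUSTER_MASK) == CLUSTER_NET ? "internal" : "external";
    }

    /** The A record each view would return for the hypothetical host node1. */
    static String resolveNode1(String clientIp) {
        return viewFor(clientIp).equals("internal")
                ? "10.10.0.11"      // cluster-network (jumbo-frame) interface
                : "198.51.100.11";  // routable server-network interface
    }

    public static void main(String[] args) {
        System.out.println(resolveNode1("10.10.3.7"));    // query from a cluster node
        System.out.println(resolveNode1("198.51.100.7")); // query from outside the cluster
    }
}
```

The consequence for this issue is that a node resolving its own name from the cluster network gets the cluster-network address, even when the connection it is about to make must leave over the server network.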
>  
> This brings me to our problem. We created firewall rules to allow inbound traffic from each cluster's server network so that distcp could run between clusters. Almost immediately we noticed that when YARN attempted to talk to the remote cluster, it bound outgoing traffic to the cluster network interface, which IS NOT routable. After researching the code, we noticed the following in NetUtils.java and Client.java.
> Client.java takes the host name from the client's Kerberos principal and, if that name resolves to a local address, binds the socket to it. This is not valid in a multi-homed network with one routable and one non-routable interface. According to the java.net.Socket documentation, it is valid to call socket.bind(null), which lets the OS choose the local address so that the routing table (together with DNS resolution of the destination) sends the traffic out the correct interface. I will also attach the network traces and a test patch for the 2.7.x and 3.x code bases. The test fix below is running in my Hadoop test cluster.
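The `java.net.Socket` behaviour referred to above can be checked in isolation. The sketch below binds one socket with `bind(null)` and another to a specific local address, as the current Client.java code does. Only the first leaves the choice of source address to the OS when `connect()` is later called. The class `BindDemo` and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BindDemo {
    /** Binds with a null address: the OS picks an ephemeral port and the wildcard address. */
    static Socket bindWildcard() throws IOException {
        Socket s = new Socket();
        // With a wildcard bind, the kernel selects the source address from the
        // routing table at connect() time, per destination.
        s.bind(null);
        return s;
    }

    /** Binds to one specific local address, pinning all traffic to that interface. */
    static Socket bindTo(InetAddress localAddr) throws IOException {
        Socket s = new Socket();
        s.setReuseAddress(true);
        s.bind(new InetSocketAddress(localAddr, 0)); // port 0 = ephemeral port
        return s;
    }

    public static void main(String[] args) throws IOException {
        try (Socket wildcard = bindWildcard();
             Socket pinned = bindTo(InetAddress.getLoopbackAddress())) {
            System.out.println("wildcard: " + wildcard.getLocalSocketAddress());
            System.out.println("pinned:   " + pinned.getLocalSocketAddress());
        }
    }
}
```

The first socket reports a wildcard local address and the second the loopback address. On a multi-homed host, the pinned form is what sends traffic out the cluster interface regardless of the destination.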
> Client.java:
>
>       /*
>        * Bind the socket to the host specified in the principal name of the
>        * client, to ensure Server matching address of the client connection
>        * to host name in principal passed.
>        */
>       InetSocketAddress bindAddr = null;
>       if (ticket != null && ticket.hasKerberosCredentials()) {
>         KerberosInfo krbInfo =
>           remoteId.getProtocol().getAnnotation(KerberosInfo.class);
>         if (krbInfo != null) {
>           String principal = ticket.getUserName();
>           String host = SecurityUtil.getHostFromPrincipal(principal);
>           // If host name is a valid local address then bind socket to it
>           // PROBLEM: with split DNS, host resolves to the non-routable cluster-network address
>           InetAddress localAddr = NetUtils.getLocalInetAddress(host);
>           if (localAddr != null) {
>             this.socket.setReuseAddress(true);
>             if (LOG.isDebugEnabled()) {
>               LOG.debug("Binding " + principal + " to " + localAddr);
>             }
>             // PROBLEM: all outgoing traffic is now pinned to that interface
>             bindAddr = new InetSocketAddress(localAddr, 0);
>           }
>         }
>       }
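Whether the socket gets pinned depends on `NetUtils.getLocalInetAddress(host)` returning non-null. The following is an approximation of that check. It assumes the method resolves the host and keeps the address only if a local interface carries it. Hadoop's actual implementation may use its own resolver helper rather than `InetAddress.getByName`, and the class `LocalAddrDemo` is hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

public class LocalAddrDemo {
    /**
     * Approximates the local-address check: resolve the host, then return the
     * address only if one of this machine's interfaces is assigned it.
     */
    static InetAddress getLocalInetAddress(String host) throws SocketException {
        if (host == null) {
            return null;
        }
        try {
            InetAddress addr = InetAddress.getByName(host);
            // getByInetAddress returns null when no local interface has this address
            return NetworkInterface.getByInetAddress(addr) == null ? null : addr;
        } catch (UnknownHostException e) {
            return null; // unresolvable host: no bind
        }
    }

    public static void main(String[] args) throws SocketException {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + getLocalInetAddress(host));
    }
}
```

In the split-DNS setup described above, the principal's host resolves through the internal view to the cluster-network address. That address is local, so the check passes and the client socket is bound to the non-routable interface.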
>  
> In my Hadoop 2.7.x cluster I made the following changes, and traffic now flows out of the correct interfaces:
>  
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> index e1be271..c5b4a42 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> @@ -305,6 +305,9 @@
>    public static final String  IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_KEY = "ipc.client.fallback-to-simple-auth-allowed";
>    public static final boolean IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_DEFAULT = false;
>  
> +  public static final String  IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = "ipc.client.nobind.local.addr";
> +  public static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;
> +
>    public static final String IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_KEY =
>      "ipc.client.connect.max.retries.on.sasl";
>    public static final int    IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_DEFAULT = 5;
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> index a6f4eb6..7bfddb7 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> @@ -129,7 +129,9 @@ public static void setCallIdAndRetryCount(int cid, int rc) {
>  
>    private final int connectionTimeout;
>  
> +
>    private final boolean fallbackAllowed;
> +  private final boolean noBindLocalAddr;
>    private final byte[] clientId;
>    
>    final static int CONNECTION_CONTEXT_CALL_ID = -3;
> @@ -642,7 +644,11 @@ private synchronized void setupConnection() throws IOException {
>                InetAddress localAddr = NetUtils.getLocalInetAddress(host);
>                if (localAddr != null) {
>                  this.socket.setReuseAddress(true);
> -                this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                if (noBindLocalAddr) {
> +                  this.socket.bind(null);
> +                } else {
> +                  this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                }
>                }
>              }
>            }
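The diff hunks above declare `noBindLocalAddr` but do not show where it is initialised. The sketch below assumes it is read from `ipc.client.nobind.local.addr`, with the default taken from `CommonConfigurationKeys`. Under that assumption, the patched branch behaves as shown. `java.util.Properties` stands in for Hadoop's `Configuration`, and `NoBindSketch` and its methods are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Properties;

public class NoBindSketch {
    // Key and default as declared in the patch's CommonConfigurationKeys hunk.
    static final String IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = "ipc.client.nobind.local.addr";
    static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;

    /** Reads the flag; Properties is a stand-in for Hadoop's Configuration (assumption). */
    static boolean readNoBindLocalAddr(Properties conf) {
        String v = conf.getProperty(IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY);
        return v == null ? IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT : Boolean.parseBoolean(v.trim());
    }

    /** Mirrors the patched branch in setupConnection(). */
    static void bindClientSocket(Socket socket, InetAddress localAddr, boolean noBindLocalAddr)
            throws IOException {
        if (localAddr != null) {
            socket.setReuseAddress(true);
            if (noBindLocalAddr) {
                socket.bind(null); // OS picks the source address when connecting
            } else {
                socket.bind(new InetSocketAddress(localAddr, 0)); // pin to the principal's host address
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty(IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY, "true");
        try (Socket s = new Socket()) {
            bindClientSocket(s, InetAddress.getLoopbackAddress(), readNoBindLocalAddr(conf));
            System.out.println("bound to " + s.getLocalSocketAddress());
        }
    }
}
```

With the key unset, the default `false` keeps the existing behaviour of binding to the principal's host address, so the change is opt-in per client configuration.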



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


