From: Jeff Stuckman
To: "user@hadoop.apache.org"
Subject: RE: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container
Date: Sun, 15 Dec 2013 21:01:42 +0000

Thanks for the response. I have the preferIPv4Stack option in hadoop-env.sh; however, this was not preventing the mapreduce container from enumerating the IPv6 address of the interface.

Jeff

From: Chris Mawata [mailto:chris.mawata@gmail.com]
Sent: Sunday, December 15, 2013 3:58 PM
To: user@hadoop.apache.org
Subject: Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

You might have better luck with an alternative approach to avoid having IPv6, which is to add to your hadoop-env.sh:

HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

Chris



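A side note on the HADOOP_OPTS suggestion: options in hadoop-env.sh are typically picked up by the daemons and by command-line clients, but the YARN task JVMs take their flags from the job configuration. A hedged sketch of passing the same flag to the map and reduce JVMs through mapred-site.xml, assuming the stock Hadoop 2.x property names (these names are not quoted anywhere in this thread):

<!-- mapred-site.xml: JVM flags for map and reduce task containers;
     merge with any existing opts such as -Xmx settings -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true</value>
</property>

The MapReduce application master has a similar setting, yarn.app.mapreduce.am.command-opts.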
On 12/14/2013 11:38 PM, Jeff Stuckman wrote:

Hello,

I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming jobs with Hadoop 2.2.0. I am having problems with running tasks on an NM which is on a different host than the RM, and I believe that this is happening because the NM host's dfs.client.local.interfaces property is not having any effect.

I have two hosts set up as follows:

Host A (1.2.3.4):
NameNode
DataNode
ResourceManager
Job History Server

Host B (5.6.7.8):
DataNode
NodeManager

On each host, hdfs-site.xml was edited to change dfs.client.local.interfaces from an interface name ("eth0") to the IPv4 address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to prevent the HDFS client from randomly binding to the IPv6 side of the interface (it randomly swaps between the IPv4 and IPv6 addresses, due to the random bind IP selection in the DFS client), which was causing other problems.

However, I am observing that the Yarn container on the NM appears to inherit the property from the copy of hdfs-site.xml on the RM, rather than reading it from the local configuration file. In other words, setting the dfs.client.local.interfaces property in Host A's configuration file causes the Yarn containers on Host B to use the same value of the property. This causes the map task to fail, as the container cannot establish a TCP connection to HDFS. However, on Host B, other commands that access HDFS (such as "hadoop fs") do work, as they respect the local value of the property.
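For reference, the per-host entry described above would look roughly like this; a sketch using the example addresses from this message, not the poster's actual files:

<!-- hdfs-site.xml on Host A; on Host B the value would be 5.6.7.8 -->
<property>
  <name>dfs.client.local.interfaces</name>
  <value>1.2.3.4</value>
</property>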

To illustrate with an example, I start a streaming job from the command line on Host A:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper /home/hadoop/toRecords.pl -reducer /bin/cat

The NodeManager on Host B notes that there was an error starting the container:

13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1387067177654_0002_01_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

On Host B, I open userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog and find the following messages (note the DEBUG-level messages which I manually enabled for the DFS client):

2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
<cut>
2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=537
  underConstruction=false
  blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
  lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}
  isLastBlockComplete=true}
2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Connecting to datanode 5.6.7.8:50010
2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interface /1.2.3.4:0
2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and continue. java.net.BindException: Cannot assign requested address

Note the failure to bind to 1.2.3.4, as the IP for Host B's local interface is actually 5.6.7.8.

Note that when running other HDFS commands on Host B, Host B's setting for dfs.client.local.interfaces is respected. On Host B:

hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8] with addresses [/5.6.7.8:0]
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin
drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system
drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp

If I change dfs.client.local.interfaces on Host A to eth0 (without touching the setting on Host B), the syslog mentioned above instead shows the following:

2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [eth0] with addresses [/<some IPv6 address>:0,/5.6.7.8:0]

The job then sometimes completes successfully, but both Host A and Host B will then randomly alternate between the IPv4 and IPv6 sides of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.

Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.
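As an aside, a job-scoped value can also be supplied at submission time through the generic -D option; a sketch, not taken from the original thread. Because generic options are folded into the job configuration, this still ships one value to every container, so an interface name that each host resolves locally travels better than a literal address:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D dfs.client.local.interfaces=eth0 \
    -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout \
    -mapper /home/hadoop/toRecords.pl -reducer /bin/cat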

Thanks,
Jeff
