From: Azuryy Yu
To: user@hadoop.apache.org
Date: Mon, 16 Dec 2013 12:56:39 +0800
Subject: Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Jeff,

DFSClient doesn't use a Configuration copied from the RM.

Did you add hostnames or IP addresses in conf/slaves? If hostnames, can you check /etc/hosts for conflicts?

On Mon, Dec 16, 2013 at 5:01 AM, Jeff Stuckman <stuckman@umd.edu> wrote:

> Thanks for the response. I have the preferIPv4Stack option in
> hadoop-env.sh; however, this was not preventing the mapreduce container
> from enumerating the IPv6 address of the interface.
>
> Jeff
>
> *From:* Chris Mawata [mailto:chris.mawata@gmail.com]
> *Sent:* Sunday, December 15, 2013 3:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Site-specific dfs.client.local.interfaces setting not
> respected for Yarn MR container
>
> You might have better luck with an alternative approach to avoid IPv6,
> which is to add the following to your hadoop-env.sh:
>
> HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
>
> Chris
>
> On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming
> jobs with Hadoop 2.2.0.
> I am having problems with running tasks on a NM which is on a different
> host than the RM, and I believe this is happening because the NM host's
> dfs.client.local.interfaces property is not taking effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
> NameNode
> DataNode
> ResourceManager
> Job History Server
>
> Host B (5.6.7.8):
> DataNode
> NodeManager
>
> On each host, hdfs-site.xml was edited to change
> dfs.client.local.interfaces from an interface name ("eth0") to the IPv4
> address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to
> prevent the HDFS client from randomly binding to the IPv6 side of the
> interface (it randomly alternates between the IPv4 and IPv6 addresses,
> due to the random bind-address selection in the DFS client), which was
> causing other problems.
>
> However, I am observing that the Yarn container on the NM appears to
> inherit the property from the copy of hdfs-site.xml on the RM, rather than
> reading it from the local configuration file. In other words, setting the
> dfs.client.local.interfaces property in Host A's configuration file causes
> the Yarn containers on Host B to use the same value of the property. This
> causes the map task to fail, as the container cannot establish a TCP
> connection to HDFS. However, on Host B, other commands that access HDFS
> (such as "hadoop fs") do work, as they respect the local value of the
> property.
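For reference, the per-host setting described above would take roughly this shape in each host's hdfs-site.xml (a sketch only; the value shown is Host A's, and Host B would carry 5.6.7.8 instead):

```xml
<!-- hdfs-site.xml on Host A: pin the DFS client's bind (source) address.
     The value may be an interface name ("eth0"), a subinterface, or an
     IP address; an address avoids the IPv4/IPv6 ambiguity described above. -->
<property>
  <name>dfs.client.local.interfaces</name>
  <value>1.2.3.4</value>
</property>
```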
> To illustrate with an example, I start a streaming job from the command
> line on Host A:
>
> hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
> -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1387067177654_0002_01_000001 and exit code: 1
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog
> and find the following messages (note the DEBUG-level messages which I
> manually enabled for the DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
> <cut>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> newInfo = LocatedBlocks{
>   fileLength=537
>   underConstruction=false
>   blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
>   lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}
>   isLastBlockComplete=true}
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Connecting to datanode 5.6.7.8:50010
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interface /1.2.3.4:0
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient:
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP for Host B's local
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting
> for dfs.client.local.interfaces is respected. On Host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8]
> with addresses [/5.6.7.8:0]
> Found 3 items
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without
> touching the setting on Host B), the syslog mentioned above instead shows
> the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/5.6.7.8:0]
>
> The job then successfully completes sometimes, but both Host A and Host B
> will then randomly alternate between the IPv4 and IPv6 sides of their eth0
> interfaces, which causes other issues.
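The BindException in the syslog above is the generic OS-level failure for binding a socket to a source address that no local interface carries. A minimal, standalone reproduction (not Hadoop code; it assumes 192.0.2.1, an RFC 5737 TEST-NET documentation address, is not configured on your machine, just as 1.2.3.4 is not configured on Host B):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BindDemo {
    /** Try to bind a client socket to the given source address; true if the OS refuses. */
    static boolean bindRefused(String sourceAddr) throws IOException {
        try (Socket s = new Socket()) {
            // Port 0 asks the OS for any free port; only the address matters here.
            s.bind(new InetSocketAddress(InetAddress.getByName(sourceAddr), 0));
            return false; // address belongs to a local interface, bind succeeded
        } catch (BindException e) {
            return true;  // "Cannot assign requested address", as in the container log
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback is always local, so binding to it works...
        System.out.println("127.0.0.1 refused: " + bindRefused("127.0.0.1"));
        // ...while a TEST-NET address (assumed not configured on this host) is
        // refused, just as Host B cannot bind Host A's 1.2.3.4.
        System.out.println("192.0.2.1 refused: " + bindRefused("192.0.2.1"));
    }
}
```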
> In other words, changing the dfs.client.local.interfaces setting on
> Host A to a named adapter caused the Yarn container on Host B to bind to
> an identically named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will
> try to bind to its own interface? I successfully worked around this issue
> by doing a custom build of HDFS which hardcodes my IP address in the
> DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
> Jeff
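For context on the IPv4/IPv6 flip-flopping described above: when dfs.client.local.interfaces names an adapter, the client expands it to every address bound to that adapter (IPv4 and IPv6 alike) and picks one at random per connection, which is why successive connections alternate. A simplified sketch of that selection logic (hypothetical class and method names; this is not the actual DFSClient code):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class LocalInterfacePick {
    /** Expand an interface name into every address bound to it (v4 and v6). */
    static List<InetAddress> addressesOf(String ifaceName) throws Exception {
        NetworkInterface nic = NetworkInterface.getByName(ifaceName);
        return nic == null ? new ArrayList<>() : Collections.list(nic.getInetAddresses());
    }

    /** Pick one candidate at random, as the DFS client does per connection. */
    static InetAddress pick(List<InetAddress> candidates, Random rng) {
        return candidates.get(rng.nextInt(candidates.size()));
    }

    public static void main(String[] args) throws Exception {
        // With "eth0" configured, candidates typically hold both an IPv4 and an
        // IPv6 address, so repeated picks alternate between them. We demo with
        // loopback here since its name varies less across machines than eth0.
        List<InetAddress> candidates = addressesOf("lo");
        if (!candidates.isEmpty()) {
            System.out.println("bound source: " + pick(candidates, new Random()));
        }
    }
}
```

Pinning the value to a literal per-host IP address, as Jeff did, collapses the candidate list to one entry and removes the randomness, which is why it only works when each host reads its own configuration file.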