From: Azuryy Yu
To: user@hadoop.apache.org
Date: Mon, 16 Dec 2013 12:56:39 +0800
Subject: Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Jeff,

DFSClient doesn't use a Configuration copied from the RM.

Did you add hostnames or IP addresses in conf/slaves? If hostnames, can you check /etc/hosts for conflicts?

On Mon, Dec 16, 2013 at 5:01 AM, Jeff Stuckman <stuckman@umd.edu> wrote:

> Thanks for the response. I have the preferIPv4Stack option in
> hadoop-env.sh; however, this was not preventing the mapreduce container
> from enumerating the IPv6 address of the interface.
>
> Jeff
>
> *From:* Chris Mawata [mailto:chris.mawata@gmail.com]
> *Sent:* Sunday, December 15, 2013 3:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Site-specific dfs.client.local.interfaces setting not
> respected for Yarn MR container
>
> You might have better luck with an alternative approach to avoid IPv6,
> which is to add the following to your hadoop-env.sh:
>
> HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
>
> Chris
>
> On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming
> jobs with Hadoop 2.2.0.
> I am having problems with running tasks on a NM which is on a different
> host than the RM, and I believe this is happening because the NM host's
> dfs.client.local.interfaces property is not taking effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
> NameNode
> DataNode
> ResourceManager
> Job History Server
>
> Host B (5.6.7.8):
> DataNode
> NodeManager
>
> On each host, hdfs-site.xml was edited to change
> dfs.client.local.interfaces from an interface name ("eth0") to the IPv4
> address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to
> prevent the HDFS client from randomly binding to the IPv6 side of the
> interface (it randomly alternates between the IPv4 and IPv6 addresses,
> due to the random bind-address selection in the DFS client), which was
> causing other problems.
>
> However, I am observing that the Yarn container on the NM appears to
> inherit the property from the copy of hdfs-site.xml on the RM, rather than
> reading it from the local configuration file. In other words, setting the
> dfs.client.local.interfaces property in Host A's configuration file causes
> the Yarn containers on Host B to use the same value of the property. This
> causes the map task to fail, as the container cannot establish a TCP
> connection to HDFS. However, on Host B, other commands that access HDFS
> (such as "hadoop fs") do work, as they respect the local value of the
> property.
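For reference, the per-host setting described above would take roughly this shape in each host's hdfs-site.xml (a sketch only; the value shown is Host A's, and Host B would carry 5.6.7.8 instead):

```xml
<!-- hdfs-site.xml on Host A: pin the DFS client's bind (source) address.
     The value may be an interface name ("eth0"), a subinterface, or an
     IP address; an address avoids the IPv4/IPv6 ambiguity described above. -->
<property>
  <name>dfs.client.local.interfaces</name>
  <value>1.2.3.4</value>
</property>
```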
> To illustrate with an example, I start a streaming job from the command
> line on Host A:
>
> hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
> -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1387067177654_0002_01_000001 and exit code: 1
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog
> and find the following messages (note the DEBUG-level messages which I
> manually enabled for the DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
> <cut>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> newInfo = LocatedBlocks{
>   fileLength=537
>   underConstruction=false
>   blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
>   lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}
>   isLastBlockComplete=true}
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Connecting to datanode 5.6.7.8:50010
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interface /1.2.3.4:0
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient:
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP for Host B's local
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting
> for dfs.client.local.interfaces is respected. On Host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8]
> with addresses [/5.6.7.8:0]
> Found 3 items
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without
> touching the setting on Host B), the syslog mentioned above instead shows
> the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/5.6.7.8:0]
>
> The job then successfully completes sometimes, but both Host A and Host B
> will then randomly alternate between the IPv4 and IPv6 sides of their eth0
> interfaces, which causes other issues.
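The BindException in the syslog above is the generic OS-level failure for binding a socket to a source address that no local interface carries. A minimal, standalone reproduction (not Hadoop code; it assumes 192.0.2.1, an RFC 5737 TEST-NET documentation address, is not configured on your machine, just as 1.2.3.4 is not configured on Host B):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BindDemo {
    /** Try to bind a client socket to the given source address; true if the OS refuses. */
    static boolean bindRefused(String sourceAddr) throws IOException {
        try (Socket s = new Socket()) {
            // Port 0 asks the OS for any free port; only the address matters here.
            s.bind(new InetSocketAddress(InetAddress.getByName(sourceAddr), 0));
            return false; // address belongs to a local interface, bind succeeded
        } catch (BindException e) {
            return true;  // "Cannot assign requested address", as in the container log
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback is always local, so binding to it works...
        System.out.println("127.0.0.1 refused: " + bindRefused("127.0.0.1"));
        // ...while a TEST-NET address (assumed not configured on this host) is
        // refused, just as Host B cannot bind Host A's 1.2.3.4.
        System.out.println("192.0.2.1 refused: " + bindRefused("192.0.2.1"));
    }
}
```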
> In other words, changing the dfs.client.local.interfaces setting on
> Host A to a named adapter caused the Yarn container on Host B to bind to
> an identically named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will
> try to bind to its own interface? I successfully worked around this issue
> by doing a custom build of HDFS which hardcodes my IP address in the
> DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
> Jeff
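For context on the IPv4/IPv6 flip-flopping described above: when dfs.client.local.interfaces names an adapter, the client expands it to every address bound to that adapter (IPv4 and IPv6 alike) and picks one at random per connection, which is why successive connections alternate. A simplified sketch of that selection logic (hypothetical class and method names; this is not the actual DFSClient code):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class LocalInterfacePick {
    /** Expand an interface name into every address bound to it (v4 and v6). */
    static List<InetAddress> addressesOf(String ifaceName) throws Exception {
        NetworkInterface nic = NetworkInterface.getByName(ifaceName);
        return nic == null ? new ArrayList<>() : Collections.list(nic.getInetAddresses());
    }

    /** Pick one candidate at random, as the DFS client does per connection. */
    static InetAddress pick(List<InetAddress> candidates, Random rng) {
        return candidates.get(rng.nextInt(candidates.size()));
    }

    public static void main(String[] args) throws Exception {
        // With "eth0" configured, candidates typically hold both an IPv4 and an
        // IPv6 address, so repeated picks alternate between them. We demo with
        // loopback here since its name varies less across machines than eth0.
        List<InetAddress> candidates = addressesOf("lo");
        if (!candidates.isEmpty()) {
            System.out.println("bound source: " + pick(candidates, new Random()));
        }
    }
}
```

Pinning the value to a literal per-host IP address, as Jeff did, collapses the candidate list to one entry and removes the randomness, which is why it only works when each host reads its own configuration file.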