Subject: Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container
From: Chris Mawata <chris.mawata@gmail.com>
Date: Sun, 15 Dec 2013 15:57:43 -0500
To: user@hadoop.apache.org

You might have better luck with an alternative approach to avoiding IPv6, which is to add the following to your hadoop-env.sh:

HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
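(A sketch of how that might look in hadoop-env.sh on every node; the export keyword is an assumption about how your copy of the file sets HADOOP_OPTS, and the daemons need to be restarted for the new JVM option to take effect:)

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"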

Chris



On 12/14/2013 11:38 PM, Jeff Stuckman wrote:

Hello,

 

I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming jobs with Hadoop 2.2.0. I am having problems with running tasks on a NM which is on a different host than the RM, and I believe that this is happening because the NM host's dfs.client.local.interfaces property is not having any effect.

 

I have two hosts set up as follows:

Host A (1.2.3.4):

NameNode

DataNode

ResourceManager

Job History Server

 

Host B (5.6.7.8):

DataNode

NodeManager

 

On each host, hdfs-site.xml was edited to change dfs.client.local.interfaces from an interface name ("eth0") to the IPv4 address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to prevent the HDFS client from randomly binding to the IPv6 side of the interface (it randomly swaps between the IPv4 and IPv6 addresses, due to the random bind IP selection in the DFS client), which was causing other problems.
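For reference, the edited entry in each host's hdfs-site.xml looks like the following sketch (shown here with Host B's address; each host's file carries its own IP):

<property>
  <name>dfs.client.local.interfaces</name>
  <value>5.6.7.8</value>
</property>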

 

However, I am observing that the Yarn container on the NM appears to inherit the property from the copy of hdfs-site.xml on the RM, rather than reading it from the local configuration file. In other words, setting the dfs.client.local.interfaces property in Host A's configuration file causes the Yarn containers on Host B to use the same value of the property. This causes the map task to fail, as the container cannot establish a TCP connection to HDFS. However, on Host B, other commands that access HDFS (such as "hadoop fs") do work, as they respect the local value of the property.
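One way to confirm which value a container actually sees (a sketch; the local-dirs path below is a placeholder for whatever yarn.nodemanager.local-dirs points to on Host B) is to grep the job.xml that MapReduce localizes for the container:

# Placeholder path -- substitute your yarn.nodemanager.local-dirs value.
find /path/to/nm-local-dir -name job.xml \
  -exec grep -A 1 dfs.client.local.interfaces {} +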

 

To illustrate with an example, I start a streaming job from the command line on Host A:

 

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper /home/hadoop/toRecords.pl -reducer /bin/cat

 

The NodeManager on Host B notes that there was an error starting the container:

 

13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1387067177654_0002_01_000001 and exit code: 1

org.apache.hadoop.util.Shell$ExitCodeException:

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)

        at org.apache.hadoop.util.Shell.run(Shell.java:379)

        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

        at java.util.concurrent.FutureTask.run(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

        at java.lang.Thread.run(Unknown Source)

 

On Host B, I open userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog and find the following messages (note the DEBUG-level messages which I manually enabled for the DFS client):

 

2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]

<cut>

2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: newInfo = LocatedBlocks{

  fileLength=537

  underConstruction=false

  blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]

  lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}

  isLastBlockComplete=true}

2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Connecting to datanode 5.6.7.8:50010

2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interface /1.2.3.4:0

2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and continue. java.net.BindException: Cannot assign requested address

 

Note the failure to bind to 1.2.3.4, as the IP for Host B's local interface is actually 5.6.7.8.

 

Note that when running other HDFS commands on Host B, Host B's setting for dfs.client.local.interfaces is respected. On host B:

 

hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/

13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8] with addresses [/5.6.7.8:0]

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system

drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp

 

If I change dfs.client.local.interfaces on Host A to eth0 (without touching the setting on Host B), the syslog mentioned above instead shows the following:

 

2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/5.6.7.8:0]

 

The job then sometimes completes successfully, but both Host A and Host B will then randomly alternate between the IPv4 and IPv6 sides of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.

Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.

 

Thanks,

Jeff

 

