hadoop-mapreduce-user mailing list archives

From Chris Mawata <chris.maw...@gmail.com>
Subject Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container
Date Sun, 15 Dec 2013 20:57:43 GMT
You might have better luck with an alternative approach that avoids IPv6 
altogether, which is to add the following to your hadoop-env.sh:

HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

Chris



On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running 
> streaming jobs with Hadoop 2.2.0. I am having problems running tasks 
> on an NM that is on a different host than the RM, and I believe 
> this is happening because the NM host's 
> dfs.client.local.interfaces property is not having any effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
>
> NameNode
>
> DataNode
>
> ResourceManager
>
> Job History Server
>
> Host B (5.6.7.8):
>
> DataNode
>
> NodeManager
>
> On each host, hdfs-site.xml was edited to change 
> dfs.client.local.interfaces from an interface name ("eth0") to the 
> IPv4 address of that host's interface ("1.2.3.4" or "5.6.7.8"). This 
> is to prevent the HDFS client from randomly binding to the IPv6 side 
> of the interface (the DFS client picks its bind address at random, so 
> it swaps between the IPv4 and IPv6 addresses), which was causing 
> other problems.
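>
> For reference, the stanza in question looks roughly like this in each 
> host's hdfs-site.xml (shown here with Host A's address; Host B uses 
> 5.6.7.8 instead):
>
> <property>
>   <name>dfs.client.local.interfaces</name>
>   <value>1.2.3.4</value>
> </property>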
>
> However, I am observing that the Yarn container on the NM appears to 
> inherit the property from the copy of hdfs-site.xml on the RM, rather 
> than reading it from the local configuration file. In other words, 
> setting the dfs.client.local.interfaces property in Host A's 
> configuration file causes the Yarn containers on Host B to use the 
> same value of the property. This causes the map task to fail, as the 
> container cannot establish a TCP connection to HDFS. However, on 
> Host B, other commands that access HDFS (such as "hadoop fs") do 
> work, as they respect the local value of the property.
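>
> My understanding of why this happens: the job client on Host A folds 
> its local hdfs-site.xml into the job configuration, which is shipped 
> to every container as job.xml, so the submitting host's client-side 
> settings win over each NM's local files. If that is right, perhaps 
> marking the property final in Host B's hdfs-site.xml would keep 
> job.xml from overriding it, along these lines (I have not tried 
> this):
>
> <property>
>   <name>dfs.client.local.interfaces</name>
>   <value>5.6.7.8</value>
>   <final>true</final>
> </property>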
>
> To illustrate with an example, I start a streaming job from the 
> command line on Host A:
>
> hadoop jar 
> $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input 
> hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper 
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the 
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception 
> from container-launch with container ID: 
> container_1387067177654_0002_01_000001 and exit code: 1
>
> org.apache.hadoop.util.Shell$ExitCodeException:
>
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>
>         at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open 
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog 
> and find the following messages (note the DEBUG-level messages, which I 
> manually enabled for the DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
>
> <cut>
>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> newInfo = LocatedBlocks{
>
>   fileLength=537
>
>   underConstruction=false
>
> blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; 
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 
> 1.2.3.4:50010]}]
>
> lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; 
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 
> 1.2.3.4:50010]}
>
>   isLastBlockComplete=true}
>
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Connecting to datanode 5.6.7.8:50010
>
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interface /1.2.3.4:0
>
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: 
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and 
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP of Host B's local 
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting 
> for dfs.client.local.interfaces is respected. On Host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
>
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces 
> [5.6.7.8] with addresses [/5.6.7.8:0]
>
> Found 3 items
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 
> hdfs://hosta/linesin
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 
> hdfs://hosta/system
>
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 
> hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without 
> touching the setting on Host B), the syslog mentioned above instead 
> shows the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interfaces [eth0] with addresses [/<some IP6 
> address>:0,/5.6.7.8:0]
>
> The job then sometimes completes successfully, but both Host A and 
> Host B will randomly alternate between the IPv4 and IPv6 sides of 
> their eth0 interfaces, which causes other issues. In other words, 
> changing the dfs.client.local.interfaces setting on Host A to a named 
> adapter caused the Yarn container on Host B to bind to an identically 
> named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will 
> try to bind to its own interface? I successfully worked around this 
> issue by doing a custom build of HDFS which hardcodes my IP address in 
> the DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
>
> Jeff
>

