hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject Using hadoop over machines with multiple interfaces
Date Thu, 07 Jul 2011 12:39:05 GMT

I am trying to set up a Hadoop cluster (hadoop-0.20.2) on a group of
machines, each of which has two interfaces: a control interface and an
internal interface. I want only the internal interface to be used for running
Hadoop, so that all Hadoop control and data traffic is sent over the internal
interface. I set dfs.datanode.dns.interface in hdfs-site.xml and
mapred.tasktracker.dns.interface in mapred-site.xml to point to the internal
interface on each machine in my cluster. However, even after that,
communication still happens on the control interface: tcpdump shows the
control interface of the nodes being used to transfer data during the
shuffle phase!
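
For concreteness, the two settings described above can be sketched as
follows. This is only an illustration: "eth1" is a hypothetical name for the
internal interface, and the helper hadoop_property is made up for this
sketch. It simply renders the <property> entries that would sit inside
<configuration> in each file.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value):
    """Render one Hadoop-style <property> entry as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Assumption: the internal interface is called "eth1" on every node.
INTERNAL_IFACE = "eth1"

# Entry for hdfs-site.xml
hdfs_entry = hadoop_property("dfs.datanode.dns.interface", INTERNAL_IFACE)
# Entry for mapred-site.xml
mapred_entry = hadoop_property("mapred.tasktracker.dns.interface", INTERNAL_IFACE)

print(hdfs_entry)
print(mapred_entry)
```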

How can I make sure that all data exchanged between the slaves in my cluster
goes through the internal interface rather than the control interface? Any
help would be appreciated.

