hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Hadoop, hostname, DNS, and configuration file
Date Sat, 26 Jan 2008 00:59:45 GMT

Your guess might be correct. Unconventional network setups could cause 
problems. DFS (Namenode/Datanodes) and parts of Map/Reduce already deal 
only in IP addresses.

To solve these issues, please reproduce the problem on as recent a code 
base as possible and clearly describe the network setup and the steps to 
reproduce it in a Jira. We certainly want Hadoop to work in 
not-so-conventional configurations as well.

Thanks,
Raghu.

Yunhong Gu1 wrote:
> 
> 
> I think I found a bug but I am not 100% sure. It seems to me that in 
> some part of the code, probably in Job/TaskTracker, Hadoop always tries 
> to resolve an IP address using the host name. If this is the case, then 
> it is a bug, because the IP addresses are already known at this stage 
> (masters know the IP addresses of all slaves, while every slave knows 
> the master address from the configuration file).
> 
> In fact, I have multiple IP addresses on my servers, and no DNS is set 
> up because it is not necessary for a rack of machines used for internal 
> computing purposes.
> 
> Even though I have explicitly set the IP address that each master/slave 
> node should bind to, at some stage JT/TT still seems to try to resolve 
> an IP address using the host name. This is a possible cause of 
> Hadoop-1374, which I have suffered from for the last two weeks.
> 
> I have this idea because after we disabled all network interfaces except 
> for the one we use for Hadoop, and started a DNS server to resolve all 
> host names, the problem (Hadoop-1374) disappeared.
> 
> Several suggestions:
> 1. Do not use the host name to resolve the IP address.
> 2. There are too many places in the configuration file to set IP 
> addresses, but I am afraid they are not actually used by Hadoop at all. 
> A single IP address setting should be enough for each node.
> 3. Use IP addresses instead of host names in logs and reports, or at 
> least add the IP addresses after the host names. In general, the error 
> reports related to network problems are not accurate.
> 
> I am not very familiar with the Hadoop code yet. Please correct me if I 
> am wrong.
> 
> Thanks
> Yunhong
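
Suggestions 1 and 3 in the quoted message could be sketched roughly as
follows. This is only an illustration, not Hadoop code: the class name
AddressLabel, the helpers fromLiteral and label, the host name "node5",
and the address 10.0.0.5 are all hypothetical. It relies on the
documented behaviour of java.net.InetAddress.getByName, which only checks
the address format when it is given a literal IP address, so no name
service should be needed on that path.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: a sketch of working from a
// configured IP literal and of logging "hostname/ip" labels.
public class AddressLabel {

    // Builds an InetAddress from a literal IP string taken from
    // configuration. For a literal, getByName only validates the format.
    // Caveat: if a host name is passed instead, a lookup is performed.
    static InetAddress fromLiteral(String ip) throws UnknownHostException {
        return InetAddress.getByName(ip);
    }

    // Formats a node for logs as "hostname/ip", or just the IP address
    // when no host name is known.
    static String label(String hostName, InetAddress addr) {
        String ip = addr.getHostAddress();
        if (hostName == null || hostName.isEmpty()) {
            return ip;
        }
        return hostName + "/" + ip;
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = fromLiteral("10.0.0.5"); // assumed example address
        System.out.println(label("node5", addr));   // prints node5/10.0.0.5
        System.out.println(label(null, addr));      // prints 10.0.0.5
    }
}
```

Whether the JobTracker/TaskTracker code paths can actually be changed
this way would have to be checked against the Hadoop source.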

