hadoop-common-dev mailing list archives

From Yunhong Gu1 <y...@bert.cs.uic.edu>
Subject Hadoop, hostname, DNS, and configuration file
Date Fri, 25 Jan 2008 22:31:09 GMT


I think I have found a bug, but I am not 100% sure. It seems to me that
somewhere in the code, probably in the Job/TaskTracker, Hadoop always tries
to resolve an IP address from the host name. If this is the case, then it
is a bug, because the IP addresses are already known at this stage (masters
know the IP addresses of all slaves, and every slave knows the master's
address from the configuration file).

In fact, I have multiple IP addresses on my servers, and no DNS is set up,
because it is not necessary for a rack of machines used for internal
computing purposes.

Even though I have explicitly set the IP address that each master/slave
node should bind to, at some stage the JT/TT still seems to try to resolve
an IP address from the host name. This is a possible cause of Hadoop-1374,
which I have suffered from for the last two weeks.
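
To illustrate what I mean by using a known address directly, here is a
minimal sketch. It is not taken from the Hadoop source; the class and
method names are made up for illustration, and it only handles IPv4. It
builds an InetAddress from a dotted-quad string with
InetAddress.getByAddress(byte[]), which does not consult any name service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop.
class LiteralAddress {
    /** Parses a dotted-quad IPv4 string without touching DNS or /etc/hosts. */
    static InetAddress parseIpv4(String text) {
        String[] parts = text.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + text);
        }
        byte[] raw = new byte[4];
        for (int i = 0; i < 4; i++) {
            // Accept only 1 to 3 decimal digits, so host names are rejected
            // instead of being looked up.
            if (!parts[i].matches("\\d{1,3}")) {
                throw new IllegalArgumentException("not an IPv4 literal: " + text);
            }
            int octet = Integer.parseInt(parts[i]);
            if (octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + text);
            }
            raw[i] = (byte) octet;
        }
        try {
            // getByAddress(byte[]) only checks the array length; no lookup.
            return InetAddress.getByAddress(raw);
        } catch (UnknownHostException e) {
            // Cannot happen for a 4-byte array, but the exception is checked.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = parseIpv4("192.168.1.10");
        // getHostAddress() returns the textual IP and does not resolve anything.
        System.out.println(addr.getHostAddress()); // prints 192.168.1.10
    }
}
```

Note that calling getHostName() on an address built this way may trigger a
reverse lookup, so code that wants to stay independent of DNS should stick
to getHostAddress().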

I suspect this because, after we disabled all network interfaces except
the one we use for Hadoop and started a DNS server to resolve all host
names, the problem (Hadoop-1374) disappeared.

Several suggestions:
1. Do not use the host name to resolve the IP address.
2. There are too many places in the configuration file to set IP
addresses, and I am afraid they are not actually used by Hadoop at all.
One IP address setting per node should be enough.
3. Use IP addresses instead of host names in logs and reports, or at least
add the IP address after the host name. In general, the error reports
for network problems are not accurate.
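
For suggestion 3, something like the following would be enough. This is
only a sketch under my own assumptions: NodeLabel, describe and of are
hypothetical names, not existing Hadoop classes or methods. It pairs a
host name with a known IP address via InetAddress.getByAddress(String,
byte[]), which does not check any name service, and prints both.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop.
class NodeLabel {
    /**
     * Formats an address as "host (ip)". If the InetAddress was created
     * without a host name, getHostName() may perform a reverse lookup.
     */
    static String describe(InetAddress addr) {
        return addr.getHostName() + " (" + addr.getHostAddress() + ")";
    }

    /** Builds the label from a host name and raw IP bytes that are already known. */
    static String of(String host, byte[] ip) {
        try {
            // No name service is consulted; the host name is stored as given.
            return describe(InetAddress.getByAddress(host, ip));
        } catch (UnknownHostException e) {
            // Thrown only when the byte array has an illegal length.
            throw new IllegalArgumentException("bad IP address length", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(of("node1", new byte[] {10, 0, 0, 5})); // prints node1 (10.0.0.5)
    }
}
```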

I am not very familiar with the Hadoop code yet. Please correct me if I am
wrong.

Thanks
Yunhong
