hadoop-common-user mailing list archives

From John Conwell <j...@iamjohn.me>
Subject What should be in the hosts file on a hadoop cluster?
Date Fri, 07 Oct 2011 21:46:47 GMT
In troubleshooting some issues on our Hadoop cluster on EC2, I keep getting
pointed back to properly configuring the /etc/hosts file.  But the problem
is I've found about 5 different conflicting articles about how to configure the
hosts file.  So I'm hoping to get a definitive answer on how the hosts file
should be configured for a Hadoop cluster on EC2.

The first conflicting piece of info is whether the loopback (localhost) entry
should be in the hosts file, and if so how it's configured.  Some people say
comment it out, some people say it has to be there, and some people say it has
to be there, but with localhost.localdomain on the line.

So the four possibilities I've seen are:
#127.0.0.1  localhost
127.0.0.1  localhost
127.0.0.1  localhost localhost.localdomain
127.0.0.1  localhost.localdomain localhost
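
To make the difference between these variants concrete, here is a minimal Python sketch of how a hosts-file line is conventionally read: an address, then a canonical name, then any aliases, with everything after a # ignored.  The helper parse_hosts is hypothetical, and the address 127.0.0.1 is an assumption (the usual IPv4 loopback address), since the archived message does not show the addresses.  It only illustrates the file format; it does not say which variant Hadoop needs.

```python
def parse_hosts(text):
    """Return (address, canonical_name, aliases) tuples from hosts-file text."""
    entries = []
    for raw in text.splitlines():
        # Everything after '#' is a comment, so a commented-out line yields nothing.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        address, canonical, *aliases = fields
        entries.append((address, canonical, aliases))
    return entries


# The four variants, with 127.0.0.1 assumed as the loopback address.
variants = [
    "#127.0.0.1 localhost",
    "127.0.0.1 localhost",
    "127.0.0.1 localhost localhost.localdomain",
    "127.0.0.1 localhost.localdomain localhost",
]

for line in variants:
    print(repr(line), "->", parse_hosts(line))
```

Read this way, the last two variants name the same address and the same two names, but differ in which name comes first and is therefore treated as the canonical one.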

The next thing is the DNS names of the machine(s) in the Hadoop cluster.
It seems like everyone is constantly saying to always use the DNS name, not
the IP address, when configuring Hadoop.  Though some people say to use the
public DNS name and others say to use the private DNS name.  Either one gets
resolved to the private IP address, but does it really matter which is used?
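
One way to check the claim that both names resolve to the same address is to resolve them from a machine inside the cluster and compare the results.  The sketch below is only an illustration: resolve_names and all_agree are hypothetical helpers, and the names passed in at the bottom are stand-ins that you would replace with a node's actual public and private DNS names.

```python
import socket


def resolve_names(names, resolver=socket.gethostbyname):
    """Map each name to the IPv4 address it resolves to, or None on failure."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            results[name] = None
    return results


def all_agree(results):
    """True only if every name resolved and all resolved to one address."""
    addresses = set(results.values())
    return None not in addresses and len(addresses) == 1


if __name__ == "__main__":
    # Stand-in names; substitute the public and private DNS names of a node.
    names = [socket.gethostname(), "localhost"]
    results = resolve_names(names)
    for name, address in results.items():
        print(name, "->", address)
    print("all names agree:", all_agree(results))
```

The resolver argument exists only so the comparison logic can be exercised without a network; by default the system resolver is used, which consults the hosts file and DNS in whatever order the machine is configured for.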

Next, do you put the DNS names in the hosts file?  I've seen recommendations that
say to put the (public/private) DNS name of the local machine in the hosts file.
I've also seen recommendations that say to put the DNS names of all machines
in your Hadoop cluster.

So it seems like there is a big pile of confusion on the interweb.  Could
someone set me straight as to what my hosts file on an EC2-deployed Hadoop
cluster should contain?

John C
