hadoop-common-user mailing list archives

From Sudhir Vallamkondu <Sudhir.Vallamko...@icrossing.com>
Subject Re: Tasktracker appearing from "nowhere"
Date Tue, 01 Jun 2010 17:09:57 GMT
This is exactly why one would need to maintain a list of authorized nodes.
Here's an excerpt from the O'Reilly book "Hadoop: The Definitive Guide". The
passage discusses datanodes, but it applies to tasktrackers as well.

"It is a potential security risk to allow any machine to connect to the
namenode and act as a datanode, since the machine may gain access to data
that it is not authorized to see. Furthermore, since such a machine is not a
real datanode, it is not under your control, and may stop at any time,
causing potential data loss. This scenario is a risk even inside a firewall,
through misconfiguration, so datanodes (and tasktrackers) should be
explicitly managed on all production clusters. Datanodes that are permitted
to connect to the namenode are specified in a file whose name is specified
by the dfs.hosts property. The file resides on the namenode's local
filesystem, and it contains a line for each datanode, specified by network
address (as reported by the datanode---you can see what this is by looking
at the namenode's web UI). If you need to specify multiple network addresses
for a datanode, put them on one line, separated by whitespace. Similarly,
tasktrackers that may connect to the jobtracker are specified in a file
whose name is specified by the mapred.hosts property. In most cases, there
is one shared file, referred to as the include file, that both dfs.hosts and
mapred.hosts refer to, since nodes in the cluster run both datanode and
tasktracker daemons. The file (or files) specified by the dfs.hosts and
mapred.hosts properties is different from the slaves file. The former is
used by the namenode and jobtracker to determine which worker nodes may
connect. The slaves file is used by the Hadoop control scripts to perform
cluster-wide operations, such as cluster restarts. It is never used by the
Hadoop daemons."
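
To make the include file format concrete, here is a rough Python sketch of
how such a file could be read and checked. It is not Hadoop's actual code:
the function names, the sample addresses, and the plain set lookup are all
my own assumptions for illustration, and it does not try to reproduce how
Hadoop behaves when dfs.hosts or mapred.hosts is left unset.

```python
def parse_include_file(text):
    """Return the set of network addresses listed in an include file.

    Hypothetical helper: one node per line, with several addresses for the
    same node allowed on one line, separated by whitespace.
    """
    allowed = set()
    for line in text.splitlines():
        # str.split() with no argument splits on any run of whitespace
        # and yields nothing for a blank line.
        allowed.update(line.split())
    return allowed


def may_connect(reported_address, allowed):
    """Hypothetical check: is the address a worker reports in the list?"""
    return reported_address in allowed


# Made-up include file contents, for illustration only.
include_text = """\
10.0.0.11 worker1.example.com
10.0.0.12
"""

allowed = parse_include_file(include_text)
print(may_connect("10.0.0.12", allowed))  # listed node
print(may_connect("10.0.0.99", allowed))  # unlisted node
```

The point is that a worker is matched by the address it reports, so the
entries in the file need to match what the namenode web UI shows for it.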

