hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Newbie questions on Hadoop topology
Date Sun, 05 Apr 2009 05:36:31 GMT
On Sat, Apr 4, 2009 at 10:25 PM, Foss User <fossist@gmail.com> wrote:

> On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon <todd@cloudera.com> wrote:
> > On Sat, Apr 4, 2009 at 3:47 AM, Foss User <fossist@gmail.com> wrote:
> >>
> >> 1. Should I edit conf/slaves on all nodes or only on name node? Do I
> >> have to edit this in job tracker too?
> >>
> >
> > The conf/slaves file is only used by the start/stop scripts (e.g.
> > start-all.sh). This script is just a handy wrapper that sshs to all of
> the
> > slaves to start the datanode/tasktrackers on those machines. So, you
> should
> > edit conf/slaves on whatever machine you tend to run those administrative
> > scripts from, but those are for convenience only and not necessary. You
> can
> > start the datanode/tasktracker services on the slave nodes manually and
> it
> > will work just the same.
> What are the commands to start data node and task tracker on a slave
> machine?

With the vanilla Hadoop distribution: $HADOOP_HOME/bin/hadoop-daemon.sh
start datanode (or: start tasktracker)

Or, if you're using the Cloudera Distribution for Hadoop, you should start
it using standard Linux services (/etc/init.d/hadoop-datanode start).

> >> 5. When I add a new slave to the cluster later, do I need to run the
> >> namenode -format command again? If I have to, how do I ensure that
> >> existing data is not lost. If I don't have to, how will the folders
> >> necessary for HDFS be created in the new slave machine?
> >>
> >
> >
> > No - after starting the slave, the NN and JT will start assigning
> > blocks/jobs to the new slave immediately. The HDFS directories will be
> > created when you start up the datanode - you just need to ensure that the
> > directory configured in dfs.data.dir exists and is writable by the hadoop
> > user.
> All these days when I was working, dfs.data.dir was something like
> /tmp/hadoop-hadoop/dfs/data. But this directory never existed. Only
> /tmp existed and it was writable by Hadoop. On starting the namenode,
> on the master, this directory was created automatically on the masters
> as well as all slaves.

Starting just the namenode won't create the data dirs on the slaves. If you
used the start-dfs.sh script, it sshed into each of the slaves and started
the datanode there, and it was those datanodes that created the data dirs.

> So, are you correct in saying that directory configured in
> dfs.data.dir should already exist. Isn't it more like directory
> configured in dfs.data.dir would be automatically created if it
> doesn't exist? Only thing is that the hadoop user should have the
> permission to create it. Am I right?

Correct - sorry if I wasn't clear on that. The hadoop user needs to be able
to perform the equivalent of "mkdir -p" on the dfs.data.dir path.
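
In other words, the startup-time behaviour is roughly equivalent to the
following sketch (the path is a stand-in for illustration, not a Hadoop
default):

```shell
#!/bin/sh
# Rough equivalent of what the datanode does with dfs.data.dir at startup:
# create the full path if it is missing ("mkdir -p"), then verify it is
# writable by the current (hadoop) user.
DATA_DIR=${DATA_DIR:-/tmp/example-hadoop/dfs/data}  # stand-in path

mkdir -p "$DATA_DIR" || { echo "cannot create $DATA_DIR" >&2; exit 1; }
[ -w "$DATA_DIR" ] || { echo "$DATA_DIR is not writable" >&2; exit 1; }
echo "OK: $DATA_DIR exists and is writable"
```

If either step fails (say, /tmp's parent is read-only for the hadoop user),
the datanode cannot come up, which is exactly the permission requirement
described above.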

Having dfs.data.dir under /tmp is a default setting that you should
definitely change, though: on most systems /tmp is cleared at boot, and
often by a cron job as well, so your HDFS block data could simply vanish.
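
For example, to move the data directory off /tmp you could add something
like this to conf/hadoop-site.xml (the /srv/hadoop/dfs/data value is just
an illustration; pick any directory the hadoop user can create):

```
<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/dfs/data</value>
</property>
```

The dfs.data.dir value can also be a comma-separated list of directories if
you want the datanode to spread blocks across several disks.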

