hadoop-common-user mailing list archives

From Foss User <foss...@gmail.com>
Subject Re: Newbie questions on Hadoop topology
Date Sun, 05 Apr 2009 05:25:51 GMT
I have a few more questions about your answers. Please see them inline.

On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon <todd@cloudera.com> wrote:
> On Sat, Apr 4, 2009 at 3:47 AM, Foss User <fossist@gmail.com> wrote:
>> 1. Should I edit conf/slaves on all nodes or only on the name node? Do I
>> have to edit this on the job tracker too?
> The conf/slaves file is only used by the start/stop scripts (e.g.
> start-all.sh). This script is just a handy wrapper that sshes to all of the
> slaves to start the datanodes/tasktrackers on those machines. So, you should
> edit conf/slaves on whatever machine you tend to run those administrative
> scripts from; the scripts are for convenience only and are not necessary. You can
> start the datanode/tasktracker services on the slave nodes manually and it
> will work just the same.

What are the commands to start the datanode and tasktracker on a slave machine?
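I am guessing it is something like the following, run on the slave itself
(assuming the hadoop-daemon.sh wrapper that ships in bin/ of the Hadoop
install; please correct me if I am wrong):

    # run on the slave, as the hadoop user, from the Hadoop install directory
    bin/hadoop-daemon.sh start datanode
    bin/hadoop-daemon.sh start tasktracker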

>> 5. When I add a new slave to the cluster later, do I need to run the
>> namenode -format command again? If I have to, how do I ensure that
>> existing data is not lost? If I don't have to, how will the folders
>> necessary for HDFS be created on the new slave machine?
> No - after starting the slave, the NN and JT will start assigning
> blocks/jobs to the new slave immediately. The HDFS directories will be
> created when you start up the datanode - you just need to ensure that the
> directory configured in dfs.data.dir exists and is writable by the hadoop
> user.

In all my work so far, dfs.data.dir was something like
/tmp/hadoop-hadoop/dfs/data, but this directory never existed
beforehand. Only /tmp existed, and it was writable by the hadoop user.
On starting the cluster from the master (with the start scripts), this
directory was created automatically on the master as well as on all the slaves.

So, are you correct in saying that the directory configured in
dfs.data.dir should already exist? Isn't it more that the directory
configured in dfs.data.dir is created automatically if it doesn't
exist, as long as the hadoop user has permission to create it? Am I
right?
