From: Foss User <fossist@gmail.com>
To: core-user@hadoop.apache.org
Date: Sun, 5 Apr 2009 10:55:51 +0530
Subject: Re: Newbie questions on Hadoop topology

I have a few more questions on your answers. Please see them inline.

On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
>>
>> 1. Should I edit conf/slaves on all nodes or only on the name node? Do I
>> have to edit it on the job tracker too?
>>
>
> The conf/slaves file is only used by the start/stop scripts (e.g.
> start-all.sh). This script is just a handy wrapper that SSHes to all of
> the slaves to start the datanode/tasktracker on those machines. So you
> should edit conf/slaves on whatever machine you tend to run those
> administrative scripts from, but it is there for convenience only and is
> not necessary. You can start the datanode/tasktracker services on the
> slave nodes manually and it will work just the same.

What are the commands to start the datanode and the tasktracker on a slave
machine?
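(For reference, the per-daemon wrapper script that ships in bin/ is the
usual way to do this; a minimal sketch, assuming a 0.19-era layout and that
the slave's conf/ directory matches the master's:

  # run on each slave, as the hadoop user, from the Hadoop install directory
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker

The matching "stop datanode" / "stop tasktracker" invocations shut the
daemons down again.)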
>> 5. When I add a new slave to the cluster later, do I need to run the
>> namenode -format command again? If I have to, how do I ensure that
>> existing data is not lost? If I don't have to, how will the folders
>> necessary for HDFS be created on the new slave machine?
>>
>
> No - after starting the slave, the NN and JT will start assigning
> blocks/jobs to the new slave immediately. The HDFS directories will be
> created when you start up the datanode - you just need to ensure that the
> directory configured in dfs.data.dir exists and is writable by the hadoop
> user.

All these days while I was working, dfs.data.dir was something like
/tmp/hadoop-hadoop/dfs/data, but this directory never existed. Only /tmp
existed, and it was writable by the hadoop user. On starting the namenode
on the master, this directory was created automatically on the master as
well as on all the slaves. So, are you correct in saying that the directory
configured in dfs.data.dir should already exist? Isn't it rather that the
directory configured in dfs.data.dir is created automatically if it doesn't
exist, the only requirement being that the hadoop user has permission to
create it? Am I right?
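(To make the two readings concrete, a sketch in shell; the /var/lib path
and the hadoop:hadoop owner are hypothetical, not from this thread:

  # Reading 1: pre-create dfs.data.dir and make it writable by the
  # hadoop user before starting the datanode
  mkdir -p /var/lib/hadoop/dfs/data
  chown -R hadoop:hadoop /var/lib/hadoop/dfs/data

  # Reading 2, matching the /tmp behaviour described above: the datanode
  # itself creates the directory on first start, provided the nearest
  # existing parent (here /var/lib/hadoop) is writable by the hadoop user
  bin/hadoop-daemon.sh start datanode
)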