Subject: Re: cluster set-up / a few quick questions - SOLVED
From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 2 Nov 2012 13:41:19 +0530

you can get the script from the hadoop codebase at http://svn.apache.org/viewcvs.cgi/hadoop/common


On Fri, Nov 2, 2012 at 12:41 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
People,

While I did not find the start-balancer.sh script on my machine, I successfully utilized the following command:

"$hadoop balancer -threshold 10" and achieved =A0the exact same r= esult.

One issue remains: controlling start/stop of the daemons on the slaves through the master. Somehow I don't have the start-dfs.sh/stop-dfs.sh or start-all.sh scripts on my machine either. For now, I am starting the dfs and mapreduce daemons on each slave manually and individually.

Can someone post the content of the start-all.sh script so I could utilize it for my environment?
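
If it helps while you track down the stock script, here is a rough sketch of what a start-all-style wrapper boils down to on a 1.x cluster, assuming passwordless ssh works and conf/slaves lists one slave hostname per line (the install path below is just a placeholder):

    # on the master: bring up the HDFS and MapReduce master daemons
    bin/hadoop-daemon.sh start namenode
    bin/hadoop-daemon.sh start jobtracker
    # then start a DN and TT on every host listed in conf/slaves
    for slave in $(cat conf/slaves); do
      ssh "$slave" "cd /usr/lib/hadoop && bin/hadoop-daemon.sh start datanode && bin/hadoop-daemon.sh start tasktracker"
    done

This is only a sketch, not the actual start-all.sh from the tarball.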

Thanks,
AK47


-----Original Message-----
From: Kartashov, Andy
Sent: Friday, October 26, 2012 3:56 PM
To: user@hadoop.apache.org
Subject: RE: cluster set-up / a few quick questions - SOLVED

Hadoopers,

The problem was in EC2 security. While I could passwordlessly ssh into another node and back, I could not telnet to it due to the EC2 firewall. Needed to open ports for the NN and JT. :)
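
In case it saves someone else the same debugging, a quick reachability check from a slave, assuming the common defaults of 8020 for the NameNode RPC port and 8021 for the JobTracker (substitute whatever ports your fs.default.name and mapred.job.tracker actually use; the hostname below is a placeholder):

    # from a slave: verify the master's NN and JT RPC ports answer through the EC2 security group
    nc -zv master-hostname 8020
    nc -zv master-hostname 8021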

Now I can see 2 DNs running "hadoop fsck" and can also -ls into the NN from the slave. Sweet!!!

Is it possible to balance data over the DNs without copying it with the hadoop -put command? I read about bin/start-balancer.sh somewhere but cannot find it on my current hadoop installation.
Besides, is balancing data over the DNs going to improve the performance of MR jobs?

Cheers,
Happy Hadooping.

-----Original Message-----
From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Friday, October 26, 2012 3:18 PM
To: user@hadoop.apache.org
Subject: Re: cluster set-up / a few quick questions

questions

1) Have you set up passwordless ssh between both hosts for the user who owns the hadoop processes (or root)? (see the sketch just after this list)
2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
3) If you started them one by one, there is no reason a command run on one node would execute on the other.
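
On point 1), a minimal sketch of setting up passwordless ssh for the user that owns the hadoop processes; the user and hostname below are placeholders:

    # on the master, as the hadoop user: create a key pair if one does not already exist
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    # push the public key to each slave (repeat per slave)
    ssh-copy-id hadoopuser@foo2
    # verify: this should run without prompting for a password
    ssh hadoopuser@foo2 hostname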


On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
> Andy, many thanks.
>
> I am stuck here now so please put me in the right direction.
>
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now trying the fully-distributed one.
>
> a. I created another instance foo2 on EC2, installed hadoop on it, and copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the local linux filesystem on foo2.
>
> b. on foo1 I created file conf/slaves and added:
> localhost
> <hostname-of-foo2>
>
> At this point I cannot find an answer on what to do next.
>
> I started the NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed the # of datanodes as 1. I was expecting the DN and TT on foo2 to be started by foo1. But it didn't happen, so I started them myself and tried the command again. Still one DN.
> I realise that foo2 has no data at this point, but I could not find the bin/start-balancer.sh script to help me balance data over to the DN on foo2 from foo1.
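
As an aside, a quick way to see how many DataNodes have actually registered with the NameNode, independent of fsck:

    # prints the list of live/dead datanodes and per-node capacity known to the NN
    hadoop dfsadmin -report

If foo2's DN shows up there, the balancer (or simply writing new data) will start placing blocks on it.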
>
> What do I do next?
>
> Thanks
> AK
>
> -----Original Message-----
> From: Andy Isaacson [mailto:adi@cloudera.com]
> Sent: Friday, October 26, 2012 2:21 PM
> To: user@hadoop.apache.org
> Subject: Re: cluster set-up / a few quick questions
>
> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
>> Gents,
>
> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>
>> 1.
>> - do you put the Master node's <hostname> under fs.default.name in core-site.xml on the slave machines, or the slaves' hostnames?
>
> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
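
Concretely, a minimal core-site.xml along those lines, identical on every node; the hostname, port and conf path here are only illustrative (8020 is a common default for the NameNode RPC port):

    cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://foo1.domain.com:8020</value>
      </property>
    </configuration>
    EOF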
>
>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp and /var folders on the HDFS of the slave machines that will be running only a DN and TT, or not? Do you still need to create the hadoop/dfs/name folder on the slaves?
>
> (The following is the simple answer, for non-HA non-federated HDFS.
> You'll want to get the simple example working before trying the
> complicated ones.)
>
> No. A cluster has one namenode, running on the machine known as the master, and the admin must run "hadoop namenode -format" on that machine only.
>
> In my example, I ran "hadoop namenode -format" on foo1.
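
For completeness, the usual first-time sequence on the master only, sketched (format exactly once; re-running it wipes the metadata in dfs.name.dir):

    # as the user that owns the NN metadata directory
    hadoop namenode -format
    # start HDFS: the NN here, plus a DN on each host in conf/slaves via ssh
    bin/start-dfs.sh
    # start MapReduce: the JT here, plus a TT on each slave
    bin/start-mapred.sh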
>
>> 2.
>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we specify /hadoop/dfs/name and /hadoop/dfs/data as local linux NFS directories, created by running the command "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property is to point to HDFS and not NFS, since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
>> If so, and since it is exactly the same format /far/boo/baz, how does hadoop know which directory is local on NFS and which is on HDFS?
>
> This is very confusing, to be sure! There are a few places where paths are implicitly known to be on HDFS rather than a Linux filesystem path. mapred.system.dir is one of those. This does mean that given a string that starts with "/tmp/" you can't necessarily know whether it's a Linux path or a HDFS path without looking at the larger context.
>
> In the case of mapred.system.dir, the docs are the place to check; according to cluster_setup.html, mapred.system.dir is the "Path on the HDFS where the Map/Reduce framework stores system files".
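
The commands from earlier in this thread make the distinction concrete:

    # local Linux directories for dfs.name.dir / dfs.data.dir (plain mkdir on the local filesystem)
    mkdir -p /hadoop/dfs/name /hadoop/dfs/data

    # mapred.system.dir lives inside HDFS, hence "hadoop fs" rather than a plain mkdir
    sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system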
>
> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>
> Hope this helps,
> -andy



--
Nitin Pawar



--
Nitin Pawar
