Subject: Re: cluster set-up / a few quick questions - SOLVED
From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 2 Nov 2012 13:41:19 +0530

you can get the script from the hadoop codebase at http://svn.apache.org/viewcvs.cgi/hadoop/common


On Fri, Nov 2, 2012 at 12:41 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
People,

While I did not find the start-balancer.sh script on my machine, I successfully utilized the following command:

"$hadoop balancer -threshold 10" and achieved =A0the exact same r= esult.

One issue remains: controlling start/stop of the daemons on the slaves through the master. Somehow I don't have the start-dfs.sh/stop-dfs.sh or start-all.sh scripts on my machine either. For now, I am starting the dfs and mapreduce daemons on each slave manually and individually.

Can someone post the content of the start-all.sh script so I could utilize it for my environment?
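
If it helps while you track down the stock script, here is a rough sketch of what a start-all-style wrapper boils down to on a 1.x cluster, assuming passwordless ssh works and conf/slaves lists one slave hostname per line (the install path below is just a placeholder):

    # on the master: bring up the HDFS and MapReduce master daemons
    bin/hadoop-daemon.sh start namenode
    bin/hadoop-daemon.sh start jobtracker
    # then start a DN and TT on every host listed in conf/slaves
    for slave in $(cat conf/slaves); do
      ssh "$slave" "cd /usr/lib/hadoop && bin/hadoop-daemon.sh start datanode && bin/hadoop-daemon.sh start tasktracker"
    done

This is only a sketch, not the actual start-all.sh from the tarball.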

Thanks,
AK47


-----Original Message-----
From: Kartashov, Andy
Sent: Friday, October 26, 2012 3:56 PM
To: user@hadoop.apache.org
Subject: RE: cluster set-up / a few quick questions - SOLVED

Hadoopers,

The problem was in EC2 security. While I could passwordlessly ssh into another node and back, I could not telnet to it due to the EC2 firewall. Needed to open ports for the NN and JT. :)
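
In case it saves someone else the same debugging, a quick reachability check from a slave, assuming the common defaults of 8020 for the NameNode RPC port and 8021 for the JobTracker (substitute whatever ports your fs.default.name and mapred.job.tracker actually use; the hostname below is a placeholder):

    # from a slave: verify the master's NN and JT RPC ports answer through the EC2 security group
    nc -zv master-hostname 8020
    nc -zv master-hostname 8021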

Now I can see 2 DNs running "hadoop fsck" and can also -ls into the NN from the slave. Sweet!!!

Is it possible to balance data over the DNs without copying it with the hadoop -put command? I read about bin/start-balancer.sh somewhere but cannot find it on my current hadoop installation.
Besides, is balancing data over the DNs going to improve the performance of MR jobs?

Cheers,
Happy Hadooping.

-----Original Message-----
From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Friday, October 26, 2012 3:18 PM
To: user@hadoop.apache.org
Subject: Re: cluster set-up / a few quick questions

questions

1) Have you set up passwordless ssh between both hosts for the user who owns the hadoop processes (or root)? (see the sketch just after this list)
2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
3) If you started them one by one, there is no reason a command run on one node would execute on the other.
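
On point 1), a minimal sketch of setting up passwordless ssh for the user that owns the hadoop processes; the user and hostname below are placeholders:

    # on the master, as the hadoop user: create a key pair if one does not already exist
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    # push the public key to each slave (repeat per slave)
    ssh-copy-id hadoopuser@foo2
    # verify: this should run without prompting for a password
    ssh hadoopuser@foo2 hostname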


On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
> Andy, many thanks.
>
> I am stuck here now so please put me in the right direction.
>
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now trying the fully-distributed one.
>
> a. I created another instance foo2 on EC2, installed hadoop on it, and copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the local linux filesystem on foo2.
>
> b. on foo1 I created file conf/slaves and added:
> localhost
> <hostname-of-foo2>
>
> At this point I cannot find an answer on what to do next.
>
> I started the NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed the # of datanodes as 1. I was expecting the DN and TT on foo2 to be started by foo1. But it didn't happen, so I started them myself and tried the command again. Still one DN.
> I realise that foo2 has no data at this point, but I could not find the bin/start-balancer.sh script to help me balance data over to the DN on foo2 from foo1.
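
As an aside, a quick way to see how many DataNodes have actually registered with the NameNode, independent of fsck:

    # prints the list of live/dead datanodes and per-node capacity known to the NN
    hadoop dfsadmin -report

If foo2's DN shows up there, the balancer (or simply writing new data) will start placing blocks on it.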
>
> What do I do next?
>
> Thanks
> AK
>
> -----Original Message-----
> From: Andy Isaacson [mailto:adi@cloudera.com]
> Sent: Friday, October 26, 2012 2:21 PM
> To: user@hadoop.apache.org
> Subject: Re: cluster set-up / a few quick questions
>
> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
>> Gents,
>
> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>
>> 1.
>> - do you put the Master node's <hostname> under fs.default.name in core-site.xml on the slave machines, or the slaves' hostnames?
>
> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
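
Concretely, a minimal core-site.xml along those lines, identical on every node; the hostname, port and conf path here are only illustrative (8020 is a common default for the NameNode RPC port):

    cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://foo1.domain.com:8020</value>
      </property>
    </configuration>
    EOF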
>
>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp and /var folders on the HDFS of the slave machines that will be running only a DN and TT, or not? Do you still need to create the hadoop/dfs/name folder on the slaves?
>
> (The following is the simple answer, for non-HA non-federated HDFS.
> You'll want to get the simple example working before trying the
> complicated ones.)
>
> No. A cluster has one namenode, running on the machine known as the master, and the admin must run "hadoop namenode -format" on that machine only.
>
> In my example, I ran "hadoop namenode -format" on foo1.
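
For completeness, the usual first-time sequence on the master only, sketched (format exactly once; re-running it wipes the metadata in dfs.name.dir):

    # as the user that owns the NN metadata directory
    hadoop namenode -format
    # start HDFS: the NN here, plus a DN on each host in conf/slaves via ssh
    bin/start-dfs.sh
    # start MapReduce: the JT here, plus a TT on each slave
    bin/start-mapred.sh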
>
>> 2.
>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we specify /hadoop/dfs/name and /hadoop/dfs/data as local linux NFS directories, created by running the command "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property is to point to HDFS and not NFS, since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
>> If so, and since it is exactly the same format /far/boo/baz, how does hadoop know which directory is local on NFS and which is on HDFS?
>
> This is very confusing, to be sure! There are a few places where paths are implicitly known to be on HDFS rather than a Linux filesystem path. mapred.system.dir is one of those. This does mean that given a string that starts with "/tmp/" you can't necessarily know whether it's a Linux path or a HDFS path without looking at the larger context.
>
> In the case of mapred.system.dir, the docs are the place to check; according to cluster_setup.html, mapred.system.dir is the "Path on the HDFS where the Map/Reduce framework stores system files".
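
The commands from earlier in this thread make the distinction concrete:

    # local Linux directories for dfs.name.dir / dfs.data.dir (plain mkdir on the local filesystem)
    mkdir -p /hadoop/dfs/name /hadoop/dfs/data

    # mapred.system.dir lives inside HDFS, hence "hadoop fs" rather than a plain mkdir
    sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system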
>
> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>
> Hope this helps,
> -andy



--
Nitin Pawar



--
Nitin Pawar
