hadoop-user mailing list archives

From "Kartashov, Andy" <Andy.Kartas...@mpac.ca>
Subject RE: cluster set-up / a few quick questions - SOLVED
Date Thu, 01 Nov 2012 19:11:27 GMT
People,

While I did not find the start-balancer.sh script on my machine, I successfully used the following
command and achieved the same result:

$ hadoop balancer -threshold 10
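For anyone following along, a guarded sketch of the same rebalancing run (assumption: Hadoop 1.x command syntax; the threshold is in percentage points of deviation from the cluster's average utilization):

```shell
# Hedged sketch: rebalance HDFS until every DN sits within 10 percentage
# points of the cluster's average utilization, then inspect per-DN usage.
# Guarded so it degrades to a no-op message on a machine without hadoop.
if command -v hadoop >/dev/null 2>&1; then
  hadoop balancer -threshold 10
  hadoop dfsadmin -report | grep "DFS Used%"
  status=ran
else
  status=skipped   # no hadoop binary here; commands shown for reference only
fi
echo "balancer sketch: $status"
```

The balancer is safe to interrupt and re-run; it moves blocks between DNs in the background without touching the namespace.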

One issue remains: controlling start/stop of the slaves' daemons from the master. Somehow
I don't have start-dfs.sh/stop-dfs.sh, nor start-all.sh, on my machine either.  For now,
I am starting the dfs and mapreduce daemons on each slave manually and individually.

Can someone post the contents of the start-all.sh script so I can adapt it for my environment?
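Until someone posts the verbatim script: from memory, the stock Hadoop 1.x bin/start-all.sh is essentially a two-line dispatcher. Treat the following as a sketch, not the shipped file (packaged installs sometimes put the per-framework scripts elsewhere, e.g. under /usr/lib/hadoop/bin):

```shell
#!/usr/bin/env bash
# Sketch of what Hadoop 1.x bin/start-all.sh does: chain the two
# per-framework launchers, which themselves ssh to every host listed in
# conf/slaves. Guarded so it reports instead of failing outside an install.
bin=$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)
launched=0
for script in start-dfs.sh start-mapred.sh; do
  if [ -x "$bin/$script" ]; then
    "$bin/$script"        # start-dfs.sh: NN, DNs, SNN; start-mapred.sh: JT, TTs
    launched=$((launched + 1))
  else
    echo "$script not found in $bin (not a Hadoop bin/ directory)"
  fi
done
echo "launched $launched of 2 framework scripts"
```

The key point is that the master-side scripts do the per-slave ssh fan-out for you, which is why passwordless ssh to every host in conf/slaves is a prerequisite.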

Thanks,
AK47


-----Original Message-----
From: Kartashov, Andy
Sent: Friday, October 26, 2012 3:56 PM
To: user@hadoop.apache.org
Subject: RE: cluster set-up / a few quick questions - SOLVED

Hadoopers,

The problem was in EC2 security.  While I could ssh passwordlessly into another node and back,
I could not telnet to it because of the EC2 firewall.  I needed to open ports for the NN and JT.  :)
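For anyone hitting the same wall: before touching the EC2 security group, it helps to probe the master's ports from a slave. A quick bash probe (assumptions: master hostname "foo1" is a placeholder, and the port list covers the common Hadoop 1.x defaults - NN RPC on 8020 or 9000, JT RPC on 8021 or 9001, web UIs on 50070/50030; check fs.default.name and mapred.job.tracker for your actual values):

```shell
# Probe TCP reachability of the master's Hadoop ports using bash's
# built-in /dev/tcp, so neither telnet nor nc is required.
# "foo1" and the port list are placeholders - substitute your own.
master=foo1
open=0
for port in 8020 8021 50070 50030; do
  if timeout 2 bash -c "exec 3<>/dev/tcp/$master/$port" 2>/dev/null; then
    echo "port $port: open"
    open=$((open + 1))
  else
    echo "port $port: closed, filtered, or host unknown"
  fi
done
echo "$open of 4 ports reachable"
```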

Now "hadoop fsck" shows 2 DNs running, and I can also -ls into the NN from the slave. Sweet!!!

Is it possible to balance data over the DNs without re-copying it with the hadoop -put command?
I read about bin/start-balancer.sh somewhere but cannot find it in my current hadoop installation.
Besides, is balancing data over the DNs going to improve the performance of an MR job?

Cheers,
Happy Hadooping.

-----Original Message-----
From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Friday, October 26, 2012 3:18 PM
To: user@hadoop.apache.org
Subject: Re: cluster set-up / a few quick questions

Questions:

1) Have you set up passwordless ssh between both hosts for the user who owns the hadoop processes
(or root)?
2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
3) If you started them one by one, there is no reason that running a command on one node will
execute it on the other.
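To make (1) concrete, a minimal passwordless-ssh setup looks like the following. This is a sketch only: the key is generated in a scratch directory so nothing in ~/.ssh is touched, and <slave-host> is a placeholder.

```shell
# Generate an RSA keypair with an empty passphrase in a scratch directory,
# then (commented out) push the public key to each slave. Run the real
# thing as the user that owns the Hadoop daemons, against ~/.ssh/id_rsa.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"
# ssh-copy-id -i "$keydir/id_rsa.pub" <slave-host>   # or append id_rsa.pub to
#                                                    # the slave's ~/.ssh/authorized_keys
# ssh <slave-host> hostname   # should print the slave's name, no password prompt
ls "$keydir"
```

Once `ssh <slave-host> hostname` works without a prompt in both directions, the start-*.sh scripts can fan out to the hosts in conf/slaves.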


On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
> Andy, many thanks.
>
> I am stuck here now so please put me in the right direction.
>
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now
> trying a fully-distributed one.
>
> a. I created another instance, foo2, on EC2, installed hadoop on it and copied the conf/
> folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the local Linux filesystem on foo2.
>
> b. on foo1 I created file conf/slaves and added:
> localhost
> <hostname-of-foo2>
>
> At this point I cannot find an answer on what to do next.
>
> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks
> -locations", it showed the # of datanodes as 1.  I was expecting the DN and TT on foo2 to be
> started by foo1. But that didn't happen, so I started them myself and tried the command again.
> Still one DN.
> I realise that foo2 has no data at this point, but I could not find the bin/start-balancer.sh
> script to help me balance data from foo1 over to foo2.
>
> What do I do next?
>
> Thanks
> AK
>
> -----Original Message-----
> From: Andy Isaacson [mailto:adi@cloudera.com]
> Sent: Friday, October 26, 2012 2:21 PM
> To: user@hadoop.apache.org
> Subject: Re: cluster set-up / a few quick questions
>
> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <Andy.Kartashov@mpac.ca> wrote:
>> Gents,
>
> We're not all male here. :)  I prefer "Hadoopers" or "hi all,".
>
>> 1.
>> - do you put the Master node's <hostname> under fs.default.name in core-site.xml
>> on the slave machines, or the slaves' hostnames?
>
> Master.  I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
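> In core-site.xml that looks like the fragment below (same contents on every node, master
> and slaves alike; the hostname is from my example):
>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://foo1.domain.com</value>
>   </property>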
>
>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp and /var
>> folders on the HDFS of the slave machines that will be running only DN and TT, or not? Do you
>> still need to create the hadoop/dfs/name folder on the slaves?
>
> (The following is the simple answer, for non-HA non-federated HDFS.
> You'll want to get the simple example working before trying the
> complicated ones.)
>
> No. A cluster has one namenode, running on the machine known as the master, and the admin
> must run "hadoop namenode -format" on that machine only.
>
> In my example, I ran "hadoop namenode -format" on foo1.
>
>> 2.
>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we specify /hadoop/dfs/name
>> and /hadoop/dfs/data, which are local linux (NFS) directories created by running "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property is to point to HDFS and not NFS, since we run
>> "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
>> If so, and since it is exactly the same format /far/boo/baz, how does hadoop know
>> which directory is local on NFS and which is on HDFS?
>
> This is very confusing, to be sure!  There are a few places where paths are implicitly
> known to be on HDFS rather than a Linux filesystem path. mapred.system.dir is one of those.
> This does mean that, given a string that starts with "/tmp/", you can't necessarily know
> whether it's a Linux path or an HDFS path without looking at the larger context.
>
> In the case of mapred.system.dir, the docs are the place to check; according to cluster_setup.html,
> mapred.system.dir is the "Path on the HDFS where the Map/Reduce framework stores system
> files".
>
> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
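> Concretely, the same-looking strings live in different config files and are interpreted
> against different filesystems (the values below are illustrative, taken from your own
> examples, not prescriptions):
>
>   <!-- hdfs-site.xml: local Linux paths on each node -->
>   <property><name>dfs.name.dir</name><value>/hadoop/dfs/name</value></property>
>   <property><name>dfs.data.dir</name><value>/hadoop/dfs/data</value></property>
>
>   <!-- mapred-site.xml: a path inside HDFS -->
>   <property><name>mapred.system.dir</name><value>/tmp/mapred/system</value></property>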
>
> Hope this helps,
> -andy
> NOTICE: This e-mail message and any attachments are confidential,
> subject to copyright and may be privileged. Any unauthorized use,
> copying or disclosure is prohibited. If you are not the intended
> recipient, please delete and contact the sender immediately. Please
> consider the environment before printing this e-mail.



--
Nitin Pawar
