hadoop-user mailing list archives

From Andy Isaacson <...@cloudera.com>
Subject Re: cluster set-up / a few quick questions
Date Fri, 26 Oct 2012 21:32:36 GMT
On Fri, Oct 26, 2012 at 11:47 AM, Kartashov, Andy
<Andy.Kartashov@mpac.ca> wrote:
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now
> trying a fully distributed one.
> a. I created another instance foo2 on EC2.

It seems like you're trying to use the start-dfs.sh style startup
scripts to manually run a cluster on EC2.  This is doable, but it's
not very easy due to the mismatch in expectations between EC2 style
deployments and start-dfs.sh.  Setting up a manually started cluster
requires a bit of up-front work, and EC2 spin-up/spin-down cycles mean
you end up redoing that work frequently.

You might consider using whirr, http://whirr.apache.org/ as a more
automated way of deploying Hadoop clusters on EC2.

Of course, setting up a manual cluster can be a really good way to
understand how all the parts work together, and doing it on EC2 should
work just fine.

> Installed hadoop on it and copied the conf/ folder from foo1 to foo2. I created the
> /hadoop/dfs/data folder on the local Linux file system on foo2.
> b. on foo1 I created file conf/slaves and added:
> localhost
> <hostname-of-foo2>

I'd strongly recommend being consistent with the naming; don't mix
"localhost" and DNS names. EC2 has "ec2.internal" in /etc/resolv.conf
by default, so you can "ping ip-10-42-120-3" and it should work just
fine. Then make conf/masters list your first host by name, and make
conf/slaves list all your hosts by name. Note that for small clusters,
running a DN and a NN on a single host is an acceptable compromise and
works OK.

% cat conf/masters
% cat conf/slaves
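
A quick way to catch mixed naming before starting anything is a check along these lines. This is only a sketch: the function name check_hostlist is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): flag loopback names in a host list file.
check_hostlist() {
    file=$1
    # Match "localhost" or a 127.x address on a line by itself, ignoring blanks.
    if grep -Eq '^[[:space:]]*(localhost|127\.[0-9.]+)[[:space:]]*$' "$file"; then
        echo "$file: contains a loopback name; list every host by its DNS name" >&2
        return 1
    fi
    return 0
}
```

Running "check_hostlist conf/slaves" returns non-zero if the file still contains "localhost".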

You also should make sure that your user account can ssh to all the nodes:
% for h in $(cat conf/slaves); do ssh -oStrictHostKeyChecking=no $h hostname; done

 - answer "yes" if ssh asks whether to continue connecting to an unknown
host (with StrictHostKeyChecking=no it normally adds new host keys
without asking)
 - if you get "permission denied" messages you'll need to set up the
authorized_keys properly.
 - after this loop succeeds you should be able to run it again and get
a clean list of hostnames.
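
If you'd rather see which hosts fail instead of reading ssh errors one by one, the loop can be wrapped roughly like this. It is a sketch under assumptions: check_ssh and the SSH_CMD variable are names invented here, and BatchMode is added so ssh fails instead of prompting for a password.

```shell
# SSH_CMD is an assumption of this sketch; override it to dry-run without a network.
SSH_CMD=${SSH_CMD:-ssh -oBatchMode=yes -oStrictHostKeyChecking=no}

# Hypothetical helper: try "hostname" on every host in the file and report per host.
check_ssh() {
    hostfile=$1
    failed=0
    for h in $(cat "$hostfile"); do
        # SSH_CMD is left unquoted on purpose so its options split into words.
        if $SSH_CMD "$h" hostname >/dev/null 2>&1; then
            echo "ok $h"
        else
            echo "FAILED $h"
            failed=1
        fi
    done
    return $failed
}
```

"check_ssh conf/slaves" prints one line per host and returns non-zero if any host could not be reached.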

> At this point I cannot find an answer on what to do next.
> I started NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks
> -locations", it showed the number of datanodes as 1.  I was expecting DN and TT on foo2
> to be started by foo1. But it didn’t happen, so I started them myself and tried the
> command again. Still one DN.

You don't need to start the daemons individually, and doing so is very
difficult to get right. I virtually never do so -- I use the
start-dfs.sh and start-mapred.sh scripts (or start-all.sh, which runs
both) to start the daemons (NN, DN, SNN, JT, TT). The "masters" and
"slaves" config files are parsed by the start-*.sh scripts, not by the
daemons themselves.  Also, the daemons don't start themselves -- for a
manual cluster, the start-*.sh scripts are responsible. (In a production
deployment such as CDH, there is an /etc/init.d script, managed by the
distro packaging, that starts and manages the daemons.)
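
To make the division of labor concrete, here is a heavily simplified illustration of what the start scripts do with a host list file. It is not the actual Hadoop code; run_on_hosts and the REMOTE variable are invented for this sketch, and REMOTE exists only so the loop can be dry-run with "echo".

```shell
# Simplified illustration only; not the actual Hadoop scripts.
# REMOTE is an assumption of this sketch so the loop can be dry-run with "echo".
REMOTE=${REMOTE:-ssh}

# Hypothetical helper: run the given command on every host listed in a file.
run_on_hosts() {
    hostfile=$1
    shift
    # Drop comments and blank lines, then run the command on each remaining host.
    for h in $(sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$hostfile"); do
        $REMOTE "$h" "$@"
    done
}
```

With REMOTE set to "echo", "run_on_hosts conf/slaves bin/hadoop-daemon.sh start datanode" just prints the per-host commands. The point is that the host list is read by the script doing the ssh, so a daemon started by hand never looks at it.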
