hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: contrib EC2 with hadoop 0.17
Date Sun, 08 Jun 2008 00:25:40 GMT
The new scripts do not use the start-all.sh/stop-all.sh scripts, and thus do
not maintain the slaves file. This makes cluster startup much
faster and a bit more reliable (keys do not need to be pushed to the
slaves). It also lets us grow the cluster lazily just by starting slave
nodes. In other words, the scripts are mostly optimized for booting a large
cluster fast, doing work, then shutting down (allowing for huge short-lived
clusters, vs. a smaller/cheaper long-lived one).

But it would probably be wise to provide scripts to build/refresh the
slaves file and push keys to the slaves, so the cluster can be
maintained in the traditional way instead of just being re-instantiated
with new parameters.
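
As a rough illustration of what such helper scripts might look like, here is a minimal sketch. It is not part of the hadoop-ec2 contrib scripts: the function names refresh_slaves and push_key are hypothetical, the list of slave hostnames is assumed to arrive on stdin (how it is obtained from EC2 is left open), and push_key assumes scp can already authenticate to each slave and that ~/.ssh exists there.

```shell
#!/bin/sh
# Hypothetical helper: rebuild a slaves file from hostnames read on stdin.
# One hostname per line; blank lines and duplicates are dropped.
refresh_slaves() {
  slaves_file="$1"
  tmp="${slaves_file}.tmp.$$"
  # Write to a temp file first so a failed run does not truncate the old list.
  grep -v '^[[:space:]]*$' | sort -u > "$tmp" && mv "$tmp" "$slaves_file"
}

# Hypothetical helper: copy a key file to every host in the slaves file.
# Assumption: scp can already log in to each slave as the current user.
push_key() {
  slaves_file="$1"
  key_file="$2"
  while read -r host; do
    # Redirect stdin so the copy cannot swallow the rest of the host list.
    scp "$key_file" "$host:.ssh/" < /dev/null
  done < "$slaves_file"
}
```

For example, `printf '%s\n' slave-a slave-b | refresh_slaves conf/slaves` would rewrite conf/slaves with those two (made-up) hostnames, after which start-all.sh and stop-all.sh on the master would address them instead of localhost.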

I wonder if these scripts would make sense in general, instead of
being EC2-specific?


On Jun 7, 2008, at 11:31 AM, Chris Anderson wrote:

> First of all, thanks to whoever maintains the hadoop-ec2 scripts.
> They've saved us untold time and frustration getting started with a
> small testing cluster (5 instances).
> A question: when we log into the newly created cluster, and run jobs
> from the example jar (pi, etc) everything works great. We expect our
> custom jobs will run just as smoothly.
> However, when we restart the namenodes and tasktrackers by running
> bin/stop-all.sh on the master, it tries to stop only activity on
> localhost. Running start-all.sh then boots up a localhost-only cluster
> (on which jobs run just fine).
> The only way we've been able to recover from this situation is to use
> bin/terminate-hadoop-cluster and bin/destroy-hadoop-cluster and then
> start again from scratch with a new cluster.
> There must be a simple way to restart the namenodes and jobtrackers
> across all machines from the master. Also, I think understanding the
> answer to this question might put a lot more into perspective for me,
> so I can go on to do more advanced things on my own.
> Thanks for any assistance / insight!
> Chris
> output from stop-all.sh
> ==
> stopping jobtracker
> localhost: Warning: Permanently added 'localhost' (RSA) to the list of
> known hosts.
> localhost: no tasktracker to stop
> stopping namenode
> localhost: no datanode to stop
> localhost: no secondarynamenode to stop
> conf files in /usr/local/hadoop-0.17.0
> ==
> # cat conf/slaves
> localhost
> # cat conf/masters
> localhost
> -- 
> Chris Anderson
> http://jchris.mfdz.com

Chris K Wensel
