hadoop-common-user mailing list archives

From Karl Anderson <...@monkey.org>
Subject Re: Auto-shutdown for EC2 clusters
Date Fri, 24 Oct 2008 19:07:43 GMT

On 23-Oct-08, at 10:01 AM, Paco NATHAN wrote:
> This workflow could be initiated from a crontab -- totally automated.
> However, we still see occasional failures of the cluster, and must
> restart manually, but not often.  Stability for that has improved much
> since the 0.18 release.  For us, it's getting closer to total
> automation.
> FWIW, that's running on EC2 m1.xl instances.

Same here.  The namenode and web interface always come up for me, but sometimes I don't get the slave nodes - usually zero slaves when this happens, though occasionally only one or two are missing.  My rough estimate is that this happens about 1% of the time.

I currently have to notice this and restart manually.  Do you have a good way to detect it?  I have several Hadoop clusters running at once with the same AWS image and SSH keypair, so I can't just count running instances.  I could use a separate keypair per cluster and count instances with that keypair, but I'd like to be able to start clusters opportunistically, with more than one cluster doing the same kind of job on different data.
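One detection approach that sidesteps instance counting entirely: ask the namenode itself how many datanodes checked in, by parsing the output of `hadoop dfsadmin -report` on the master and comparing against the expected cluster size.  This is just a sketch, not something I've run against 0.18: the `Datanodes available:` line format and the way the report is invoked are assumptions, so the regex may need adjusting for your release.

```python
import re
import subprocess

def live_datanodes(report_text):
    """Count live datanodes from `hadoop dfsadmin -report` output.

    Assumes the report contains a line like 'Datanodes available: N';
    the exact wording varies by Hadoop release, so check yours.
    """
    m = re.search(r"Datanodes available:\s*(\d+)", report_text)
    return int(m.group(1)) if m else 0

def cluster_healthy(expected_slaves):
    """Run the report on the master and compare against the expected size.

    `expected_slaves` is whatever count you launched the cluster with;
    run this from cron shortly after startup and relaunch on failure.
    """
    report = subprocess.run(
        ["hadoop", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    ).stdout
    return live_datanodes(report) >= expected_slaves
```

Since each cluster's master knows its own expected size, this works even when every cluster shares the same image and keypair.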

Karl Anderson
