hadoop-common-user mailing list archives

From "Paco NATHAN" <cet...@gmail.com>
Subject Re: Auto-shutdown for EC2 clusters
Date Fri, 24 Oct 2008 20:58:03 GMT
Hi Karl,

Rather than using separate key pairs, you can use EC2 security groups
to keep track of different clusters.

Effectively, that means a new security group for every cluster -- so just
allocate a bunch of different ones in a config file and have the launch
scripts draw from that pool. We also use EC2 static (Elastic) IP addresses
and give each cluster a DNS entry named after its security group; the entry
gets associated with one of those IPs once the cluster is launched.
It's relatively simple to query the running instances and collect them
according to security groups.
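
For illustration, that query-and-group step might look something like the
following Python sketch using boto3 (not our actual scripts -- the region
name is a placeholder, and credentials are assumed to come from the
environment):

import boto3
from collections import defaultdict

# Collect running instances, keyed by EC2 security group name
# (one security group per cluster). Region is a placeholder.
ec2 = boto3.client("ec2", region_name="us-east-1")

clusters = defaultdict(list)
response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        for group in instance.get("SecurityGroups", []):
            clusters[group["GroupName"]].append(instance["InstanceId"])

# Each security group maps to one cluster, so the counts show how many
# nodes actually came up for each.
for group_name, instance_ids in sorted(clusters.items()):
    print(group_name, len(instance_ids), instance_ids)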

One way to detect launch failures is just to attempt SSH to each node in a
loop. Our rough estimate is that about 2% of attempted EC2 nodes fail at
launch, so we allocate more than enough to cover that rate.
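
The SSH check itself can be as simple as a retry loop like this (again just
a sketch -- the key path, user, and retry counts are made up for the
example):

import os
import subprocess
import time

def node_is_up(host, key_path="~/.ssh/cluster-key.pem", retries=5, delay=30):
    """Return True once ssh succeeds on the host, False after all retries."""
    key = os.path.expanduser(key_path)
    for _ in range(retries):
        result = subprocess.run(
            ["ssh", "-i", key,
             "-o", "StrictHostKeyChecking=no",
             "-o", "ConnectTimeout=10",
             "root@" + host, "true"],
            capture_output=True,
        )
        if result.returncode == 0:
            return True
        time.sleep(delay)
    return False

Nodes that never answer get counted as launch failures and are covered by
the extra instances we allocate.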

In a nutshell, that's one approach for managing a Hadoop cluster
remotely on EC2.

Best,
Paco


On Fri, Oct 24, 2008 at 2:07 PM, Karl Anderson <kra@monkey.org> wrote:
>
> On 23-Oct-08, at 10:01 AM, Paco NATHAN wrote:
>>
>> This workflow could be initiated from a crontab -- totally automated.
>> However, we still see occasional cluster failures that require a manual
>> restart, though not often.  Stability has improved a lot since the 0.18
>> release.  For us, it's getting closer to total automation.
>>
>> FWIW, that's running on EC2 m1.xl instances.
>
> Same here.  I've always had the namenode and web interface come up
> accessible, but sometimes I don't get the slave nodes - usually zero slaves
> when this happens, and sometimes I'm only missing one or two.  My rough
> estimate is that this happens 1% of the time.
>
> I currently have to notice this and restart manually.  Do you have a good
> way to detect it?  I have several Hadoop clusters running at once with the
> same AWS image and SSH keypair, so I can't count running instances.  I could
> have a separate keypair per cluster and count instances with that keypair,
> but I'd like to be able to start clusters opportunistically, with more than
> one cluster doing the same kind of job on different data.
>
>
> Karl Anderson
> kra@monkey.org
> http://monkey.org/~kra
>
