hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: How to manage a large cluster?
Date Tue, 16 Sep 2008 09:03:31 GMT
Paco NATHAN wrote:
> We use an EC2 image onto which we install Java, Ant, Hadoop, etc. To
> keep it simple, we pull those from S3 buckets. That provides a more
> flexible pattern for managing the frameworks involved than rebuilding
> the EC2 image whenever you want to apply a patch to Hadoop.
> 
> Given that approach, you can add your Hadoop application code
> similarly. Just upload the current stable build out of SVN, Git,
> whatever, to an S3 bucket.

Nice. Your CI tool could upload the latest release tagged as good and 
the machines could pull it down.
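
A minimal sketch of the pull side of that pattern, in Python, assuming 
a world-readable S3 bucket and a "latest good" archive that the CI 
tool overwrites on each blessed build (bucket, key, and paths are all 
hypothetical):

#!/usr/bin/env python
# Bootstrap sketch: fetch the latest tagged build from a public S3
# bucket over plain HTTP and unpack it. All names are illustrative.
import os
import tarfile
import urllib.request

BUCKET_URL = "https://s3.amazonaws.com/example-hadoop-artifacts"  # hypothetical bucket
BUILD_KEY = "releases/latest-good.tar.gz"   # archive the CI tool marks as good
INSTALL_DIR = "/opt/hadoop-app"

def bootstrap():
    archive = "/tmp/build.tar.gz"
    urllib.request.urlretrieve("%s/%s" % (BUCKET_URL, BUILD_KEY), archive)
    os.makedirs(INSTALL_DIR, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(INSTALL_DIR)

if __name__ == "__main__":
    bootstrap()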

The goal of cluster management is to make adding or removing a node an 
O(1) problem: you edit one entry in one place to increment or 
decrement the number of machines, and that's it.

If you find you have lots of images to keep alive, your maintenance 
costs go up. Keep the number of images down to one and you will stay 
in control.
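
As a sketch of that O(1) property: keep the whole cluster size in one 
setting and reconcile the running set against it. The launch/terminate 
helpers below are hypothetical stand-ins for whatever EC2 tooling you 
drive:

CLUSTER_SIZE = 100   # the single entry you edit to grow or shrink

def launch_node():
    """Placeholder: start one EC2 instance from the single image."""
    pass

def terminate_node(node):
    """Placeholder: shut one running instance down."""
    pass

def reconcile(current_nodes):
    # Grow or shrink the running set to match CLUSTER_SIZE.
    delta = CLUSTER_SIZE - len(current_nodes)
    if delta > 0:
        for _ in range(delta):
            launch_node()
    elif delta < 0:
        for node in current_nodes[:-delta]:
            terminate_node(node)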

> 
> We use a set of Python scripts to manage a daily, (mostly) automated
> launch of 100+ EC2 nodes for a Hadoop cluster.  We also run a listener
> on a local server, so that the Hadoop job can send a notification when
> it completes, allowing the local server to initiate the download of
> results.  Overall, that minimizes the need for a sysadmin dedicated
> to the Hadoop jobs -- a small dev team can handle it while focusing
> on algorithm development and testing.
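
A bare-bones version of such a listener, assuming the job-side wrapper 
POSTs a job id to a well-known port when it finishes (the port, the 
URL, and the fetch step are all hypothetical):

# Completion-listener sketch: a tiny HTTP server on the local box.
from http.server import BaseHTTPRequestHandler, HTTPServer

def fetch_results(job_id):
    # Placeholder: pull the job's output down from S3/HDFS here.
    print("job %s done, starting download of results" % job_id)

class DoneHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        job_id = self.rfile.read(length).decode().strip()
        fetch_results(job_id)
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8042), DoneHandler).serve_forever()

The job side then needs only a one-liner on completion, e.g.
curl -d "$JOB_ID" http://listener.example.com:8042/ (names hypothetical).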

1. We have some components that use Google Talk to relay messages to 
local boxes behind the firewall. I could imagine hooking up Hadoop 
status events to that too (a sketch follows below, after point 2).

2. There's an old paper of mine, "Making Web Services that Work", in 
which I talk about deployment-centric development:
http://www.hpl.hp.com/techreports/2002/HPL-2002-274.html

The idea is that right from the outset, the dev team work on a cluster 
that resembles production, the CI server builds to it automatically, 
and changes get pushed out to production semi-automatically: you tag 
the version you want released in SVN, and the CI server does the 
release. The article is focused on services exported to third parties, 
not back-end work, so it may not all apply to Hadoop deployments.
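
On point 1, the relay can be a few lines of XMPP. Here is a sketch 
using the xmpppy library; the accounts, the password handling, and how 
you obtain the status string (JobClient polling, log scraping, ...) 
are all left open:

# Relay a status line over Google Talk (XMPP) via xmpppy.
import xmpp

def send_status(text, to="admin@example.com",
                user="cluster-bot@gmail.com", password="secret"):
    jid = xmpp.protocol.JID(user)
    client = xmpp.Client(jid.getDomain(), debug=[])
    client.connect(server=("talk.google.com", 5222))
    client.auth(jid.getNode(), password)
    client.send(xmpp.protocol.Message(to, text))
    client.disconnect()

# e.g. from a wrapper script when a job changes state:
# send_status("job_200809160001 completed OK")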

-steve



