hadoop-common-user mailing list archives

From Philip Zeyliger <phi...@cloudera.com>
Subject Re: Time to build my own cluster - advice?
Date Thu, 05 Nov 2009 17:16:51 GMT
It's not too bad.  There are some notes at
http://wiki.apache.org/hadoop/AmazonEC2, and some code in common's
contrib directory:
http://svn.apache.org/repos/asf/hadoop/common/trunk/src/contrib/ec2/

Cloudera (my employer) publishes some scripts at
http://archive.cloudera.com/docs/ec2.html that make it quite easy to
get started.  They are a set of Python scripts that, given the
appropriate credentials, start and stop a cluster.  There are some hooks
(http://archive.cloudera.com/docs/_customization.html) to trigger
installation of custom packages.  Behind the scenes, instances are
launched from AMIs, and each instance reads a script from its
"user_data" parameter and runs it at boot time.  This script knows
enough to configure the cluster, and is easily customizable.
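
To make the user-data mechanism concrete, here is a minimal sketch of how
such a boot script could be assembled. This is not the Cloudera script
itself: the function names, the /etc/cluster-role file, the use of
apt-get, and the package name are all assumptions for illustration. The
only EC2 facts relied on are that user data is limited to 16 KB and is
carried base64-encoded by the API (many client libraries do the encoding
for you).

```python
import base64
import shlex

# EC2 limits raw user data to 16 KB.
MAX_USER_DATA_BYTES = 16 * 1024


def build_user_data(packages, role="slave"):
    """Render a hypothetical boot script for an instance to run at startup.

    `packages` is a list of extra OS packages to install; `role` is an
    illustrative label written to a file so later steps can branch on it.
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        # Hypothetical marker file; a real script would configure Hadoop here.
        "echo role=%s > /etc/cluster-role" % shlex.quote(role),
    ]
    if packages:
        quoted = " ".join(shlex.quote(p) for p in packages)
        # Assumes a Debian/Ubuntu-based AMI; use yum on a Red Hat-based one.
        lines.append("apt-get update")
        lines.append("apt-get install -y " + quoted)
    script = "\n".join(lines) + "\n"
    if len(script.encode("utf-8")) > MAX_USER_DATA_BYTES:
        raise ValueError("user data exceeds the 16 KB EC2 limit")
    return script


def encode_user_data(script):
    """Base64-encode the script, as the raw EC2 API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


# Example: request one extra package (the name is illustrative).
script = build_user_data(["openoffice.org-headless"])
encoded = encode_user_data(script)
```

Keeping the package list as a parameter means that adding a new
dependency is a one-line change at launch time rather than a rebuilt AMI.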

-- Philip

On Thu, Nov 5, 2009 at 9:09 AM, Mark Kerzner <markkerzner@gmail.com> wrote:
> Hi,
>
> so far I've been using Amazon MapReduce. However, my app uses a growing
> number of Linux packages. I have been installing them on the fly, in the
> Mapper.configure(), but with OpenOffice this is hard, and I don't get a
> service connection even after local install.
>
> Therefore, my question is on the advice in creating my own Hadoop cluster
> out of EC2 machines. Are there instructions? How hard is it? What are best
> practices?
>
> Thank you,
> Mark
>
