hadoop-common-user mailing list archives

From "Ryan LeCompte" <lecom...@gmail.com>
Subject Re: Hadoop & EC2
Date Tue, 02 Sep 2008 12:22:31 GMT
Hi Tim,

Are you mostly just processing/parsing textual log files? How many
maps/reduces did you configure in your hadoop-ec2-env.sh file? How
many did you configure in your JobConf? Just trying to get an idea of
what to expect in terms of performance. I'm noticing that it takes
about 16 minutes to transfer about 15GB of textual uncompressed data
from S3 into HDFS after the cluster has started with 15 nodes. I was
expecting this to take a shorter amount of time, but maybe I'm
incorrect in my assumptions. I am also noticing that it takes about 15
minutes to parse through the 15GB of data with a 15 node cluster.
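
To put those timings in perspective, here is a rough back-of-the-envelope sketch of the throughput they imply. It assumes decimal units (1 GB = 1000 MB) and that the work is spread evenly across all 15 nodes. Neither assumption is confirmed above, and the function name is made up for illustration.

```python
def throughput_mb_per_s(gigabytes, minutes, nodes):
    """Return (aggregate, per-node) throughput in MB/s.

    Assumes 1 GB = 1000 MB and an even split of work across nodes.
    """
    megabytes = gigabytes * 1000
    seconds = minutes * 60
    aggregate = megabytes / seconds
    return aggregate, aggregate / nodes


# S3 -> HDFS copy: about 15 GB in about 16 minutes on 15 nodes
copy_total, copy_per_node = throughput_mb_per_s(15, 16, 15)

# Parsing job: about 15 GB in about 15 minutes on 15 nodes
parse_total, parse_per_node = throughput_mb_per_s(15, 15, 15)

print(f"copy:  {copy_total:.1f} MB/s total, {copy_per_node:.2f} MB/s per node")
print(f"parse: {parse_total:.1f} MB/s total, {parse_per_node:.2f} MB/s per node")
```

Under those assumptions both steps come out at roughly 1 MB/s per node, which is the figure I am hoping to compare against what others are seeing.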


On Tue, Sep 2, 2008 at 3:29 AM, tim robertson <timrobertson100@gmail.com> wrote:
> I have been processing only hundreds of GB on EC2, not thousands, using 20
> nodes, and I am really only in the exploration and testing phase right now.
> On Tue, Sep 2, 2008 at 8:44 AM, Andrew Hitchcock <adpowers@gmail.com> wrote:
>> Hi Ryan,
>> Just a heads up, if you require more than the 20 node limit, Amazon
>> provides a form to request a higher limit:
>> http://www.amazon.com/gp/html-forms-controller/ec2-request
>> Andrew
>> On Mon, Sep 1, 2008 at 10:43 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
>>> Hello all,
>>> I'm curious to see how many people are using EC2 to execute their
>>> Hadoop cluster and map/reduce programs, and how many are using
>>> home-grown datacenters. It seems like the 20 node limit with EC2 is a
>>> bit crippling when one wants to process many gigabytes of data. Has
>>> anyone found this to be the case? How much data are people processing
>>> with their 20 node limit on EC2? Curious what the thoughts are...
>>> Thanks,
>>> Ryan
