hadoop-common-user mailing list archives

From Paul Ingles <p...@oobaloo.co.uk>
Subject Re: Which instance type on Amazon EC2?
Date Tue, 29 Sep 2009 18:27:18 GMT

I don't have any real benchmarks or testing to speak of specifically  
for the performance benefits of a larger instance size. However, we  
have played around a little and for our work (a form of document  
clustering) the benefits of a larger instance were far outweighed by  
having more of the less powerful instances. During the early days of  
our experiments with Hadoop and EC2, this was by far and away the most  
surprising thing (although in retrospect I guess it's not so strange!)

Not sure it answers your question, but food for thought hopefully.


On 29 Sep 2009, at 18:33, Brian Bockelman wrote:

> Hey Kevin,
> From seeing presentations from the HEP field (totally unrelated to  
> Hadoop), I've seen folks claim the large instance is more than 4x  
> better than the small, and less than 2x slower than extra-large.   
> I.e., it provided that application the best bang for its buck.
> In other words, you're not completely crazy for believing this, and  
> other people have reported seeing non-linear differences between the  
> different instance types.  I suspect the "best" will depend highly  
> on what your app is doing.
> Brian
> On Sep 29, 2009, at 12:19 PM, Kevin Peterson wrote:
>> Has anyone done any extensive testing of what instance types on
>> Amazon EC2 give you the most bang for the buck?
>> Given the normal Hadoop recommendations of beefy machines, I would
>> expect the best performance from the extra-large, but our testing
>> showed otherwise.
>> We did some rough testing while we were just getting started with
>> like a 10 node cluster, and we found that the extra large instance
>> doesn't come close to twice the actual performance of the large
>> instance (pricing at $0.80 and $0.40). My rationalization is that
>> some of the resources are shared, and the extra-large instance
>> corresponds to the actual hardware, while the large instance
>> sometimes gets to take advantage of IO and network bandwidth beyond
>> 50% when the other tenant isn't doing much.
>> I'm revisiting our config because we're deploying HBase soon, and
>> I'm not sure whether I would be better off going to the extra-large
>> instances so that I can co-locate the tasktrackers and the region
>> servers on the same nodes, or if I should stick with large instances
>> and put hbase on separate servers. Mostly I'm wondering if my
>> results were a fluke.
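
The price/performance comparison discussed in this thread comes down to a
break-even calculation: at $0.80 versus $0.40, the extra-large instance has
to finish the same job at least twice as fast as the large one to cost the
same. A rough Python sketch of that arithmetic follows. The two prices are
the ones quoted above (taken here as per instance-hour); the 10-hour job
time and the 1.5x speedup are made-up placeholders, not measurements from
anyone in the thread, so substitute your own numbers.

```python
# Prices quoted in the thread, assumed to be per instance-hour.
LARGE_PRICE = 0.40
XLARGE_PRICE = 0.80


def cost_per_job(hourly_price, job_hours):
    """Dollar cost of running one job on one instance."""
    return hourly_price * job_hours


def break_even_speedup(cheap_price, expensive_price):
    """Speedup the pricier instance needs to match the cheaper one's cost per job."""
    return expensive_price / cheap_price


# Hypothetical inputs: replace with timings from your own workload.
large_job_hours = 10.0   # assumed wall-clock time on a large instance
xlarge_speedup = 1.5     # assumed extra-large speedup over large
xlarge_job_hours = large_job_hours / xlarge_speedup

large_cost = cost_per_job(LARGE_PRICE, large_job_hours)
xlarge_cost = cost_per_job(XLARGE_PRICE, xlarge_job_hours)
needed = break_even_speedup(LARGE_PRICE, XLARGE_PRICE)

print(f"break-even speedup: {needed:.2f}x")
print(f"large:       ${large_cost:.2f} per job")
print(f"extra-large: ${xlarge_cost:.2f} per job")
```

With any assumed speedup below the break-even value, the large instance
comes out cheaper per job, which matches the direction of the results
reported above. This ignores everything except instance price and run time
(no data transfer, storage, or co-location effects).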
