hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop/Elastic MR on AWS
Date Wed, 15 Dec 2010 17:28:59 GMT
On 09/12/10 18:57, Aaron Eng wrote:
> Pros:
> - Easier to build out and tear down clusters vs. using physical machines in
> a lab
> - Easier to scale up and scale down a cluster as needed
> Cons:
> - Reliability.  In my experience I've had machines die, had machines fail to
> start up, had network outages between Amazon instances, etc.  These problems
> have occurred at a far more significant rate than any physical lab I have
> ever administered.
> - Money. You get charged for problems with their system.  Need to add
> storage space to a node?  That means renting space from EBS which you then
> need to actually spend time formatting to ext3 so you can use it with
> Hadoop.  So every time you want to use storage, you're paying Amazon to
> format it because you can't tell EBS that you want an ext3 volume.
> - Visibility.  Amazon loves to report that all their services are working
> properly on their website, meanwhile, the reality is that they only report
> issues if they are extremely major.  Just yesterday they reported "increased
> latency" on their us-east-1 region.  In reality, "increased latency" means
> more than 50% of my Amazon API calls were timing out, I could not create new
> instances and for about 2 hours I could not destroy the instances I had
> already spun up.  Hows that for ya?  Paying them for machines that they
> won't let me terminate...

That's the harsh reality of all VMs: you need to monitor and stamp on 
things that misbehave. The nice thing is that it's easy to do this: just 
GET the HTTP status pages and kill any VM that doesn't respond properly.
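
A minimal sketch of that monitor-and-kill loop, assuming each VM exposes 
some HTTP status page. The names is_healthy, find_misbehaving and reap, 
and the terminate callback, are hypothetical; terminate stands in for 
whatever call your infrastructure offers to destroy an instance.

```python
import http.client
import urllib.request


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the status page answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # (urllib raises these as URLError/HTTPError, both OSError subclasses)
        # and malformed HTTP responses.
        return False


def find_misbehaving(status_urls, check=is_healthy):
    """Take a dict of VM id -> status URL; return the ids that fail the check."""
    return [vm_id for vm_id, url in status_urls.items() if not check(url)]


def reap(status_urls, terminate, check=is_healthy):
    """Call terminate(vm_id) for every VM that fails its check; return those ids.

    `terminate` is an assumption: a callback supplied by the caller that
    destroys one instance.
    """
    bad = find_misbehaving(status_urls, check)
    for vm_id in bad:
        terminate(vm_id)
    return bad
```

In practice you would run reap on a timer and probably require several 
consecutive failures before killing anything, so that one slow response 
does not cost you a working node.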

This is not a fault of EC2: any VM infra has this feature. You can't 
control where your VMs come up, you are penalised by other CPU-heavy 
machines on the same server, and Amazon throttle the smaller machines a bit.

But you
  -don't pay for cluster time you don't need
  -don't pay for ingress/egress for data you generate in the vendor's 
infrastructure (just storage)
  -can be very agile with cluster size.

I have a talk on this topic for the curious, discussing a UI that is a 
bit more agile, but even there we deploy agents to every node to keep an 
eye on the state of the cluster.


Hadoop is designed to work well in a large-scale static cluster: fixed 
machines, with the reactions of clients to server failure (spin) and 
those of servers (blacklist clients) being the right ones to leave 
ops in control. In a virtual world you want the clients to see (somehow) 
if the master nodes have moved, and you want the servers to kill the 
misbehaving VMs to save money, and then create new ones.
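
To make the client side concrete, here is a rough sketch of a client 
that still spins on failure but looks up the master's address again 
before every attempt, so it notices when the master has moved. This is 
not how Hadoop itself behaves; call_master, locate_master and request 
are hypothetical names, and locate_master stands in for whatever lookup 
(DNS, a registry, a config service) your deployment provides.

```python
import time


def call_master(locate_master, request, attempts=5, delay=1.0, sleep=time.sleep):
    """Retry request(address), re-resolving the master address before each attempt.

    locate_master: hypothetical zero-argument lookup returning the current address.
    request: hypothetical callable that takes the address and may raise
             ConnectionError when the master is unreachable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # Look the master up again every time round the loop, rather than
        # caching an address that may belong to a VM that no longer exists.
        address = locate_master()
        try:
            return request(address)
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(delay)  # spin: wait, then try again
    raise last_error
```

The only change from a plain retry loop is that the lookup sits inside 
the loop; a static cluster can resolve the address once, a virtual one 
cannot rely on that.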

