hadoop-general mailing list archives

From Jagane Sundar <jag...@apache.org>
Subject Re: Hadoop as a Big Data app for the cloud
Date Thu, 06 Oct 2011 18:32:23 GMT
On Thu, Oct 6, 2011 at 11:21 AM, Daniel Sikar <dsikar@gmail.com> wrote:

> > If you buy the argument that EBS is resilient storage
> Just for the record, data has been lost in EBS.
Right. That's why I qualified the statement with 'if you buy the argument'.

From Amazon's website:
'The durability of your volume depends both on the size of your volume and
the percentage of the data that has changed since your last snapshot. As an
example, volumes that operate with 20 GB or less of modified data since
their most recent Amazon EBS snapshot can expect an annual failure rate
(AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the
volume. This compares with commodity hard disks that will typically fail
with an AFR of around 4%, making EBS volumes 10 times more reliable than
typical commodity disk drives.'
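To put the quoted numbers side by side, here is a rough sketch of the arithmetic. The AFR figures are the ones quoted above; the function names and the assumption that volumes fail independently are mine, for illustration only.

```python
def reliability_ratio(disk_afr, ebs_afr):
    """How many times lower the EBS annual failure rate is than the disk's."""
    return disk_afr / ebs_afr


def p_any_volume_lost(afr, volumes):
    """Chance that at least one of `volumes` volumes is lost within a year.

    Assumes failures are independent, which is a simplification.
    """
    return 1.0 - (1.0 - afr) ** volumes


if __name__ == "__main__":
    disk_afr = 0.04                      # ~4% for commodity disks (quoted)
    for ebs_afr in (0.001, 0.005):       # 0.1% and 0.5% bounds (quoted)
        ratio = reliability_ratio(disk_afr, ebs_afr)
        loss = p_any_volume_lost(ebs_afr, volumes=100)
        print(f"EBS AFR {ebs_afr:.3%}: ratio {ratio:.0f}x, "
              f"P(lose >=1 of 100 volumes) = {loss:.1%}")
```

The ratio depends on which end of the quoted range you take, and with many volumes in a cluster the chance of losing at least one in a year is no longer negligible.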

For Hadoop a good strategy may be to use ephemeral storage for MR temp space
and EBS for HDFS data. If the data was poured into HDFS using some ETL
processing, and if the origin data is still in S3, that's all the resiliency
you need.

Of course, it is unfortunate that OpenStack and other homebrew clouds do
not have an EBS-equivalent technology. Just about now, an HDFS-friendly,
EBS-equivalent storage technology for OpenStack sounds like a good idea.

Finally, note that I have not yet mentioned the cost of accessing EBS volumes. It
costs ten cents for every million I/O requests. How the heck do you project
that cost???
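One rough way to approach the projection, assuming you can measure or guess an average I/O request rate for a volume (for example from iostat on a test run). The ten-cents-per-million price is the figure above; the function name and the sample workload numbers are hypothetical.

```python
PRICE_PER_MILLION_IO = 0.10  # USD per million I/O requests (figure quoted above)


def projected_io_cost(avg_iops, hours, price_per_million=PRICE_PER_MILLION_IO):
    """Estimate the I/O request charge for one volume.

    avg_iops: assumed average I/O requests per second over the period
    hours:    length of the period in hours
    """
    requests = avg_iops * hours * 3600
    return requests / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical workload: 100 requests/sec sustained for a 30-day month.
    cost = projected_io_cost(avg_iops=100, hours=30 * 24)
    print(f"Estimated I/O charge per volume: ${cost:.2f}")
```

The hard part is still the input: Hadoop I/O is bursty, so an average request rate measured on one job may not carry over to another.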

