hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
Date Fri, 12 Mar 2010 20:26:33 GMT
During the Q&A period after my presentation at HUG9, it was interesting that some in the
audience indicated they are running production Hadoop and/or HBase clusters on EC2. I want
to follow up on some comments I made there. 

This is a little surprising, because the HDFS NameNode is currently a single point of failure
(SPOF) that can bring down the whole service. The NameNode being a SPOF is less of a concern
if you can engineer the particular server hosting it to be especially reliable. However, when
architecting services on EC2, you must be mindful of its guarantees, or lack thereof: on EC2
the reliability of any given instance is not guaranteed, only that of the service in aggregate.

Running Hadoop on EC2 in production is therefore not advisable until there is a good hot
failover solution for the NameNode.

AWS offers a form of hosted Hadoop called Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/.
Note that this service treats the Hadoop/HDFS cluster as a transient, unreliable construct. So
should you.

Regarding a hot failover solution for the NameNode, there is some really interesting work
ongoing at the moment: "AvatarNode", possibly with "BookKeeper" included in the architecture.

    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
    http://issues.apache.org/jira/browse/HDFS-976
    http://issues.apache.org/jira/browse/HDFS-234
        http://issues.apache.org/jira/secure/attachment/12399656/create.png
        https://issues.apache.org/jira/browse/ZOOKEEPER-276

Once something like the above has been vetted and tested, my advice of course changes: it
would then become possible to architect reliable Hadoop/HBase clusters on EC2 and similar
IaaS clouds.

In the meantime, EC2 and similar IaaS clouds are a great resource for prototyping, research
and development, and hosting ephemeral clusters for QA or end-to-end system tests. The HBase
EC2 scripts make such tasks relatively easy.

Best regards,

   - Andy



----- Original Message ----
From: Jonathan Gray
To: hbase-user@hadoop.apache.org
Sent: Thu, March 11, 2010 3:01:22 PM
Subject: RE: [databasepro-48] HUG9

Pardon the link vomit, hopefully this comes across okay...


HBase Project Update by Jonathan Gray

http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf


HBase and HDFS by Todd Lipcon of Cloudera

http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf


HBase on EC2 by Andrew Purtell of Trend Micro

http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf

