hbase-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
Date Sat, 13 Mar 2010 12:36:17 GMT
I agree that running 24/7 HBase servers on EC2 is not advisable. But I
need some suggestions for running MapReduce jobs (in batches) followed
by updating the results on an existing HBase server.

Is it advisable to use EBS volumes (one attached to each slave) and
have them configured as the HDFS storage directories, and then run
HBase on top of that? I am assuming that EC2 clusters can be shut down
and restarted (at a later point in time) and still use the same HBase.
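
As a rough illustration of the setup being asked about, the sketch below
renders a minimal hdfs-site.xml that points the DataNode storage directory
at an EBS mount. It is only a sketch under assumptions: the mount point
/mnt/ebs/hdfs/data and the helper name hdfs_site are made up for
illustration, and dfs.data.dir is the property name used by Hadoop 0.20-era
releases. It says nothing about whether the approach is advisable.

```python
import xml.etree.ElementTree as ET


def hdfs_site(data_dirs):
    """Render a minimal hdfs-site.xml listing DataNode storage directories."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    # dfs.data.dir is the Hadoop 0.20-era name for DataNode block storage.
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    # The value is a comma-separated list of local directories.
    ET.SubElement(prop, "value").text = ",".join(data_dirs)
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical mount point for the EBS volume attached to this slave.
    print(hdfs_site(["/mnt/ebs/hdfs/data"]))
```

Each slave would need its own EBS volume mounted at the same path before
the DataNode starts, and the NameNode metadata directory would need
similar treatment for a restart to find the same filesystem.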

-Prasen

On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <apurtell@apache.org> wrote:
> During the Q&A period after my presentation at HUG9, it was interesting that some
> in the audience indicated they are running production Hadoop and/or HBase clusters on EC2.
> I want to follow up on some comments I made there.
>
> This is a little surprising, because currently the HDFS NameNode is a single point of
> failure which can bring the whole service down. That the NameNode is a SPOF is not quite
> so large a concern if you have the ability to engineer the particular server hosting the
> NameNode to be especially reliable. However, when architecting services on EC2, you must
> be mindful of its guarantees, or lack thereof. On EC2 the reliability of any given
> instance is not guaranteed, only the service in the aggregate.
>
> Running Hadoop on top of EC2 in production is thus not advised until there is a good hot
> fail over solution for the NameNode.
>
> AWS offers a form of hosted Hadoop called Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/.
> Note this service treats the Hadoop/HDFS cluster as a transient unreliable construction. So
> should you.
>
> Regarding a hot fail over solution for the NameNode, there is some really interesting
> work ongoing at the moment -- "AvatarNode", possibly with inclusion of "BookKeeper" in the
> architecture.
>
>
>    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
>
>
>    http://issues.apache.org/jira/browse/HDFS-976
>
>    http://issues.apache.org/jira/browse/HDFS-234
>        http://issues.apache.org/jira/secure/attachment/12399656/create.png
>        https://issues.apache.org/jira/browse/ZOOKEEPER-276
> Once something like the above is vetted and tested, of course my above advice changes
> and it would become possible to architect reliable Hadoop/HBase clusters on top of EC2 and
> similar IaaS clouds.
>
> In the meantime, EC2 and similar IaaS clouds are a great resource for prototyping, research
> and development, and hosting ephemeral clusters for QA or end to end system tests. The HBase
> EC2 scripts are a useful tool for doing such things with relative ease.
>
> Best regards,
>
>   - Andy
>
>
>
> ----- Original Message ----
> From: Jonathan Gray
> To: hbase-user@hadoop.apache.org
> Sent: Thu, March 11, 2010 3:01:22 PM
> Subject: RE: [databasepro-48] HUG9
>
> Pardon the link vomit, hopefully this comes across okay...
>
>
> HBase Project Update by Jonathan Gray
>
> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf
>
>
> HBase and HDFS by Todd Lipcon of Cloudera
>
> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf
>
>
> HBase on EC2 by Andrew Purtell of Trend Micro
>
> http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>
>
>
>
>
