hbase-user mailing list archives

From Sanjay Sharma <sanjay.sha...@impetus.co.in>
Subject RE: On storing HBase data in AWS S3
Date Wed, 14 Oct 2009 09:44:03 GMT
Hi Keith,

"I am looking for the easiest way to bring up an HBase and Hadoop environment as the persistence
mechanism for a Grails based web application."
I think Ryan has already cleared the doubts around using HBase for live applications.
You might start by looking at a new Grails plugin, http://grails.org/plugin/gorm-hbase, to get
a head start.

-sanjay

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Wednesday, October 14, 2009 1:32 PM
To: hbase-user@hadoop.apache.org
Subject: Re: On storing HBase data in AWS S3

Hey!

I strongly disagree with Tatsuya's assessment of HBase, specifically below:


On Wed, Oct 14, 2009 at 12:31 AM, Tatsuya Kawano
<tatsuyaml@snowcocoa.info> wrote:
> Hi Keith,
>
> On Wed, Oct 14, 2009 at 11:58 AM, Keith Thomas <keith.thomas@gmail.com> wrote:
>> Am I correct in understanding that a farm of EC2 instances with Hadoop and
>> HBase installed and configured individually by myself is the quickest and
>> most effective way to progress with this effort?
>
> Well, you're not wrong. To run HBase on Amazon Web Services, you
> should use EC2 instances and configure them yourself. Make sure you
> pick Extra Large instances from EC2 (see:
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8), and you may
> also want EBS volumes as the storage devices rather than S3. (S3 is
> good for archiving data.)
>
>
> But...
>
> Are you really sure you want to use HBase for your Grails-based web
> application on the cloud? I would definitely recommend MySQL, which
> should be more suitable for both web applications and the Amazon Web
> Services environment. HBase is not a cloud database and is currently
> more suitable for batch processing with billions of records.

This is not a correct assessment. First off, what does it mean to be
a "cloud database"? And secondly, HBase is suitable for serving
real-time queries, and that is a major use case that we have here at
StumbleUpon.

>
> If you use HBase for this purpose, you will
>
> -- lose the Object Relational Mapping support from Grails.
> -- have to take care of database transactions and secondary indices by yourself.

You do "lose" the transactions (if you even used them) and you may
have to maintain secondary indexes, but you gain a flexible
schema-less column-oriented datastore that scales far beyond anything
MySQL can do.
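
As a rough illustration of what "maintaining secondary indexes" yourself
involves, here is a minimal in-memory sketch. It assumes plain Python
dicts standing in for a primary table and an index table; the class and
method names (IndexedStore, put_user, get_by_email) are hypothetical and
this is not the HBase client API.

```python
# Minimal in-memory sketch of an application-maintained secondary index.
# Plain dicts stand in for two tables; this is NOT the HBase client API.
# All names here (IndexedStore, put_user, ...) are hypothetical.


class IndexedStore:
    def __init__(self):
        self.users = {}           # primary table: row key -> {column: value}
        self.users_by_email = {}  # index table: email -> row key

    def put_user(self, row_key, columns):
        """Write a row and keep the email index in step with it."""
        old = self.users.get(row_key)
        # Drop the stale index entry if the indexed column changed.
        if old is not None and old.get("email") != columns.get("email"):
            self.users_by_email.pop(old.get("email"), None)
        self.users[row_key] = dict(columns)
        if "email" in columns:
            self.users_by_email[columns["email"]] = row_key

    def delete_user(self, row_key):
        """Delete a row and its index entry."""
        old = self.users.pop(row_key, None)
        if old is not None:
            self.users_by_email.pop(old.get("email"), None)

    def get_by_email(self, email):
        """Look up a row through the index instead of scanning the table."""
        row_key = self.users_by_email.get(email)
        return None if row_key is None else self.users.get(row_key)
```

The point of the sketch is that every write touches two places. Without
multi-row transactions, those two writes are not atomic, so the
application has to tolerate or repair an index entry that briefly
disagrees with the primary row.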

> -- likely suffer from data-retrieval latency, unless you use memcached.

This is not correct - HBase has good caching built in, and takes full
advantage of Linux's disk buffer cache. This is much more effective than
MySQL because it is easier to get more RAM across 10-20 machines (or
more) than in 1-2 machines.


> -- need more server resources than MySQL. MySQL can run on 1 EC2
> instance, while HBase requires about 12 EC2 instances (2 for masters
> and DFS namenodes, 5 for region servers and DFS datanodes, 5 for
> ZooKeeper)

Again, this is not entirely correct; you are over-specifying quite a bit.
3 ZK nodes is fine, and they should be able to run on the "master"
nodes. And you also reveal your misunderstanding, suggesting to the OP
that you can run namenode on 2 hosts and that is that. The situation
for HDFS is (unfortunately) more complicated than that.

It is totally possible for an HBase cluster to be run on 4 EC2
instances: 1 master and 3 datanodes. Maybe even fewer, but you are
sacrificing data reliability.
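
To make the two sizings in this thread easy to compare, here is a small
tally. The instance counts come from the messages above; the dict keys
and the helper name total_instances are only illustrative, and roles
that share a machine are not counted as separate instances.

```python
# Tally of the two cluster layouts discussed in this thread.
# Counts are taken from the messages; key names are illustrative only.

# Layout from the quoted message: dedicated machines for every role.
quoted_layout = {
    "masters_and_namenodes": 2,
    "regionservers_and_datanodes": 5,
    "zookeeper": 5,
}

# Smaller layout from the reply: 1 master plus 3 datanodes.
small_layout = {
    "master": 1,
    "datanodes": 3,
}


def total_instances(layout):
    """Sum the EC2 instances needed by a layout."""
    return sum(layout.values())


print(total_instances(quoted_layout))  # 12
print(total_instances(small_layout))   # 4
```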


I appreciate your enthusiasm for HBase, but please don't mislead our
users so badly!

Thanks,
-ryan

>
>
> Is there any special reason to use HBase for your web application?
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
