hbase-user mailing list archives

From Tatsuya Kawano <tatsuy...@snowcocoa.info>
Subject Re: On storing HBase data in AWS S3
Date Wed, 14 Oct 2009 17:25:26 GMT
Hi Ryan,

Thanks for correcting me, and also thanks for the great work since 0.20.

So, the facts are:

1. HBase (0.20.0 and later) is capable of handling real-time queries (e.g. at StumbleUpon).

2. HBase fits those real-time query use cases where transaction
isolation is not so important but a flexible data structure is
essential.

3. The minimum HBase configuration requires more servers than MySQL
does, but HBase can scale beyond anything MySQL can do.

4. There is a Grails plugin for HBase. (Sorry Keith, I didn't know you
already wrote it.)

One more question:

Server Layout

So I can fit the HBase processes in 4 EC2 instances (1 master + 3
region servers). How about the 3 ZooKeepers and the DFS Name Node /
Secondary Name Node? Where should they go?


Server 1. DFS Name Node / ZooKeeper #1
Server 2. HBase Master / ZooKeeper #2
Server 3. DFS Secondary Name Node / HBase Master (Backup) / ZooKeeper #3
Server 4. HBase Region Server / DFS Data Node
Server 5. HBase Region Server / DFS Data Node
Server 6. HBase Region Server / DFS Data Node
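
To make the layout above easier to reason about, here is a minimal
sketch in Python. The host names (server1 .. server6) and role labels
are made up for illustration. It only derives the comma-separated host
list one would put in the hbase.zookeeper.quorum property, and how many
ZooKeeper failures a majority quorum of that size survives.

```python
# Hypothetical host names and role labels; the roles mirror the
# six-server layout listed above.
LAYOUT = {
    "server1": ["namenode", "zookeeper"],
    "server2": ["hbase-master", "zookeeper"],
    "server3": ["secondary-namenode", "hbase-master-backup", "zookeeper"],
    "server4": ["regionserver", "datanode"],
    "server5": ["regionserver", "datanode"],
    "server6": ["regionserver", "datanode"],
}


def hosts_with(role, layout=LAYOUT):
    """Return the hosts that run the given role, in sorted order."""
    return sorted(host for host, roles in layout.items() if role in roles)


def zk_tolerated_failures(ensemble_size):
    """ZooKeeper needs a strict majority alive, so an ensemble of n
    nodes keeps working with up to (n - 1) // 2 nodes down."""
    return (ensemble_size - 1) // 2


zk_hosts = hosts_with("zookeeper")
# Comma-separated host list, as used for hbase.zookeeper.quorum.
quorum = ",".join(zk_hosts)

print("ZooKeeper quorum:", quorum)
print("ZooKeeper failures tolerated:", zk_tolerated_failures(len(zk_hosts)))
print("Region servers:", hosts_with("regionserver"))
```

Moving the ZooKeeper role onto other hosts in LAYOUT shows how the
quorum string changes if everything is squeezed into fewer instances.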

So do I need 6 EC2 Extra Large instances? Or are you suggesting that I
squeeze the ZKs and name nodes into the same 4 instances?

Note that an Extra Large instance of Amazon EC2 is equipped with 4
virtual CPU cores and 7.9GB RAM.


Tatsuya Kawano (Mr.)
Tokyo, Japan

On Wed, Oct 14, 2009 at 5:02 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> Hey!
> I strongly disagree with Tatsuya's assessment of HBase, specifically below:
> On Wed, Oct 14, 2009 at 12:31 AM, Tatsuya Kawano
> <tatsuyaml@snowcocoa.info> wrote:
>> Hi Keith,
>> On Wed, Oct 14, 2009 at 11:58 AM, Keith Thomas <keith.thomas@gmail.com> wrote:
>>> Am I correct in understanding that a farm of EC2 instances with Hadoop and
>>> HBase installed and configured individually by myself are the quickest and
>>> most effective way to progress with this effort?
>> Well, you're not wrong. To run HBase on Amazon Web Services, you
>> should use EC2 instances and configure them by yourself. Make sure you
>> pick Extra Large instances from EC2 (see:
>> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8), and you may
>> also want EBS volumes as the storage devices rather than S3. (S3 is
>> good for archiving data)
>> But...
>> Are you really sure you want to use HBase for your Grails-based web
>> application on the cloud? I would definitely recommend MySQL, which
>> should be more suitable for both web applications and the Amazon Web
>> Services environment. HBase is not a cloud database and is currently
>> more suitable for batch processing with billions of records.
> This is not a correct assessment. First off, what does it mean to be
> a "cloud database"? And secondly, HBase is suitable for serving
> real-time queries, and that is a major use case that we have here at
> StumbleUpon.
>> If you use HBase for this purpose, you will
>> -- lose the Object Relational Mapping support from Grails.
>> -- have to take care of database transactions and secondary indices by yourself.
> You do "lose" the transactions (if you even used them) and you may
> have to maintain secondary indexes, but you gain a flexible,
> schema-less, column-oriented datastore that scales far beyond anything
> MySQL can do.
>> -- likely suffer from data retrieval latency, unless you use memcached.
> This is not correct. HBase has good caching built in, and takes full
> advantage of Linux's disk buffer cache. This is much more effective
> than MySQL, because it is easier to get more RAM across 10-20 machines
> (or more) than in 1-2 machines.
>> -- need more server resources than MySQL. MySQL can run on 1 EC2
>> instance, while HBase requires about 12 EC2 instances (2 for masters
>> and DFS namenodes, 5 for region servers and DFS datanodes, 5 for
>> ZooKeeper)
> Again, this is not entirely correct; you are overspecifying quite a bit.
> 3 ZK nodes is fine, and they should be able to run on the "master"
> nodes. You also reveal your misunderstanding by suggesting to the OP
> that you can run the namenode on 2 hosts and that is that. The situation
> for HDFS is (unfortunately) more complicated than that.
> It is totally possible for an HBase cluster to be run on 4 EC2
> instances: 1 master, 3 datanodes. Maybe even fewer, but then you are
> sacrificing data reliability.
> I appreciate your enthusiasm for HBase, but please don't mislead our
> users so badly!
> Thanks,
> -ryan
>> Is there any special reason to use HBase for your web application?
>> Thanks,
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
