hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: Is HBase suitable for ...
Date Tue, 29 Apr 2008 03:41:51 GMT
My replies and questions inline.

On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote:

> Hi there,
> I'm doing research to find the right solution for our needs.
> We need a persistence layer for social network groups.
> These groups will have a large amount of data (~100 GB): user
> profiles, their activities, etc.
100GB per group, or 100GB overall? How many groups?

> All work with these entities must be done online: a user can ask
> to be unsubscribed, or to connect other users to him.
> So we'll work with small pieces of a big dataset, not with big data
> offline, as a log parser would.
> We want to be able to search on different table attributes, and of
> course we need scalability and failover.
What kind of search on different table attributes do you want to do?  
There are no general purpose secondary indexes in HBase, so you  
either have to do a full- or partial-table scan or put the search  
attribute in the primary key.
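
To make the two options concrete, here is a minimal Python sketch. It is a toy in-memory model of a table sorted by row key, not the HBase client API; the class `SortedTable`, the `city/user_id` key layout, and the sample data are all assumptions made up for illustration.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def full_scan(self, predicate):
        """Option 1: visit every row and filter on an attribute."""
        return [k for k in self._keys if predicate(self._rows[k])]

    def prefix_scan(self, prefix):
        """Option 2: visit only the contiguous key range with this prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so the range has ended
            matches.append(key)
        return matches


# Hypothetical schema: a users table keyed by user id, plus a second
# table whose row key starts with the attribute we want to search on.
users = SortedTable()
users_by_city = SortedTable()


def add_user(user_id, city):
    users.put(user_id, {"city": city})
    users_by_city.put(f"{city}/{user_id}", {})


add_user("u1", "Moscow")
add_user("u2", "Berlin")
add_user("u3", "Moscow")

# Full scan: touches all three rows to find two matches.
scan_result = users.full_scan(lambda row: row["city"] == "Moscow")

# Prefix scan: touches only the rows under "Moscow/".
index_result = users_by_city.prefix_scan("Moscow/")
```

The trade-off in this sketch is that the application has to write to both tables itself and keep them consistent, since nothing maintains the second table automatically.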

As for failover, at the moment HBase has good recovery for
regionservers and no recovery for the master. That's something we're
hoping to change in the future.

> We need to add and remove nodes in the cluster easily, without
> stopping the entire system.
You can do this, and it's not that hard.

> All of this can be done with Amazon SimpleDB, but we don't want to
> depend on an external service. That's why we're looking for a
> 3rd-party product.
> We have these candidates:
>    - HBase
>    - CouchDB
>    - Hypertable
>    - Our own homegrown solution
> Can you tell me whether HBase will work for such a system?
I think HBase can do what you need, but it'd be nice to have more  
details about what exactly you're going to do with it.

> If we have 2 or 3 data centers and we lose the connection between
> them, what behavior will we see from HBase?
Is your intent to run a single HBase instance across several data  
centers? At the moment, if a regionserver is cut off from the master,  
it will kill itself. This means that if you have your master at one  
location and regionservers at another, and you lose connectivity,  
your regionservers at the other locations will shut themselves down.  
There are solutions to this we've discussed in the past. However, I  
wonder if maybe the correct solution is not to partition across data  
centers. It's not something that we've discussed at great length yet,  
so there might be an easier way to do it than I'm thinking.

> And when we restore the connection in 1-2 hours, what should we
> expect from HBase?
This is where things would get sticky - how do you resolve conflicts  
in how data is being served, or worse, how it was split into regions?  
It seems inherently complicated and unpleasant.

> Thank you.
