hadoop-common-user mailing list archives

From Mork0075 <mork0...@googlemail.com>
Subject Re: Why is scaling HBase much simpler then scaling a relational db?
Date Mon, 18 Aug 2008 06:50:32 GMT
I've read some papers and tutorials this week and now have some concrete questions:

(1) Sharding is also available in common relational systems. It is often 
described as needing an application layer to federate the shards. I 
understand HBase to be this layer, implementing the whole sharding 
mechanism. HBase distributes the shards (regions) over the region 
servers once a region grows beyond a certain size.

Wouldn't it be more practical to distribute the regions by load and 
not by size? That is, if a region gets more requests (load), it would be 
split and distributed?
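
To make the size-based behaviour described in (1) concrete, here is a minimal sketch of a range-partitioned table that splits a region once it holds too many rows. It is only an illustration of the idea, not HBase's implementation: `ToyTable`, `max_rows_per_region` and the in-memory dicts are all hypothetical names and simplifications, and row count stands in for region size.

```python
from bisect import bisect_right


class ToyTable:
    """Toy range-partitioned table (hypothetical, not the HBase API)."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        self.start_keys = [""]  # sorted start key of each region
        self.regions = [{}]     # regions[i] holds keys >= start_keys[i]

    def _region_index(self, key):
        # The owning region is the last one whose start key is <= key.
        return bisect_right(self.start_keys, key) - 1

    def put(self, key, value):
        i = self._region_index(key)
        self.regions[i][key] = value
        # Size-based trigger: split once the region holds too many rows.
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]  # middle key becomes the new start key
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = lower
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, upper)

    def get(self, key):
        return self.regions[self._region_index(key)].get(key)


table = ToyTable(max_rows_per_region=4)
for n in range(10):
    table.put("row%02d" % n, n)
print(table.start_keys)
```

In this sketch a load-based policy would change only the condition in `put` that triggers the split (for example a per-region request counter instead of the row count); the routing by start key would stay the same.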

(2) What happens if the resources of one machine cannot handle the load 
of the most-used region (which has already been split as far as 
possible)? Then you need replication, so that one region is served by 
multiple region servers. Is this feature planned?

(3) At the moment I don't understand how restoring after a crash works. 
On each commit, a snippet of data is written to the commit log, which 
is distributed/replicated over all region servers? So how many servers 
may die without losing any data?

(4) How is backup of the data realised? In a MySQL environment, for 
example, you run the admin console and perform a backup. Is this also 
available in an HBase environment?

Thanks a lot for answering these questions :)

Steve Loughran wrote:
> Mork0075 wrote:
>> Hello,
>> can someone please explain or point me to some documentation or 
>> papers, where I can read well-proven facts, why scaling a relational 
>> db is so hard and scaling a document-oriented db isn't?
> http://labs.google.com/papers/bigtable.html
> relational dbs are great for having lots of structured data, where you 
> can run SELECT operations, do O/R mapping to make them look like 
> objects, etc. It's one thing to back up, and you get transactions. 
> They're bad places to store binary data, or, say, billions and billions 
> of rows of web server log data.
> By relaxing some of the expectations of a relational db, things like 
> bigtable, hbase and others can scale well, but as they have relaxed the 
> rules, they may not do everything you want.
>> So perhaps if I got lots of requests to my relational db, I would 
>> duplicate it to several servers and partition the requests. So why 
>> doesn't this scale, and why can HBase, for instance, manage this?
> That's called sharding/horizontal partitioning.
> It works well if you can partition all your data so that different users 
> can go in different places, though once you've done that, you can't think 
> about JOIN-ing stuff from multiple machines.
> The alternative option (which is apparently common in places like 
> myspace and imdb) is to have one r/w master and a number of read-only 
> slaves. All changes go into the master; the slaves pick up the changes later.
>> I'm really new to this topic and would like to dive in deeper.
> check out the articles in http://highscalability.com/
> -steve
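
The sharding approach Steve describes in the quoted reply can be sketched roughly as follows. This is an assumption-laden illustration, not anything from a real database: `ShardedStore`, the CRC32-modulo placement, and the in-memory dicts standing in for separate database servers are all hypothetical. It shows why per-user lookups stay cheap while anything spanning users has to touch every shard.

```python
import zlib


class ShardedStore:
    """Toy horizontally partitioned store (hypothetical example)."""

    def __init__(self, num_shards):
        # Each dict stands in for an independent database server.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, user_id):
        # crc32 gives the same value in every process, unlike str hash().
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, user_id, record):
        self._shard_for(user_id)[user_id] = record

    def get(self, user_id):
        # Single-user lookup: exactly one shard is consulted.
        return self._shard_for(user_id).get(user_id)

    def scan_all(self, predicate):
        # Cross-user query: every shard must be visited and the results
        # merged by the application, which is the JOIN problem in small.
        return [
            (user_id, record)
            for shard in self.shards
            for user_id, record in shard.items()
            if predicate(record)
        ]


store = ShardedStore(num_shards=3)
store.put("alice", {"country": "DE"})
store.put("bob", {"country": "UK"})
store.put("carol", {"country": "DE"})
print(store.get("bob"))
print(sorted(uid for uid, _ in store.scan_all(lambda r: r["country"] == "DE")))
```

Note that with modulo placement, changing the number of shards moves most keys to a different shard, which is part of why resharding by hand in the application layer is painful.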
