hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mork0075 <mork0...@googlemail.com>
Subject Re: Why is scaling HBase much simpler then scaling a relational db?
Date Thu, 21 Aug 2008 17:12:26 GMT
Thanks a lot for all replies, this is really helpful.

As you describe it, its a problem of implementation. BigTable is 
designed to scale, there are routines to shard the data, desitribute it 
to the pool of connected servers. Could MySQL perhaps decide tomorrow to 
implement something similar or does the relational model avoids this?

As i can see up to here, scalability in HBase is for free, where there's 
no adaquate pre-implemented solution for a (free) relational database 

Fernando Padilla schrieb:
> I'm no expert, but maybe I can explain it the way I see it, maybe it 
> will resonate with other newbies like me :)  Sorry if it's long winded, 
> or boring for those who already know all this.
> BigTable and Hadoop are inherently sharded and distributed.  They are 
> architected to store the data in redundant shards across many machines. 
>  This allows you to add more capacity ( both in processing band-width as 
> well as storage capacity ), simply by adding more machines to host your 
> cluster.
> Mysql is implemented around the idea of a single database running on a 
> single machine.  This isn't inherently bad, as long as you have a 
> machine with large enough storage and processing bandwidth ( memory/cpu 
> ).  You can implement sharding over mysql (and other databases), but the 
> application will have to be hand tailored to work in such a setup. 
> You're essentially implemented a home-grown data storage system, from a 
> collection of regular databases.
> You mentioned the Mysql Cluster technology; which is an attempt to bring 
> in Mysql native level sharding and distribution of processing bandwidth. 
>  But at least from their early architecture (I have not kept up with any 
> evolution of it), it was not yet close to real sharding nor full 
> scalability.  The technology could be described as a collection of 
> Master-Master In-Memory database nodes backed by a collection of 
> persistence nodes.  So that it worked great as long as you had enough 
> ram to hold your whole database in a single node.
> Every system has their pros and cons.
> A single Mysql is simpler and solid and people have lots of experience 
> with it.  If your application allows, and with some application specific 
> architecting, you could shard mysql to some high level of scalability. 
> But this really depends on your application data and queries you do over 
> that data.  Or how much extra engineering you're willing to do (and 
> maintain) to queries that span multiple shards run efficiently.
> BigTable and Hadoop are implemented to support sharding and distributed 
> queries from the get-go, so you can easily scale out without having to 
> add or maintain more complex or homegrown software, just more hardware.
> Mork0075 wrote:
>> Thank you, but i still don't got it.
>> I've read tons of websites and papers, but there's no clear und 
>> founded answer "why use BigTable instead of relational databases".
>> MySQL Cluster seams to offer the same scalabilty and level of 
>> abstraction, whithout switching to a non relational pardigm. Lots of 
>> blog posts are highly emotional, without answering the core question:
>> "Why RDBMS don't scale and why something like BigTable do". Often you 
>> read something like this:
>> "They have also built a system called BigTable, which is a Column 
>> Oriented Database, which splits a table into columns rather than rows 
>> making is much simpler to distribute and parallelize."
>> Why?
>> Really confusing ... ;)
>> Stuart Sierra schrieb:
>>> On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 <mork0075@googlemail.com> 
>>> wrote:
>>>> Can you please explain, why someone should use HBase for horizontal
>>>> scaling instead of a relational database? One reason for me would be,
>>>> that i don't have to implement the sharding logic myself. Are there 
>>>> other?
>>> A slight tangent -- there are various tools that implement sharding
>>> over relational databases like MySQL.  Two that I know of are
>>> DBSlayer,
>>> http://code.nytimes.com/projects/dbslayer
>>> and MySQL Proxy,
>>> http://forge.mysql.com/wiki/MySQL_Proxy
>>> I don't know of any formal comparisons between sharding traditional
>>> database servers and distributed databases like HBase.
>>> -Stuart

View raw message