hbase-user mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: Why is scaling HBase much simpler then scaling a relational db?
Date Thu, 21 Aug 2008 16:11:39 GMT
Comments inline:
> -----Original Message-----
> From: Mork0075 [mailto:mork0075@googlemail.com]
> Sent: Thursday, August 21, 2008 8:48 AM
> To: core-user@hadoop.apache.org; hbase-user@hadoop.apache.org
> Subject: Re: Why is scaling HBase much simpler then scaling a relational db?
> Thank you, but I still don't get it.
> I've read tons of websites and papers, but there's no clear and well-founded
> answer to "why use BigTable instead of relational databases".
> MySQL Cluster seems to offer the same scalability and level of
> abstraction, without switching to a non-relational paradigm. Lots of
> blog posts are highly emotional, without answering the core question:

I think you'd find that when the size of your data approaches 10-100 TB, relational
databases run out of gas. Further, as your data grows, with a relational database you need
to add another shard, redistribute your data, and make the client aware that rows are now split
over n+1 shards instead of n.
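
A minimal sketch of why adding a shard is disruptive, assuming rows are placed by hashing the key modulo the shard count. That placement scheme, the `shard_for` helper and the key names are illustrative assumptions, not something any particular database prescribes:

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    """Hypothetical client-side routing: hash the row key, take it modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

# 10,000 made-up row keys.
keys = [f"row{i:05d}" for i in range(10_000)]

n = 4
# Count the rows whose home shard changes when we grow from n to n+1 shards.
moved = sum(1 for k in keys if shard_for(k, n) != shard_for(k, n + 1))
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.0%} of rows change shard going from {n} to {n + 1} shards")
```

Under this scheme most rows land on a different shard after the change, and every client has to switch to the new shard count at the same moment, or it will look for rows in the wrong place.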

Bigtable has shown that it can scale to 100s of TB of data (or even more; I don't have any
recent numbers on the largest Bigtable instance). All this can be done by just bringing up
a new server: data is redistributed automatically, and client applications do not need
to be changed.
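
A toy sketch of the idea, assuming a table kept as sorted key ranges that split on their own once they grow too large. `ToyTable` and its methods are hypothetical names for illustration; this is not the HBase or Bigtable API and it ignores servers, persistence and concurrency:

```python
import bisect

class ToyTable:
    """Toy range-partitioned table: each region holds a contiguous range of row keys."""

    def __init__(self, max_rows_per_region: int = 4):
        self.start_keys = [""]   # start_keys[i] is the lowest key region i may hold
        self.regions = [{}]      # regions[i] maps row key -> value
        self.max_rows = max_rows_per_region

    def _locate(self, row_key: str) -> int:
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key: str, value) -> None:
        i = self._locate(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, row_key: str):
        return self.regions[self._locate(row_key)].get(row_key)

    def _split(self, i: int) -> None:
        # Split an oversized region at its middle key; callers never see this happen.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = lower
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, upper)

table = ToyTable(max_rows_per_region=4)
for i in range(20):
    table.put(f"row{i:03d}", i)

print(len(table.regions), "regions;", "row007 ->", table.get("row007"))
```

The calling code only ever uses `put` and `get`. The number of regions grows underneath it without the caller having to know how many there are.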

> "Why RDBMS don't scale and why something like BigTable does". Often you
> read something like this:
> "They have also built a system called BigTable, which is a Column
> Oriented Database, which splits a table into columns rather than rows
> making it much simpler to distribute and parallelize."
> Why?

In a column-oriented data store, nulls are free: a cell with no value is simply not stored.
Not so for a row-oriented database, which must allocate space for a column even if the
current value is null.
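
A simplified sketch of the difference, assuming a fixed-schema row layout on one side and a store keyed by (row, column) on the other. The column names and records are made up, and real on-disk formats are more involved than either structure here:

```python
columns = ["name", "email", "phone", "fax"]

# Made-up sparse records: most columns are absent for most rows.
records = [
    {"name": "ann", "email": "ann@example.com"},
    {"name": "bob", "phone": "555-0100"},
    {"name": "cy"},
]

# Row-oriented layout: every row carries a slot for every column, null or not.
row_store = [[rec.get(col) for col in columns] for rec in records]
row_slots = sum(len(row) for row in row_store)

# Sparse layout: only cells that actually have a value are stored.
cell_store = {
    (row_id, col): value
    for row_id, rec in enumerate(records)
    for col, value in rec.items()
}
cell_count = len(cell_store)

print(f"row layout: {row_slots} slots, sparse layout: {cell_count} cells")
```

Adding a fifth column that only one row uses would add a slot to every row in the first layout, and a single cell in the second.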
