hadoop-common-user mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject RE: Why is scaling HBase much simpler then scaling a relational db?
Date Wed, 27 Aug 2008 16:06:34 GMT
Discussion inline.

> Your example with the friends makes perfect sense. Can you imagine a
> scenario where storing the data in a column-oriented instead of a
> row-oriented db (a counterexample, if you will) causes such a huge
> performance mismatch, like the friends one in the row/column comparison?

There are quite a few advantages to a typical row-oriented database.  For one, normalization
is more space efficient, though this is increasingly less important, especially in distributed
file systems and with drives as cheap as they are.

A big difference you'll find between organizations running large relational clusters and those
running large hbase/hadoop clusters is the type of hardware used.  Relational databases can (and
should) be memory hogs, but worst of all they need fast disk i/o.  Large 15k rpm SAS RAID-10
arrays costing tens of thousands of dollars are often used.  Each server might have 15 drives
or more, 8 or more cores, and 32 gigs of ram.

In contrast, HBase/Hadoop clusters are most often made with "commodity" hardware.  A decent
processor (quad core xeon) and 4+ gb of memory in a 1U runs about $1000.  The i/o matters
much less with this architecture, and i/o is where most of the cost goes when purchasing
relational database hardware.

One tradeoff, of course, is that you will have far more actual machines with HBase/Hadoop, but
a node going down is not a big deal, whereas it can be a much bigger emergency if the total
number of nodes is low.

From a performance standpoint, there's no contest for relational databases and their ability
to index on different columns and randomly access records.  If you have dense data and you
want to query it randomly or with ordering and limiting/offsetting (in soft realtime), you're
going to be in for some tough times with HBase.  This is definitely where relational DBs shine.

In HBase things like this require lots of denormalization and cleverness (though many times
table scans are the only way), or in my case writing a separate logic/application/caching
layer on top of HBase.  For us, HBase is a store that backs our caches and indexes.  Those
are allowed to fail, as they can be fully recovered from HBase.  They also implement LRU-like
algorithms, since our total dataset is on the order of terabytes.  Some of this caching is
comparable to putting Memcache in front of SQL, but the indexing/joining is unique.
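
A minimal sketch of a recoverable LRU cache layer of that general kind, in Python.  The
LruCache class and the plain dict standing in for HBase are hypothetical, made up for
illustration; this is not the actual layer described above and it does not use a real
HBase client.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache in front of a slower backing store.

    `backing_store` is a plain dict standing in for HBase (an assumption
    for this sketch); a real deployment would call an HBase client instead.
    """

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first
        self.misses = 0

    def get(self, row_key):
        if row_key in self.entries:
            self.entries.move_to_end(row_key)  # mark as most recently used
            return self.entries[row_key]
        # Cache miss: fall back to the store, which is the source of truth.
        self.misses += 1
        value = self.backing_store[row_key]
        self.entries[row_key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

    def clear(self):
        """Simulate a cache node failing; nothing is lost, only re-read."""
        self.entries.clear()


store = {"user1": "alice", "user2": "bob", "user3": "carol"}
cache = LruCache(store, capacity=2)
cache.get("user1")
cache.get("user2")
cache.get("user1")  # hit, user1 becomes most recently used
cache.get("user3")  # miss, evicts user2
cached_keys = list(cache.entries)
cache.clear()       # "failure" of the cache
recovered = cache.get("user2")  # served again from the backing store
```

The point of the design is that the cache holds nothing that can't be rebuilt, so losing
it costs some extra reads and nothing else.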

So it really depends on what you want to use it for.  If you're thinking about it, you probably
have some kind of scale issue: the size of your dataset, a need to process a very large
dataset in batch and in parallel, or a strong need for replication/fault-tolerance.  Though
it's also good to just explore, because many of us think this is the direction things are
going :)

And things will get better!  Work is already being done on indexing.  I
have personally created join/merge logic that will be contributed in the future.  But the
larger issue at hand is the relatively poor random read performance of HBase compared to that
of relational databases.  This is inextricably linked to HDFS, which is most often tuned for
batch processing rather than random access (a vast majority of Hadoop users are using it in
this way).  However, improvements are definitely being made within both HBase and Hadoop.
 You can follow the performance statistics here:  http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation

Between 0.17 and 0.18, random read performance jumped 60%.  So things are definitely getting
better, and there are many people working hard on both projects.  I suspect we will also see
in-memory tables in the next few months, which can help if only certain tables need fast
random access.

I think I got a bit off topic but hopefully you get something useful out of it...

> Can you please provide an example of "good de-normalization" in HBase
> and how it's held consistent (in your friends example in a relational db,
> there would be a cascadingDelete)? As I think of the users table: if I
> delete a user with the userid='123', then do I have to walk through all
> of the other users' column family "friends" to guarantee consistency?! Is
> de-normalization in HBase only used to avoid joins? Our webapp doesn't
> use joins at the moment anyway.

You lose any concept of foreign keys.  You have a primary key, that's it.  No secondary keys/indexes,
no foreign keys.

It's the responsibility of your application to handle something like deleting a friend and
cascading to the friendships.  Again, typical small web apps are far simpler to write using
SQL; with HBase you become responsible for some of the things that were once handled for you.
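
A rough sketch of what an application-level cascading delete could look like, in Python.
Nested dicts stand in for the HBase users table, and the delete_user helper is hypothetical.
It assumes each friendship is stored in both users' "friends" column family, so the deleted
user's own row already lists every other row that has to be touched and there is no need
to walk the whole table.

```python
# Each row: row key -> {column family -> {column qualifier -> value}}.
# A plain dict stands in for the HBase 'users' table (illustration only).
users = {
    "123": {"info": {"name": "Ann"}, "friends": {"456": "", "789": ""}},
    "456": {"info": {"name": "Bob"}, "friends": {"123": "", "789": ""}},
    "789": {"info": {"name": "Cid"}, "friends": {"123": "", "456": ""}},
}


def delete_user(table, user_id):
    """Delete a user row and remove it from each friend's 'friends' family.

    Assumes friendships are stored symmetrically (in both rows). Returns
    the number of friend rows that were updated.
    """
    row = table.pop(user_id, None)
    if row is None:
        return 0
    touched = 0
    for friend_id in row.get("friends", {}):
        friend_row = table.get(friend_id)
        if friend_row is not None and user_id in friend_row.get("friends", {}):
            del friend_row["friends"][user_id]
            touched += 1
    return touched


touched = delete_user(users, "123")
```

Note that nothing here is wrapped in a transaction.  If the application dies halfway
through the loop, some friend rows still point at the deleted user, and it is up to the
application to tolerate or clean up those dangling entries.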

When building large scale web applications, this is what you want.  Control.  Like programming
in C versus Java (no flame war intended, I love them both), that control comes at the cost
of complexity.  But when you need to scale a relational database you so often end up hacking
away at it, trying to trick it into doing what you want rather than letting it solve its own
problems.  And that's where relational databases shine: solving your problems automatically
via SQL/foreign keys/query planning/etc.  Eventually the problems get too hard and things get
very slow.

Another example of "good denormalization" would be something like storing a user's "favorite
pages".  Say we want to query this data in two ways: for a given user, all of his favorites;
or, for a given favorite, all of the users who have it as a favorite.  A relational database
would probably have tables for users, favorites, and userfavorites.  Each link would be stored
in one row in the userfavorites table.  We would have indexes on both 'userid' and 'favoriteid'
and could thus query it in both ways described above.  In HBase we'd probably put a column
in both the users table and the favorites table; there would be no link table.

That would be a very efficient query in both architectures, with relational performing
much better with small datasets but less so with a large dataset.
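
To make the favorites layout concrete, here is a small Python sketch.  Two dicts stand in
for the HBase users and favorites tables; the column family names "favorites" and "users"
and the helper functions are assumptions for illustration, not an HBase API.  Each link is
written twice, once per table, so either question is answered by reading a single row.

```python
users = {}      # row key: userid     -> {"favorites": {favoriteid: value}}
favorites = {}  # row key: favoriteid -> {"users": {userid: value}}


def add_favorite(user_id, favorite_id):
    """Write the link twice, once per table, instead of once in a link table."""
    users.setdefault(user_id, {"favorites": {}})["favorites"][favorite_id] = ""
    favorites.setdefault(favorite_id, {"users": {}})["users"][user_id] = ""


def favorites_of(user_id):
    """All favorites of one user: a single row read on the users table."""
    return sorted(users.get(user_id, {}).get("favorites", {}))


def users_with(favorite_id):
    """All users of one favorite: a single row read on the favorites table."""
    return sorted(favorites.get(favorite_id, {}).get("users", {}))


add_favorite("u1", "pageA")
add_favorite("u1", "pageB")
add_favorite("u2", "pageA")
```

Here favorites_of("u1") reads only the u1 row and users_with("pageA") reads only the pageA
row.  The price is that the application has to keep the two copies in sync itself.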

Now consider asking for the favorites of 10 particular users.  That starts to get tricky in HBase and will
undoubtedly suffer worse from random reading.  The flexibility of SQL allows us to just ask
the database for the answer to that question.  In a small dataset it will come up with a decent
solution, and return the results to you in a matter of milliseconds.  Now let's make that
userfavorites table a few billion rows, and the number of users you're asking for a couple
thousand.  The query planner will come up with something but things will fall down and it
will end up taking forever.  The worst problem will be in the index bloat.  Insertions to
this link table will start to take a very long time.  HBase will perform virtually the same
as it did on the small table, if not better because of superior region distribution. 

And again, slow disks are cheap, fast disks are expensive.

> As you describe it, its a problem of implementation. BigTable is
> designed to scale, there are routines to shard the data, desitribute
> it to the pool of connected servers. Could MySQL perhaps decide
> tomorrow to implement something similar or does the relational model
> avoids this?

Distributed transactions and locking will always incur a cost.  Therefore multi-master approaches
will always contain at least some significant performance hit from distribution.  Slave replication
can be done very efficiently and effectively but does not really address all the scale issues.
Yes, MySQL could do similar things, and is.  But the faster it performs, the more you will
be sacrificing in terms of what you can take for granted with single-node MySQL.

One fundamental difference here is in storage.  BigTable stores sparse data by row then by
column without storing nulls.  MySQL doesn't do that.  You can create link tables to store
sparse matrix data but at the costs described above.  And every column in your schema is stored
in every row, regardless of whether you use it or not.  And that's not just storage on disk,
each time a row is fetched, the entire row will be pulled into memory.
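
A toy Python illustration of the storage difference.  This is a simplified model under
stated assumptions, not how any particular engine lays out bytes on disk; the COLUMNS
schema and the sample records are made up.  It only counts how many cells each layout has
to represent for the same sparse data.

```python
COLUMNS = ["name", "email", "phone", "fax", "icq"]  # hypothetical schema

records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob"},
    {"name": "Cid", "phone": "555-0100"},
]

# Fixed schema: every row carries a slot for every column, used or not.
fixed_rows = [tuple(rec.get(col) for col in COLUMNS) for rec in records]

# Sparse layout: keyed by (row, column); absent cells are simply not stored.
sparse_cells = {
    (row_id, col): value
    for row_id, rec in enumerate(records)
    for col, value in rec.items()
}

fixed_slots = sum(len(row) for row in fixed_rows)
null_slots = sum(value is None for row in fixed_rows for value in row)
stored_cells = len(sparse_cells)
```

With these three records the fixed layout carries 15 slots, 10 of them null, while the
sparse layout stores 5 cells.  The gap widens as the schema gets wider and the data gets
sparser.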

Also, storing binary data in SQL databases (Postgres at least) can be a real nightmare to deal
with.  Everything in HBase is byte[] so you can do whatever you want.  An aside, but important
to some nonetheless.

Another is the ability to Map-Reduce.  I have to admit I know nothing about MR and relational
databases but it makes little sense to me in a non-distributed file system.  I have actually
heard of someone attempting to run MySQL on top of HDFS to get distribution, fault-tolerance,
and MR.  Will have to do some more searching.

Guess that's my rant for the week.  Hope that provides more clarity than confusion.

Jonathan Gray

> Jonathan Gray wrote:
>> A few very big differences...
>> - HBase/BigTable don't have "transactions" in the same way that a
>> relational database does.  While it is possible (and was just recently
>> implemented for HBase, see HBASE-669) it is not at the core of this
>> design.  A major bottleneck of distributed multi-master relational
>> databases is distributed transactions/locks.
>> - There's a very big difference between storage of
>> relational/row-oriented databases and column-oriented databases.  For
>> example, if I have a table of 'users' and I need to store friendships
>> between these users... In a relational database my design is something
>> like:
>> Table: users(pkey = userid)
>> Table: friendships(userid,friendid,...) which contains one (or maybe
>> two depending on how it's implemented) row for each friendship.
>> In order to look up a given user's friends, SELECT * FROM friendships
>> WHERE userid = 'myid';
>> This query would use an index on the friendships table to retrieve all
>> the necessary rows.  Depending on the relational database you might
>> also be fetching each and every row (entirely) off of disk to be
>> read.  In a sharded relational database, this would require hitting
>> every node to get whichever friendships were stored on that node. 
>> There's lots of room for optimizations here but any way you slice it,
>> you're likely pulling non-sequential blocks off disk.  When you add in
>> the overhead of ACID transactions this can get slow.
>> The cost of this relational query continues to increase as a user adds
>> more friends.  You also begin to have practical limits.  If I have
>> millions of users, each with many thousands of potential friends, the
>> size of these indexes grows exponentially and things get nasty
>> quickly.  Rather than friendships, imagine I'm storing activity logs
>> of actions taken by users.
>> In a column-oriented database these things scale continuously with
>> minimal difference between 10 users and 10,000,000 users, 10
>> friendships and 10,000 friendships.
>> Rather than a friendships table, you could just have a friendships
>> column family in the users table.  Each column in that family would
>> contain the ID of a friend.  The value could store anything else you
>> would have stored in the friendships table in the relational model. 
>> As column families are stored together/sequentially on a per-row
>> basis, reading a user with 1 friend versus a user with 10,000 friends
>> is virtually the same.  The biggest difference is just in the shipping
>> of this information across the network which is unavoidable.  In this
>> system a user could have 10,000,000 friends.  In a relational database
>> the size of the friendship table would grow massively and the indexes
>> would be out of control.
>> It's certainly possible to make relational databases "scale".  What
>> that is about is usually massive optimizations, manual sharding, being
>> very clever about how you query things, and often de-normalizing. 
>> Index bloat and table bloat can thrash a relational db.
>> In HBase, de-normalizing is usually a good thing.  Storage space is
>> often considered free (not a large cost associated with storing
>> something in multiple places).  In a relational database this is not
>> so much the case.
>> In HBase, the sharding and distribution come for free out of the box. 
>> With a relational database you must jump through hoops and often end
>> up implementing your own sharding/distributed hashing algorithms so
>> you can distribute across machines.
>> Yes, you lose the relational primitives.  Sometimes you wish you could
>> do a simple join.  But if you get in the right mindset, you learn how
>> to put your data into this new data model.  And in the end the payoffs
>> are huge.
>> In addition to all this, with HBase/Hadoop you get MapReduce.  It's
>> feasible to implement something like that on top of a distributed
>> relational database but again the complexity is enormous.  With
>> HBase/Hadoop it's a built-in part of the system, a system which is
>> very intelligent at keeping logic close to data, etc.
>> To directly answer your question, it's "simpler" to scale HBase
>> because the scaling comes for free out of the box.  You get automatic
>> sharding/distribution of storage and queries.  There's nothing simpler
>> than that.  Distributing a relational database is never simple.
>> I hope this starts to shed some light on what the differences are.
>> Jonathan Gray
>> -----Original Message-----
>> From: Mork0075 [mailto:mork0075@googlemail.com] Sent: Thursday, August
>> 21, 2008 8:48 AM
>> To: core-user@hadoop.apache.org; hbase-user@hadoop.apache.org
>> Subject: Re: Why is scaling HBase much simpler then scaling a
>> relational db?
>> Thank you, but I still don't get it.
>> I've read tons of websites and papers, but there's no clear and
>> well-founded answer to "why use BigTable instead of relational databases".
>> MySQL Cluster seems to offer the same scalability and level of
>> abstraction, without switching to a non-relational paradigm. Lots of
>> blog posts are highly emotional, without answering the core question:
>> "Why RDBMSs don't scale and why something like BigTable does". Often you
>> read something like this:
>> "They have also built a system called BigTable, which is a Column
>> Oriented Database, which splits a table into columns rather than rows
>> making is much simpler to distribute and parallelize."
>> Why?
>> Really confusing ... ;)
>> Stuart Sierra wrote:
>>> On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 <mork0075@googlemail.com>
>>> wrote:
>>>> Can you please explain why someone should use HBase for horizontal
>>>> scaling instead of a relational database? One reason for me would be
>>>> that I don't have to implement the sharding logic myself. Are there
>>>> others?
>>> A slight tangent -- there are various tools that implement sharding
>>> over relational databases like MySQL.  Two that I know of are
>>> DBSlayer,
>>> http://code.nytimes.com/projects/dbslayer
>>> and MySQL Proxy,
>>> http://forge.mysql.com/wiki/MySQL_Proxy
>>> I don't know of any formal comparisons between sharding traditional
>>> database servers and distributed databases like HBase.
>>> -Stuart
