hadoop-common-user mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: Why can't Hadoop be used for online applications ?
Date Sun, 14 Sep 2008 23:07:04 GMT
We use HBase at Wikia to serve decorations for our search results, 
for example user annotations and added URLs.  Performance is 
excellent on the query side, although we are currently serving only 
50-100K queries a day.

HBase is a column-oriented datastore.  You will be able to store 
large amounts of data, the same as you would in DFS, and you will have 
rows in the logical sense, but you won't have things such as joins or 
other SQL-type structures.
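
To make "rows in the logical sense" concrete, here is a minimal sketch 
that models a table as plain Python dicts.  It is not the HBase API; the 
row key, the "family:qualifier" column names, and the get_cell helper are 
all made up for illustration.

```python
# Toy model of a column-oriented table: row key -> {"family:qualifier": value}.
# Plain dicts only; this is NOT the HBase client API, and every name here
# is hypothetical.
table = {
    "example.org/page1": {
        "annotation:alice": "useful overview",
        "annotation:bob": "outdated",
        "url:added": "http://example.org/related",
    },
    # Rows are sparse: this one has no "url:" columns at all.
    "example.org/page2": {
        "annotation:alice": "short but accurate",
    },
}


def get_cell(table, row_key, column):
    """Look up one cell by row key and column name; None if absent."""
    return table.get(row_key, {}).get(column)


# Lookups by row key are cheap.  There is no join operator, so combining
# data from two rows means two lookups and some client-side code.
first = get_cell(table, "example.org/page1", "annotation:alice")
second = get_cell(table, "example.org/page2", "annotation:alice")
combined = [first, second]
```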

To answer your original question, why is Hadoop not recommended for 
online applications?  It is because of the network overhead of 
contacting the DFS to retrieve file data, or the overhead of running a 
MapReduce job, which can take anywhere from minutes to days.  That said, 
we do have a program that requests a file and byte location from HBase, 
then goes to DFS with that file and byte location and pulls a cached 
webpage out of huge gzip-compressed files.  The response time for this 
is 150ms.  So DFS, when handled correctly, can be used for *some* things 
in an online application, but I don't know how well that will scale.
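
The lookup pattern described above can be sketched roughly as follows. 
This is a stand-in, not our actual program: an in-memory buffer plays the 
role of the big file in DFS, a dict plays the role of the HBase index, and 
I am assuming each page is stored as its own gzip member and that the 
index records a length next to the byte offset.  The helper names are 
invented.

```python
import gzip
import io


def append_page(blob, html):
    """Append one page as its own gzip member; return (offset, length)."""
    data = gzip.compress(html.encode("utf-8"))
    offset = blob.seek(0, io.SEEK_END)  # seek returns the new position
    blob.write(data)
    return offset, len(data)


def fetch_page(blob, offset, length):
    """Seek to the recorded offset and decompress only that one member."""
    blob.seek(offset)
    return gzip.decompress(blob.read(length)).decode("utf-8")


# Stand-in for one huge file in DFS (assumption: in-memory buffer).
blob = io.BytesIO()
# Stand-in for the HBase table: url -> (offset, length) within that file.
index = {}

pages = {
    "http://example.org/a": "<html>page a</html>",
    "http://example.org/b": "<html>page b</html>",
    "http://example.org/c": "<html>page c</html>",
}
for url, html in pages.items():
    index[url] = append_page(blob, html)

# Online request: one index lookup, then one seek plus a small read.
offset, length = index["http://example.org/b"]
page = fetch_page(blob, offset, length)
```

The point of the layout is that a request never scans or decompresses the 
whole file; it touches only the bytes belonging to one page.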


Camilo Gonzalez wrote:
> Hey James,
> Yes, it's clear to me that MySQL/Oracle/etc have a different API. I mean in
> terms of Data Retrieval and possibly performance.
> What I want to see is if there are maybe some performance benchmarks about
> data retrieval/update, for example, retrieving a set of rows from a big
> table (searching by primary key of course).
> The thing you say makes sense, HBase could be a very good choice for huge
> datasets, but it depends on the problem as I've been reading.
> Thanks for your response!
> Camilo.
> On Sat, Sep 13, 2008 at 10:46 AM, James Moore <jamesthepiper@gmail.com> wrote:
>> On Fri, Sep 12, 2008 at 12:28 PM, Ryan LeCompte <lecompte@gmail.com>
>> wrote:
>>> Hey Camilo,
>>> HBase is not meant to be a replacement for MySQL or a traditional
>>> RDBMS (HBase is not transactional, for instance). I'd recommend reading
>>> the following article that describes what HBase/Bigtable really is:
>>> http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
>> I think I disagree with you, but it depends on exactly what you mean -
>> bigtable/HBase/CouchDB/etc are not replacements for MySQL/Oracle/etc
>> if by "replacement" you mean "something that provides a similar API."
>> bigtable/HBase/CouchDB/etc are replacements for MySQL/Oracle/etc if
>> you mean "they store and retrieve (large) quantities of data."
>> I certainly think that using an hbase-like solution instead of
>> something using SQL is going to be a popular choice in the near
>> future.  Obviously, it depends on the kinds of things you work on.
>> Most of the time, I'm working on code where we're just using SQL as a
>> second-rate way to serialize and deserialize objects.
>> --
>> James Moore | james@restphone.com
>> Ruby and Ruby on Rails consulting
>> blog.restphone.com
