hadoop-common-user mailing list archives

From Chad Walters <c...@powerset.com>
Subject RE: Hbase for dynamic web site?
Date Tue, 04 Dec 2007 05:52:51 GMT
I'd say that the current state of Hbase is more suited to offline processing than to online
serving duties, but I do envision that the roadmap for Hbase could extend to cover those capabilities.
Currently, however, Michael and Jim are spending most of their time stabilizing the core of
the system and working on basic performance bottlenecks, especially as several large scale
Hbase installations are starting to pop up and file issues.

Here are some of the things that I think would move Hbase in the right direction for online serving:

 1.  Atomic appends for a single writer (HADOOP-1700): We have to have atomic appends for
the commit log or durability is not guaranteed. This is a pressing issue in any case for any
offline processing use case that requires a 100% guarantee on durability.
 2.  Real-time master failover: Need to make sure there is zero downtime on failure of the
HDFS master and the Hbase master. Perhaps the Zookeeper project will provide the key part
of the solution although I don't have much visibility into where Zookeeper stands and what
its roadmap looks like. Can anyone say anything more?
 3.  More performance work: Michael did some performance measurements a while back that seemed
to indicate a lot of time spent back-and-forth in RPC. We're exploring Thrift as a lighter-weight
RPC mechanism, but there are probably other things to be done to reduce this cost. More analysis
and measurement would be helpful.
 4.  Tighter integration between HDFS and Hbase: Preference for running the region server
on the same node as one of the replicas of the underlying tables would lower latency.
 5.  Memory caching: Instead of pinning a whole Hbase table in RAM, I'd recommend the use
of memcached in front of Hbase to provide cached read access.
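
To make item 5 concrete, here is a minimal sketch of the read path with a cache in front of the table. It is an illustration under stated assumptions, not Hbase or memcached client code: the `ReadThroughCache` class, the `fetch_row` callable (standing in for an Hbase row read), and the in-process dict (standing in for memcached) are all hypothetical names chosen for this example.

```python
import time


class ReadThroughCache:
    """Check the cache first; on a miss, read the backing store and cache the row."""

    def __init__(self, fetch_row, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch_row = fetch_row  # hypothetical stand-in for an Hbase row read
        self._ttl = ttl_seconds      # how long a cached row stays valid
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # in-process dict standing in for memcached
        self.hits = 0
        self.misses = 0

    def get(self, row_key):
        now = self._clock()
        entry = self._entries.get(row_key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: fall back to the backing store.
        self.misses += 1
        value = self._fetch_row(row_key)
        if value is not None:
            self._entries[row_key] = (value, now + self._ttl)
        return value

    def invalidate(self, row_key):
        """Drop a cached row, e.g. after the application writes to it."""
        self._entries.pop(row_key, None)


# A plain dict plays the role of the table in this sketch.
fake_table = {"user:1": {"name": "Mike"}}
cache = ReadThroughCache(fake_table.get)
first = cache.get("user:1")   # miss: read from the table, then cached
second = cache.get("user:1")  # hit: served from the cache
```

The trade-off in this arrangement is staleness: a cached row can lag the table by up to the TTL unless the writer invalidates it, which fits applications that do not need strong consistency guarantees.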

Once these things are in place, Hbase could provide a reasonably performant large-scale online
serving system. The main advantages of such a system would be its flexible schema, automatic
repartitioning, and centralized administration, especially when compared with a system based
around many separate MySQL instances with memcached in front of them. It would not have full
ACID properties but there are many interesting applications that don't require strong guarantees
in those areas.

Anyone who'd like to start tackling any of the above items should feel free to chime in here
or jump on the Hbase IRC - more contributors always welcome!

Chad Walters
Search Architect

> Date: Fri, 30 Nov 2007 09:50:19 -0800
> Subject: Re: Hbase for dynamic web site?
> From: tdunning@veoh.com
> To: hadoop-user@lucene.apache.org
> Are you already using memcache and related approaches?
> On 11/30/07 9:46 AM, "Mike Perkowitz" wrote:
>> Hello! We have a web site currently built on linux/apache/mysql/php. Most
>> pages do some mysql queries and then stuff the results into php/html
>> templates. We've been hitting the limits of what our database can handle,
>> and what we can do in realtime for the site. Our plan is to move our data
>> over to Hbase, precomputing as much as we can (some queries we currently do
>> with joins in mysql, for example). Our pages would then be pulling rows from
>> Hbase to stuff into templates.
>> We're still working on getting Hbase working with the amount of data we want
>> to be able to handle, so haven't yet been able to test it for performance.
>> Is anyone else using Hbase in this way, and what has been your experience
>> with realtime performance? I haven't really seen examples of people using
>> Hbase this way - another approach would be for us to use
>> Hadoop/Hbase/mapreduce for computation then put results back into mysql or
>> whatever for realtime access. Any experience or suggestions would be
>> appreciated!
>> Thanks,
>> Mike
