hadoop-common-user mailing list archives

From Chad Walters <c...@powerset.com>
Subject Re: Hbase for dynamic web site?
Date Tue, 04 Dec 2007 20:16:00 GMT

Yes, this matches up well with my intuitions. In a dynamic workload with random accesses spread
across the primary key space, memcached distributed over the region servers, or even the client
nodes, should provide a performance benefit by caching individual hot rows and giving low-latency
access to them. HBase is targeted much more towards throughput. Caching at the row level might also
reduce the number of regions kept in RAM at one time (it's a shame to spend a whole region's
worth of RAM to get at a single row if you don't need to), so it might actually reduce overall
system-wide RAM pressure.
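
To make the hot-row idea concrete, here is a minimal read-through cache sketch. It is only an illustration under stated assumptions: `HotRowCache` is a small in-process LRU standing in for memcached, `BackingTable` is a dictionary standing in for an HBase table, and `read_row` is a hypothetical helper. None of these names come from either project's API.

```python
from collections import OrderedDict


class HotRowCache:
    """Small LRU cache; a stand-in for memcached (assumption, in-process only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._rows = OrderedDict()

    def get(self, key):
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return self._rows[key]

    def set(self, key, value):
        self._rows[key] = value
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # evict the least recently used row


class BackingTable:
    """Stand-in for an HBase table; counts reads so cache misses are visible."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get_row(self, key):
        self.reads += 1
        return self._rows.get(key)


def read_row(key, cache, table):
    """Serve a row from the cache, falling back to the table on a miss."""
    row = cache.get(key)
    if row is None:
        row = table.get_row(key)
        if row is not None:
            cache.set(key, row)  # only rows that exist are cached
    return row


table = BackingTable({f"user{i}": {"name": f"User {i}"} for i in range(1000)})
cache = HotRowCache(capacity=2)
for key in ["user1", "user2", "user1", "user1", "user2"]:
    read_row(key, cache, table)
print("table reads:", table.reads)
```

The cache holds only the rows that were actually requested, so its memory footprint follows the hot set rather than whole regions. Whether that helps in practice depends on how skewed the access pattern is.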

Of course, the proof is in the pudding and it is highly dependent on the workload. I'd certainly
love to hear about anyone's experiences trying this out.


On 12/4/07 10:15 AM, "Ted Dunning" <tdunning@veoh.com> wrote:

It is conceivable that memcached would eventually hold only, or mostly, active
objects in memory, while HBase might hold active pages/tablets/groups of
objects. That might give memcached a bit of an edge.

Another thing that happens with memcache is that memcache can hold the
results of a complex join which some component views as a single object.
The database doesn't normally view these as a single object and thus may not
have as much locality.

You might view memcached as an interesting transpose from column-oriented
data (HBase) to a row-oriented cache (memcached). That could easily result in
interesting performance trade-offs. HBase should be good for scanning;
memcached might be better for single-object access.
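
The point about caching a join result as a single object can be sketched as follows. This is a hedged illustration, not anyone's actual schema: the `users` and `videos` dictionaries, the `profile:` key prefix, and the `ProfileStore` class are all hypothetical, and a plain dictionary stands in for memcached.

```python
class ProfileStore:
    """Builds a 'profile' by joining two tables, then caches it as one object."""

    def __init__(self, users, videos):
        self.users = users      # stand-in for one table
        self.videos = videos    # stand-in for a second table
        self.cache = {}         # stand-in for memcached (assumption)
        self.joins_run = 0      # counts how often the join is recomputed

    def _build_profile(self, user_id):
        # The "complex join": combine the user row with every video row
        # that user owns. The database sees many rows; the caller sees
        # one object.
        self.joins_run += 1
        user = self.users[user_id]
        titles = sorted(
            v["title"] for v in self.videos.values() if v["owner"] == user_id
        )
        return {"name": user["name"], "videos": titles}

    def get_profile(self, user_id):
        key = "profile:" + user_id
        if key not in self.cache:
            self.cache[key] = self._build_profile(user_id)
        return self.cache[key]


store = ProfileStore(
    users={"u1": {"name": "Ada"}, "u2": {"name": "Bo"}},
    videos={
        "v1": {"owner": "u1", "title": "Intro"},
        "v2": {"owner": "u2", "title": "Notes"},
        "v3": {"owner": "u1", "title": "Demo"},
    },
)
first = store.get_profile("u1")
second = store.get_profile("u1")  # served from the cache; no second join
```

The sketch leaves out invalidation: a real deployment would need to drop or refresh the cached object whenever one of the underlying rows changes.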

On 12/4/07 9:05 AM, "Doug Cutting" <cutting@apache.org> wrote:

>>  5.  Memory caching: Instead of pinning a whole Hbase table in RAM, I'd
>> recommend the use of memcached in front of Hbase to provide cached read
>> access.
> Memcached is useful when many nodes need to access the same data.  It
> pools and shares memory across a cluster.  In HBase, each node caches a
> different portion of a table, no?  So I don't see how memcached would
> help there.
