hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Hypertable claiming upto >900% random-read throughput vs HBase
Date Wed, 15 Dec 2010 20:27:55 GMT
Why not use off-heap memory for this purpose? If it's the block cache (all blocks are of
equal size), the alloc/free algorithm is quite simple: you do not have to re-implement malloc in Java.
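To illustrate the point (this is only a sketch, with illustrative names, not anything from HBase): with equal-sized blocks, an off-heap cache needs little more than one direct allocation and a free-list of offsets.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * A minimal sketch of the idea above: one big off-heap slab carved into
 * equal-sized blocks, with a trivial free-list instead of a full malloc.
 */
public class OffHeapBlockPool {
    private final int blockSize;
    private final ByteBuffer slab;                      // lives outside the Java heap
    private final Deque<Integer> freeList = new ArrayDeque<>();

    public OffHeapBlockPool(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        // allocateDirect puts the slab outside -Xmx and out of the GC's reach
        this.slab = ByteBuffer.allocateDirect(blockSize * blockCount);
        for (int i = 0; i < blockCount; i++) {
            freeList.push(i * blockSize);
        }
    }

    /** Grab a free block; returns its offset, or -1 if the pool is exhausted. */
    public int allocate() {
        Integer off = freeList.poll();
        return off == null ? -1 : off;
    }

    /** A read/write view of the block at the given offset. */
    public ByteBuffer view(int offset) {
        ByteBuffer b = slab.duplicate();
        b.position(offset);
        b.limit(offset + blockSize);
        return b.slice();
    }

    /** Equal-sized blocks mean freeing is just returning the offset. */
    public void free(int offset) {
        freeList.push(offset);
    }
}
```

Because every block is the same size there is no fragmentation and no coalescing to do, which is exactly why this is so much simpler than a general-purpose malloc.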

I think something like an open-source version of Terracotta BigMemory would be a good candidate for
an Apache project. I see at least several large Hadoop components that suffer a lot from GC
timeouts: HBase, HDFS DataNodes, TaskTrackers, and the NameNode.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

From: Ryan Rawson [ryanobjc@gmail.com]
Sent: Wednesday, December 15, 2010 11:52 AM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

The malloc point was that we have to contend with Xmx and GC, which
makes it harder for us to use all the available RAM for the block
cache in the regionserver (which you may or may not want to do, for
other reasons). At least with Xmx you can plan and control your
deployments, and you won't suffer from heap growth due to heap
fragmentation.
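For what it's worth, the Xmx ceiling only covers the Java heap; direct buffers are accounted separately (capped by -XX:MaxDirectMemorySize), which is why off-heap caching can sidestep it. A tiny illustration, assuming nothing beyond the standard library:

```java
import java.nio.ByteBuffer;

public class HeapVsDirect {
    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx heap ceiling...
        long heapMax = Runtime.getRuntime().maxMemory();
        // ...but this 16 MB lives outside that ceiling; it is limited
        // instead by -XX:MaxDirectMemorySize, and it is never moved or
        // compacted by the garbage collector.
        ByteBuffer direct = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        System.out.println("heap max: " + heapMax
                + " bytes, direct block: " + direct.capacity() + " bytes");
    }
}
```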


On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <todd@cloudera.com> wrote:
> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
> <gaurav.gs.sharma@gmail.com> wrote:
>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have
>> given them a further advantage but as you said, not much is known about the
>> test source code.
> I think Hypertable does use tcmalloc or jemalloc (I forget which).
> You may be interested in this thread from back in August:
> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
> -Todd
>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>> So if that is the case, I'm not sure how that is a fair test.  One
>>> system reads from RAM, the other from disk.  The results are as expected.
>>> Why not test one system with SSDs and the other without?
>>> It's really hard to get an apples-to-apples comparison.  Even if you are
>>> running the same workloads on 2 diverse systems, you are not testing the
>>> code quality, you are testing the overall systems and other issues.
>>> As G1 GC improves, I expect our ability to use larger and larger heaps
>>> will blunt the advantage of a C++ program using malloc.
>>> -ryan
>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <tdunning@maprtech.com>
>>> wrote:
>>> > From the small comments I have heard, the RAM-versus-disk difference is
>>> > mostly what they were testing.
>>> >
>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ryanobjc@gmail.com>
>>> wrote:
>>> >
>>> >> We don't have the test source code, so it isn't very objective.  However,
>>> >> I believe there are two things which help them:
>>> >> - They are able to harness larger amounts of RAM, so they are really
>>> >> just testing that vs HBase
>>> >>
>>> >
> --
> Todd Lipcon
> Software Engineer, Cloudera
