hbase-dev mailing list archives

From Chad Walters <Chad.Walt...@microsoft.com>
Subject RE: Hypertable claiming upto >900% random-read throughput vs HBase
Date Thu, 16 Dec 2010 04:53:06 GMT
I was really just trying to address this point that Ryan made:
"- They are able to harness larger amounts of RAM, so they are really just testing that vs
HBase"

In cases where that actually makes a difference (i.e. there are significant amounts of RAM
that can't be harnessed), the overhead of additional JVMs may become inconsequential.

Obviously, your particular mileage may vary.

Chad

-----Original Message-----
From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Wednesday, December 15, 2010 1:53 PM
To: dev@hbase.apache.org
Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase

That isn't really the trade-off.  The 10x is from an undocumented benchmark with apples-to-oranges
tuning.  Moreover, HBase has had massive speedups since then.

Being able to set the heap size actually lets me control memory use more precisely, and running
a single JVM lets me amortize JVM cost.  Java does do some sharing, but a single JVM is better.
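Ted's point about pinning the heap can be made concrete. A minimal sketch, assuming an hbase-env.sh-style setup (the variable name and sizes are illustrative, not a recommendation):

```shell
# Hypothetical hbase-env.sh fragment: set -Xms equal to -Xmx so the
# RegionServer's memory footprint is fixed and predictable up front.
export HBASE_REGIONSERVER_OPTS="-Xms8g -Xmx8g"
echo "RegionServer JVM opts: ${HBASE_REGIONSERVER_OPTS}"
```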

On Wed, Dec 15, 2010 at 12:05 PM, Chad Walters
<Chad.Walters@microsoft.com>wrote:

> Sure, but if the tradeoff is being unable to use all the memory 
> effectively and suffering 10x unfavorable benchmark comparisons, then 
> running 2 or more JVMs with a regionserver per VM seems like a 
> reasonable stopgap until the GC works better.
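A sketch of the stopgap Chad describes: rather than one JVM with a huge, GC-prone heap, run several RegionServer JVMs per host, each with a moderate heap and a distinct port. The base port and offsets below are assumptions for illustration, not exact defaults for every HBase version:

```shell
# Each extra instance gets a distinct port by adding an offset, so two
# RegionServers can coexist on one machine (illustrative arithmetic only).
BASE_PORT=60020   # assumed RegionServer port; verify for your version
for OFFSET in 1 2; do
  echo "instance ${OFFSET}: java -Xmx8g ... -Dhbase.regionserver.port=$(( BASE_PORT + OFFSET ))"
done
```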
>
> Chad
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Wednesday, December 15, 2010 11:58 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs 
> HBase
>
> Why do that?  You reduce the cache effectiveness and up the logistical 
> complexity.  As a stopgap maybe, but not as a long term strategy.
>
> Sun just needs to fix their GC.  Er, Oracle.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters 
> <Chad.Walters@microsoft.com>
> wrote:
> > Why not run multiple JVMs per machine?
> >
> > Chad
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > Sent: Wednesday, December 15, 2010 11:52 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Hypertable claiming upto >900% random-read throughput 
> > vs HBase
> >
> > The malloc thing was pointing out that we have to contend with Xmx and
> > GC.  So it makes it harder for us to maximally use all the available
> > RAM for block cache in the regionserver, which you may or may not
> > want to do for alternative reasons.  At least with Xmx you can plan
> > and control your deployments, and you won't suffer from heap growth
> > due to heap fragmentation.
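The constraint Ryan describes can be quantified. With a fixed -Xmx, the block cache only gets a fraction of the heap (governed by hfile.block.cache.size; the 0.2 fraction and heap size below are assumptions for illustration):

```shell
# Hedged arithmetic: how much RAM the block cache can actually use when it
# is capped at a fraction of a fixed JVM heap, versus a C++ process that
# can malloc nearly all of the machine's memory.
HEAP_MB=8192
CACHE_PCT=20   # hfile.block.cache.size = 0.2 (assumed value)
CACHE_MB=$(( HEAP_MB * CACHE_PCT / 100 ))
echo "block cache limited to ${CACHE_MB} MB of a ${HEAP_MB} MB heap"
```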
> >
> > -ryan
> >
> > On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <todd@cloudera.com> wrote:
> >> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma 
> >> <gaurav.gs.sharma@gmail.com> wrote:
> >>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it 
> >>> would have given them a further advantage but as you said, not 
> >>> much is known about the test source code.
> >>
> >> I think Hypertable does use tcmalloc or jemalloc (forget which)
> >>
> >> You may be interested in this thread from back in August:
> >> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
> >>
> >> -Todd
> >>
> >>>
> >>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ryanobjc@gmail.com>
> wrote:
> >>>
> >>>> So if that is the case, I'm not sure how that is a fair test.  
> >>>> One system reads from RAM, the other from disk.  The results are as expected.
> >>>>
> >>>> Why not test one system with SSDs and the other without?
> >>>>
> >>>> It's really hard to get an apples-to-apples comparison.  Even if you
> >>>> are doing the same workloads on 2 diverse systems, you are not
> >>>> testing the code quality, you are testing overall systems and other issues.
> >>>>
> >>>> As G1 GC improves, I expect our ability to use larger and larger 
> >>>> heaps would blunt the advantage of a C++ program using malloc.
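The G1 collector Ryan is anticipating was experimental at the time; opting into it looked roughly like this (flags as shipped in early HotSpot releases with G1; treat the heap size as illustrative):

```shell
# Sketch: enabling the then-experimental G1 GC with a large heap.
G1_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -Xmx32g"
echo "JVM opts: ${G1_OPTS}"
```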
> >>>>
> >>>> -ryan
> >>>>
> >>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning 
> >>>> <tdunning@maprtech.com>
> >>>> wrote:
> >>>> > From the small comments I have heard, the RAM versus disk
> >>>> > difference is mostly what they were testing.
> >>>> >
> >>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson 
> >>>> > <ryanobjc@gmail.com>
> >>>> wrote:
> >>>> >
> >>>> >> We don't have the test source code, so it isn't very objective.
> >>>> >> However I believe there are 2 things which help them:
> >>>> >> - They are able to harness larger amounts of RAM, so they are
> >>>> >> really just testing that vs HBase
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
>
>
