hbase-user mailing list archives

From Aditya Sharma <adityadsha...@gmail.com>
Subject Re: High variance in results for hbase benchmarking
Date Fri, 04 Mar 2011 06:51:29 GMT
It was quite variable, as I said earlier, but in one fairly representative
READ-only benchmark, it was 115 READs per second. For a mixed READ + WRITE
benchmark, it was 90 operations per second (with some primitive caching
thrown in).


On Fri, Mar 4, 2011 at 11:54 AM, Ted Dunning <tdunning@maprtech.com> wrote:

> What kinds of speeds are you seeing?
> On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <adityadsharma@gmail.com> wrote:
>> Hi All,
>> I am working on benchmarking different data stores to find the best fit
>> for our use case. I would like to know the views and suggestions of the
>> HBase user and developer community on some of my findings, as the results
>> I am getting are highly variable.
>> My HBase setup has two EC2 Large hosts (each one has 7.5 GB memory, 4 CPU
>> cores, etc.), on which both the HBase master and slaves reside. The HDFS
>> master, HDFS slaves and ZooKeeper instances are also split between these
>> two hosts. I have three tables with one column family each, and they have
>> 100 million, 75 million and 500 million rows respectively. The actual data
>> consists of a String key and Long and String columns. The usual access
>> pattern is GETs on individual keys plus periodic batch PUTs.
>> I ran my benchmark application on HBase for different scenarios to measure
>> pure GET performance, mixed GET and PUT performance, etc. This was
>> initially without enabling the HTable API's write buffer or any Bloom
>> filters. The results I got were quite unimpressive compared to similar
>> benchmarking done using MySQL, Cassandra, etc. The performance was
>> anywhere from 40% to 100% worse. So I started using write buffers in my
>> code and also enabled Bloom filters at the ROW level. However, I then
>> started seeing a lot of variance in the benchmarking results (though I
>> would not be too sure about correlating this with Bloom filters or write
>> buffering). Another cause for concern was that the results were actually
>> worse than the earlier results.
>> Since we are using EC2 Large instances, it seems unlikely that network or
>> some other virtualization-related resource crunch is affecting our
>> performance measurements.
>> What I would like to know is whether this rings a bell for anyone else
>> here. Could I be missing some configuration knob that causes background
>> compaction or some such process to start at the wrong time, which might be
>> affecting my benchmarks? Any comments or feedback are welcome.
>> Thanks,
>> Aditya
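
Since the thread describes results as "highly variable" without a number attached, one way to make that concrete is to repeat each scenario several times, discard warm-up runs, and report the spread alongside the mean. The following is a minimal sketch only, not the benchmark used in this thread: the function names (run_trials, summarize_throughput), the warmup_trials parameter, and the sample figures in the usage note are all hypothetical, and the `operation` callable stands in for whatever GET or PUT call the real benchmark issues.

```python
import statistics
import time


def summarize_throughput(ops_per_sec):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(ops_per_sec)
    # stdev needs at least two samples; report 0.0 for a single trial.
    stdev = statistics.stdev(ops_per_sec) if len(ops_per_sec) > 1 else 0.0
    cv = stdev / mean if mean else float("inf")
    return {"mean": mean, "stdev": stdev, "cv": cv}


def run_trials(operation, ops_per_trial, trials, warmup_trials=1,
               clock=time.perf_counter):
    """Time `operation` in repeated trials and return ops/sec per trial.

    The first `warmup_trials` trials are executed but not recorded, so that
    cold caches do not skew the recorded numbers (an assumption about what
    matters here, not a claim about HBase internals).
    """
    results = []
    for trial in range(warmup_trials + trials):
        start = clock()
        for _ in range(ops_per_trial):
            operation()
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(clock() - start, 1e-9)
        if trial >= warmup_trials:
            results.append(ops_per_trial / elapsed)
    return results
```

As a usage illustration with made-up numbers, summarize_throughput([90.0, 110.0]) gives a mean of 100 and a coefficient of variation of roughly 0.14. Comparing that ratio between runs with and without write buffering or Bloom filters would show whether the variance really changed, or whether only a few outlier runs did.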
