hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: High variance in results for hbase benchmarking
Date Fri, 04 Mar 2011 06:24:39 GMT
What kinds of speeds are you seeing?

On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <adityadsharma@gmail.com> wrote:

> Hi All,
>
> I am working on benchmarking different data stores to find the best fit for
> our use case. I would like to know views and suggestions of the HBase user
> and developer community on some of my findings as the results I am getting
> are highly variable.
>
> My HBase setup has two EC2 Large hosts (each with 7.5 GB memory, 4 CPU
> cores, etc.), on which both the HBase master and slaves reside. The HDFS
> master, slave, and ZooKeeper instances are also split between these two
> hosts. I have three tables with one column family each; they hold 100
> million, 75 million, and 500 million rows respectively. The actual data
> consists of a String key and Long and String columns. The usual access
> pattern is GETs on individual keys plus periodic batch PUTs.
>
> I ran my benchmark application on HBase for different scenarios to measure
> pure GET performance, mixed GET and PUT performance, etc. These first runs
> did not enable the HTable API's writeBuffer or any BloomFilters. The
> results were quite unimpressive compared to similar benchmarking done
> using MySQL, Cassandra, etc.: performance was anywhere from 40% to 100%
> worse. So I started using writeBuffers in my code and also enabled
> BloomFilters at the ROW level. However, I then started seeing a lot of
> variance in the benchmarking results (though I would not be too sure about
> correlating this with BloomFilters or write buffering). Another cause for
> concern was that the results were actually worse than the earlier ones.
>
> Since we are using EC2 Large instances, it seems unlikely that a network
> or other virtualization-related resource crunch is affecting our
> performance measurements.
>
> What I would like to know is whether this rings a bell for anyone else
> here. Could I be missing some configuration knob that causes background
> compaction or a similar process to start at the wrong time and affect my
> benchmarks? Any comments or feedback are welcome.
>
> Thanks,
> Aditya
>
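
One way to make both the speeds and the variance concrete is to report, for
each benchmark run, the spread of per-operation latencies rather than a
single average. The following is a minimal sketch in Python, assuming the
benchmark application can log one latency in milliseconds per GET or PUT.
The function name `summarize_latencies` and the nearest-rank percentile
method are illustrative choices, not part of HBase or of the benchmark
described above.

```python
import math
import statistics


def summarize_latencies(latencies_ms):
    """Summarize one benchmark run's per-operation latencies (milliseconds).

    Hypothetical helper: the input is assumed to be a list of numbers
    collected by the benchmark application, one per GET or PUT.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        # Population standard deviation of this run's samples.
        "stdev": statistics.pstdev(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
    }
```

Comparing these summaries across repeated runs of the same scenario shows
whether the variance comes from a few very slow operations (a high p99 or
max with a stable p50) or from a shift in typical latency (a moving p50).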
