hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: random read/write performance
Date Tue, 06 Oct 2009 20:07:58 GMT
Hey Adam:

Thanks for checking in.

I just did some rough loadings on a small (old hardware) cluster using less
memory per regionserver than you.  It's described on this page:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Randomly writing
1k records to a single regionserver with the PerformanceEvaluation script, I
can do about 8-10k writes/second on average using the 0.20.1 release
candidate 1 with a single client.  Sequential writes are usually about the
same speed.  Random reads are about 650/second on average with a single
client and about 2.4k/second on average with 8 concurrent clients.
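
For reference, the random-write side of that run maps onto the Java client
API roughly as follows.  This is a minimal single-client sketch against the
0.20 API, not the PerformanceEvaluation code itself; the table name, family
name, and row count are illustrative:

import java.util.Random;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomWriteProbe {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml from the classpath (0.20-style configuration).
    HTable table = new HTable(new HBaseConfiguration(), "TestTable"); // illustrative names
    byte[] family = Bytes.toBytes("info");
    byte[] qualifier = Bytes.toBytes("data");

    int rows = 100000;             // illustrative row count
    byte[] value = new byte[1000]; // ~1k values, as in the numbers above
    Random rnd = new Random();

    long start = System.currentTimeMillis();
    for (int i = 0; i < rows; i++) {
      rnd.nextBytes(value);
      Put p = new Put(Bytes.toBytes(String.format("%010d", rnd.nextInt(rows))));
      p.add(family, qualifier, value);
      table.put(p);                // one synchronous put per random key, single client
    }
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("random writes/sec: " + (rows * 1000L / elapsed));
  }
}

The random-read side is the same loop with Get in place of Put.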

So it seems like you should be able to do better than
300 ops/second/machine -- especially if you can use the Java API.

This single regionserver was carrying about 50 regions.  That's about 10GB.
How many regions are loaded in your case?

If throughput is important to you, lzo should help (as per J-D).  Turning
off the WAL will also help with write throughput, but that might not be what
you want.  Random-read-wise, the best thing you can do is give it RAM (6G
should be good).
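
In the Java client, skipping the WAL is a per-Put decision.  Here is a
minimal sketch, assuming the 0.20 client API (Put.setWriteToWAL plus the
client-side write buffer); the table and family names are illustrative, and
edits written this way are lost if a regionserver dies before it flushes:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalWrites {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "TestTable"); // illustrative name
    table.setAutoFlush(false);                  // buffer Puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024);  // flush to the cluster every ~2MB

    Put p = new Put(Bytes.toBytes("row-00042"));
    p.add(Bytes.toBytes("info"), Bytes.toBytes("data"), new byte[1000]);
    p.setWriteToWAL(false);   // skip the write-ahead log: faster, but unflushed edits vanish on a crash
    table.put(p);

    table.flushCommits();     // push any buffered Puts to the regionservers
  }
}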

Is that 50-200 clients per regionserver or for the overall cluster?  If per
regionserver, I can try that over here.  I can try with bigger regions if
you'd like -- 1G regions -- to see if that'd help your use case (if you
enable lzo, this should increase your throughput and shrink the number of
regions any one server is hosting).
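
For the lzo plus bigger-regions combination, the table can be created that
way up front.  A sketch against the 0.20 admin API -- it assumes the LZO
native libraries are installed on every node, and the table and family names
are illustrative:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateLzoBigRegionTable {
  public static void main(String[] args) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

    HColumnDescriptor family = new HColumnDescriptor("info");   // illustrative family name
    family.setCompressionType(Compression.Algorithm.LZO);       // needs the LZO libs on every node

    HTableDescriptor table = new HTableDescriptor("TestTable"); // illustrative table name
    table.addFamily(family);
    table.setMaxFileSize(1024L * 1024 * 1024);                  // split regions at ~1GB instead of the default
    admin.createTable(table);
  }
}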

St.Ack

On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein <silberst@yahoo-inc.com> wrote:

> Hi,
>
> Just wanted to give a quick update on our HBase benchmarking efforts at
> Yahoo.  The basic use case we're looking at is:
>
> 1K records
>
> 20GB of records per node (and 6GB of memory per node, so data is not
> memory resident)
>
> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>
> Multiple clients doing the reads/writes (i.e. 50-200)
>
> Measure throughput vs. latency, and see how high we can push the
> throughput.
>
> Note that although we want to see where throughput maxes out, the
> workload is random, rather than scan-oriented.
>
>
>
> I've been tweaking our HBase installation based on advice I've read and
> gotten from a few people.  Currently, I'm running 0.20.0, have the heap
> size set to 6GB per server, and have iCMS off.  I'm still using the REST
> server instead of the Java client.  We're about to move our benchmarking
> tool to Java, so at that point we can use the Java API; I want to turn
> off the WAL then as well.  If anyone has more suggestions for this
> workload (either things to try while still using REST, or things to try
> once I have a Java client), please let me know.
>
>
>
> Given all that, I'm currently seeing a maximum throughput of about 300
> ops/sec/server.  Has anyone with a similar disk-resident, random
> workload seen drastically different numbers, or have guesses for what I
> can expect with the Java client?
>
>
>
> Thanks!
>
> Adam
>
>
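
For reference, a minimal sketch of the kind of multi-client 95% read / 5%
write worker described in the message above, against the 0.20 Java client
API.  The table name, key format, keyspace size, and op count are
illustrative assumptions, not measured settings:

import java.util.Random;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MixedWorkloadClient implements Runnable {
  private static final byte[] FAMILY = Bytes.toBytes("info");  // illustrative names
  private static final byte[] QUAL = Bytes.toBytes("data");
  private static final int KEYSPACE = 20000000;                // distinct keys; illustrative
  private static final int OPS_PER_CLIENT = 100000;            // illustrative

  public void run() {
    try {
      // One HTable per thread: HTable instances are not safe for concurrent use.
      HTable table = new HTable(new HBaseConfiguration(), "usertable"); // illustrative name
      Random rnd = new Random();
      byte[] value = new byte[1000];                           // 1K records
      long totalLatencyNs = 0;
      for (int i = 0; i < OPS_PER_CLIENT; i++) {
        byte[] row = Bytes.toBytes(String.format("user%010d", rnd.nextInt(KEYSPACE)));
        long t0 = System.nanoTime();
        if (rnd.nextInt(100) < 95) {                           // 95% random reads
          table.get(new Get(row));
        } else {                                               // 5% random writes
          rnd.nextBytes(value);
          Put p = new Put(row);
          p.add(FAMILY, QUAL, value);
          table.put(p);
        }
        totalLatencyNs += System.nanoTime() - t0;
      }
      System.out.println(Thread.currentThread().getName()
          + " avg latency (us): " + (totalLatencyNs / OPS_PER_CLIENT / 1000));
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) throws Exception {
    int clients = args.length > 0 ? Integer.parseInt(args[0]) : 50; // e.g. 50-200 client threads
    Thread[] threads = new Thread[clients];
    for (int i = 0; i < clients; i++) {
      threads[i] = new Thread(new MixedWorkloadClient(), "client-" + i);
      threads[i].start();
    }
    for (Thread t : threads) t.join();
  }
}

Throughput is total ops divided by wall-clock time across all threads;
per-op latency is tracked per client thread as above.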
