hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Confusing Region Split behavior in YCSB Tests
Date Fri, 12 Nov 2010 19:52:39 GMT
Weird results indeed; nothing comes to mind at the moment that could
explain this behavior. I do have a few questions/comments:

 - It's not clear to me from your email what kind of operations you
were doing. Was it 100% random reads? Or a mix of reads and writes?
Any scans?
 - Which version of YCSB were you using? The latest one, 1.3, has a
fix for client-side performance in HBase (YCSB was deserializing the
Results incorrectly; see the sketch after this list).
 - Would you be able to give the latest version of 0.89 a try? Or
even maybe trunk, for which we are in the process of branching and
cutting an RC? 0.20.6 is now quite old and I'm not sure how useful
it would be to try to optimize it.
 - How much heap did you give to HBase? You said default
configurations, so 1000m? Why not more? (See the hbase-env.sh note
below.)
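
On the YCSB point, the gist of the 1.3 fix was in how the HBase binding
converts a Result into the field map YCSB expects. A rough sketch of the
one-pass pattern (illustration only, not the actual patch; the class and
method names here are hypothetical, the HBase calls are the 0.20-era
client API):

  import java.util.HashMap;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ResultToFields {
    // Map each returned column's qualifier to its value by walking the
    // Result's KeyValues once, instead of re-fetching columns by name.
    public static HashMap<String, byte[]> toFields(Result r) {
      HashMap<String, byte[]> fields = new HashMap<String, byte[]>();
      for (KeyValue kv : r.raw()) {
        fields.put(Bytes.toString(kv.getQualifier()), kv.getValue());
      }
      return fields;
    }
  }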

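And on the heap: the region server heap is set via HBASE_HEAPSIZE in
conf/hbase-env.sh (the default is 1000 MB), e.g. something like:

  # conf/hbase-env.sh -- heap size in MB, default 1000
  export HBASE_HEAPSIZE=4000
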
Thanks,

J-D

On Fri, Nov 12, 2010 at 10:52 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
> Hello All:
> I'm using YCSB to test the effects of region splits, and I'm seeing some
> confusing results.
>
> The aim of the experiment was to determine how much of an impact on
> throughput, average latency, and 95th-percentile latency is observed when,
> for the same load, the regions are split and distributed across region
> servers. I expected throughput to increase and 95th-percentile latency to
> decrease as the regions were distributed.
>
> For the 1K row size, this worked as expected - the throughput increased and
> latency decreased as the regions were spread out.
>
> However, for the 100K and 500K row sizes (closer to what we plan to have in
> our HBase table), the results were the opposite:
> the throughput and latency results were best for the 1-region case (i.e. all
> data fits in one region) and degraded quite a bit when the region was split
> and distributed to two region servers. As the region was split further, there
> was improvement over the first-split case in the 500K run, but not as much in
> the 100K run; in either case, the results never approached the one-region
> case.
>
> During the 1-region runs, the CPU on the region server hosting the table is
> pegged at 88-90% and the 1-minute load average hovers between 3.2 and 4.2; as
> the test proceeds, the load average drops closer to 3 until the run ends.
> During the split-region runs, the CPU on the region servers keeps dropping:
> 52% (4-region split), 30% (8-region split), down to 8% (32-region split),
> while the load stays fairly low (around 0.2-0.5 on each box). These are rough
> numbers, just to give an idea of how the region servers were behaving.
>
> *Test Environment:*
> Apache HBase 0.20.6 (+ HBASE-2599 patch), Apache Hadoop 0.20.2,
> 5 RS/DN, 4-core, 8GB boxes (using default HBase configurations, 1000m JVM
> heaps for RS/DN)
>
> *YCSB Clients:*
> 5 (4-core, 8GB) hosts, each with 5 YCSB client processes (Xmx1024m), and
> each YCSB process with threadcount=7 (so, 25 client JVMs with 7 threads each
> hitting 5 RS)
>
> *YCSB configuration:*
> recordcount=<varying, to fit the entire table in a single region at the
> start of the test> ... e.g. 400 * 500K rows, 2000 * 100K rows, 250000 * 1K
> rows
> operationcount=250000
> threadcount=7
> requestdistribution=<tried first with zipfian and then with uniform - same
> results>; the results below are 1K, 100K = zipfian and 500K = uniform
> (though I had tried zipfian earlier with pretty much similar trends)
>
> Row size => tried with 1K (10 cols of 100 bytes each), 100K (10 cols with
> 10K bytes each), 500K (10 cols with 50K bytes each); a sample workload file
> is sketched below.
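>
> (A hypothetical workload file for the 100K run, reconstructed from the
> parameters above; the read/update mix isn't specified here, so
> CoreWorkload's defaults would apply unless overridden:)
>
>   # workload-100k.properties
>   workload=com.yahoo.ycsb.workloads.CoreWorkload
>   recordcount=2000
>   operationcount=250000
>   threadcount=7
>   # 10 fields of 10K bytes each gives the ~100K row size
>   fieldcount=10
>   fieldlength=10240
>   requestdistribution=zipfian
>
>   # Invocation sketch; classpath and column family are placeholders:
>   java -cp <ycsb-and-hbase-classpath> com.yahoo.ycsb.Client -t \
>       -db com.yahoo.ycsb.db.HBaseClient -P workload-100k.properties \
>       -p columnfamily=<family>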
>
> *Test Method:*
> 1) Start with an HTable fitting in 1 region;
> 2) Run the YCSB test;
> 3) Split the region using the web UI (a scriptable alternative is sketched
> after this list);
> 4) Run the same YCSB test;
> 5) Keep repeating for 1, 4, 8, 16, 32 regions. Some splits were ignored if
> the resulting regions didn't get distributed across region servers.
> 6) Tried it with 1K, 100K, and 500K row sizes and got similar
> results.
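>
> (For step 3, a scriptable alternative to the web UI, assuming the 0.20-era
> client API; "usertable" is YCSB's default table name:)
>
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>
>   public class SplitTable {
>     public static void main(String[] args) throws Exception {
>       // Asks the master to split the table's region(s); the split itself
>       // completes asynchronously on the region server.
>       HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
>       admin.split("usertable");
>     }
>   }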
>
> *Expected Result:*
> Throughput increases as regions spread across region servers; average
> latency decreases as well.
>
> *Actual Result:*
> The 1K row size gives the expected results.
> For the 100K and 500K row sizes, the single-region case gives the highest
> throughput (ops/sec) and the lowest average and 95th-percentile latency.
>
> Here's a snippet; I'm pastebin-ing the larger dataset to keep the mail
> short.
>
> (e.g. 100K row size = 1 row with 10 fields of 10K each)
>
> Throughput (ops/sec); Row size = 100K
> --------------------------------------
> Client      1 REG          8 REG          32 REG
> ycsbjvm1    209.2164009    129.5659024    129.1324537
> ycsbjvm2    207.6238078    130.4403405    132.0112843
> ycsbjvm3    215.2983856    129.5349652    130.1714966
> ycsbjvm4    209.3266172    128.5535964    130.7984592
> ycsbjvm5    209.1666591    128.8050626    132.4272952
>
> (Throughput decreases from 1 region to 8 regions, and increases slightly
> from 8 to 32 regions.)
>
> 95th-Percentile Latency (ms); Row size = 100K
> ---------------------------------------------
> Client      1 REG    8 REG    32 REG
> ycsbjvm1    218      221      240
> ycsbjvm2    219      221      229
> ycsbjvm3    217      221      235
> ycsbjvm4    219      221      236
> ycsbjvm5    219      222      229
> ycsbjvm6    219      221      228
>
> (95th-percentile latency increases from 1 region to 8 regions to 32 regions.)
>
> *Pastebin of full results from all client YCSB JVMs:*
> 1K Row Size Run Results: http://pastebin.com/X2HXn99T
> 100K Row Size Run Results: http://pastebin.com/wjQzux3c
> 500K Row Size Run Results: http://pastebin.com/6Tz1GpPP
>
> I'd like your ideas/input on how I can explain this behavior; let me
> know if you want additional details.
> Thanks in advance.
> --Suraj
>
