hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Confusing Region Split behavior in YCSB Tests
Date Fri, 12 Nov 2010 23:24:33 GMT
One thought:

When you do the splits, do you also major compact after them, before
starting the read test? It may be that when you have just one region all
the data is local, but once you split, it's non-local. With the small data
sizes everything fits in cache so it doesn't matter, but with the big data
sizes you're doing non-local reads and they don't fit in cache.
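
For example, something along these lines from the HBase shell, assuming
YCSB's default table name of usertable (substitute whatever table name you
used):

  hbase(main):001:0> major_compact 'usertable'

The request is asynchronous, so give the compaction time to finish (watch
the RS logs or the web UI) before kicking off the read phase.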

-Todd

On Fri, Nov 12, 2010 at 1:12 PM, Suraj Varma <svarma.ng@gmail.com> wrote:

> 1) 100% random reads. Sorry - I should have mentioned that.
> 2) Hmm ... I was using 0.1.2; I had to fix the HBase db client to work with
> hbase-0.20.6, so I had taken a git pull on Oct 18th. I see that as of
> Oct 26th, the README has been updated to 0.1.3 (with the 0.20.6 fixes as
> well).
> 3) Ok, I'll try out 0.90-RC from trunk since that's what we planned to use
> next. I ran it against 0.20.6 since that's what our environment has now.
> 4) Yes - heap is default, 1000m. Good question though - I went back and
> looked at our environment, and I'm sorry, I made a mistake in my previous
> mail. The test environment boxes (running the RS/DN) have only 4GB RAM; I
> confused them with another environment. Sorry! Because the boxes have only
> 4GB RAM, we could only give HBase the defaults.
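>
> (By "defaults" I mean conf/hbase-env.sh is untouched, which amounts to the
> stock
>
>   export HBASE_HEAPSIZE=1000
>
> i.e. a 1000m heap for each region server JVM.)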
>
> Thanks,
> --Suraj
>
>
> On Fri, Nov 12, 2010 at 11:52 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
> > Weird results indeed, nothing comes to mind at the moment that could
> > explain this behavior. I do have a few questions/comments:
> >
> >  - It's not clear to me from your email what kind of operations you
> > were doing. Was it 100% random reads? A mix of reads and writes?
> > Any scans?
> >  - Which version of YCSB were you using? The latest one, 0.1.3, has a
> > fix for client-side performance in HBase (YCSB was deserializing the
> > Results wrong).
> >  - Would you be able to give the latest version of 0.89 a try? Or
> > maybe even trunk, for which we are in the process of branching and
> > cutting an RC? 0.20.6 is now quite old and I'm not sure how useful
> > it would be to try to optimize it.
> >  - How much heap did you give to HBase? You said default
> > configurations, so 1000m? Why not more?
> >
> > Thanks,
> >
> > J-D
> >
> > On Fri, Nov 12, 2010 at 10:52 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
> > > Hello All:
> > > I'm using YCSB to execute tests on the effects of region splits and I'm
> > > seeing some confusing results.
> > >
> > > The aim of the experiment was to determine how much of an impact on
> > > throughput/average latency/95th-percentile latency is observed when, for
> > > the same load, the regions are split and distributed across region
> > > servers. I expected to see the throughput increase and the
> > > 95th-percentile latency decrease (as a trend) as the regions are
> > > distributed.
> > >
> > > For the 1K row size, this worked as expected - the throughput increased
> > > and latency decreased as the regions were spread out.
> > >
> > > However, for the 100K and 500K row sizes (closer to what we plan to
> > > have in our HBase table), the results were the opposite: the throughput
> > > and latency results were best for the 1-region case (i.e. all data fits
> > > in one region) and degraded quite a bit when the region was split and
> > > distributed to two region servers. As the region is split further, there
> > > is some improvement over the first split case in the 500K run, but not
> > > as much in the 100K run - in either case the results never approach the
> > > one-region case.
> > >
> > > During the 1-region runs, the CPU on the region server hosting the table
> > > is pegged at 88-90% and the load average hovers between 3.2-4.2 (1m); as
> > > the test proceeds the load average drops off closer to 3, till it ends.
> > > During the split-region runs, the CPU on the region servers keeps
> > > dropping: 52% (4-region split), 30% (8-region split), ... 8% (32-region
> > > split), while the load was fairly low (around 0.2-0.5 on each box).
> > > These are rough numbers, just to give an idea of how the RSs were
> > > behaving.
> > >
> > > *Test Environment:*
> > > Apache HBase 0.20.6 (+ HBASE-2599 patch), Apache Hadoop 0.20.2,
> > > 5 RS/DN, 4-core, 8GB boxes (using default HBase configurations, 1000m
> > > JVM heaps for RS/DN)
> > >
> > > *YCSB Clients:*
> > > 5 (4-core, 8GB) hosts, each with 5 YCSB client processes (Xmx1024m) and
> > > each YCSB process with threadcount=7 (so, 25 client JVMs with 7 threads
> > > each, hitting 5 RS)
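> > >
> > > (For reference, each client process is launched roughly like this - the
> > > classpath, workload file name, column family, and log name are
> > > placeholders from memory, so adjust for your setup:
> > >
> > >   java -cp build/ycsb.jar:$HBASE_CLASSPATH com.yahoo.ycsb.Client -t \
> > >     -db com.yahoo.ycsb.db.HBaseClient -P workloads/readonly_100k \
> > >     -p columnfamily=family1 -threads 7 -s > client1.log
> > >
> > > where -t runs the transaction phase and -s prints periodic status.)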
> > >
> > > *YCSB configuration:*
> > > recordcount=<varying, to fit the entire table in a single region at the
> > > start of the test> ... e.g. 400 * 500K rows, 2000 * 100K rows,
> > > 250000 * 1K rows
> > > operationcount=250000
> > > threadcount=7
> > > requestdistribution=<tried first with zipfian and then with uniform -
> > > same results>; the results below are 1K, 100K = zipfian and 500K =
> > > uniform (though I had tried zipfian earlier with pretty much similar
> > > trends)
> > >
> > > Row size => tried with 1K (10 cols of 100 bytes each), 100K (10 cols of
> > > 10K bytes each), 500K (10 cols of 50K bytes each)
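> > >
> > > (Written out as a workload file, the 100K / zipfian run looks roughly
> > > like the following - reconstructed from the numbers above, so treat the
> > > exact file as approximate:
> > >
> > >   workload=com.yahoo.ycsb.workloads.CoreWorkload
> > >   recordcount=2000
> > >   operationcount=250000
> > >   threadcount=7
> > >   readproportion=1.0
> > >   updateproportion=0
> > >   scanproportion=0
> > >   insertproportion=0
> > >   requestdistribution=zipfian
> > >   fieldcount=10
> > >   fieldlength=10000
> > >
> > > i.e. 100% reads over 2000 rows of 10 x 10K fields.)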
> > >
> > > *Test Method:*
> > > 1) Start with an HTable fitting in 1 region;
> > > 2) Run the YCSB test;
> > > 3) Split the region using the web UI;
> > > 4) Run the same YCSB test;
> > > 5) Keep repeating for 1, 4, 8, 16, 32 regions. Some splits were ignored
> > > if the regions didn't get distributed across RSs (see the note below on
> > > checking the distribution).
> > > 6) Tried it with 1K, 100K, and 500K row sizes and got similar results.
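> > >
> > > (One way to double-check whether the daughter regions actually landed on
> > > different region servers is to scan .META. from the shell, e.g.
> > >
> > >   hbase(main):001:0> scan '.META.', {COLUMNS => 'info:server'}
> > >
> > > and look at the info:server value for each region of the table.)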
> > >
> > > *Expected Result:*
> > > Throughput increases as regions spread across region servers; average
> > > latency decreases as well.
> > >
> > > *Actual Result:*
> > > The 1K row size gives the expected results.
> > > For the 100K and 500K row sizes, the single-region case gives the
> > > highest throughput (ops/sec) and the lowest average and 95th-percentile
> > > latency.
> > >
> > > Here's a snippet; I'm pastebin-ing a larger dataset to keep the mail
> > > short.
> > >
> > > (e.g. 100K row size: each row has 10 fields of 10K each)
> > > Throughput (ops/sec); Row size = 100K
> > > --------------------------------------
> > > ycsbjvm,  1 REG,       8 REG,       32 REG
> > > ycsbjvm1, 209.2164009, 129.5659024, 129.1324537
> > > ycsbjvm2, 207.6238078, 130.4403405, 132.0112843
> > > ycsbjvm3, 215.2983856, 129.5349652, 130.1714966
> > > ycsbjvm4, 209.3266172, 128.5535964, 130.7984592
> > > ycsbjvm5, 209.1666591, 128.8050626, 132.4272952
> > >
> > > (Throughput decreases from 1 region to 8 regions, and increases
> > > slightly from 8 regions to 32 regions)
> > >
> > > 95th-Percentile Latency (ms); Row size = 100K
> > > ---------------------------------------------
> > > ycsbjvm,  1 REG, 8 REG, 32 REG
> > > ycsbjvm1, 218,   221,   240
> > > ycsbjvm2, 219,   221,   229
> > > ycsbjvm3, 217,   221,   235
> > > ycsbjvm4, 219,   221,   236
> > > ycsbjvm5, 219,   222,   229
> > > ycsbjvm6, 219,   221,   228
> > >
> > > (95th-percentile latency increases from 1 region to 8 regions to 32
> > > regions)
> > >
> > > *Paste Bin of full results from all client YCSB JVMs:*
> > > 1K Row Size Run Results: http://pastebin.com/X2HXn99T
> > > 100K Row Size Run Results: http://pastebin.com/wjQzux3c
> > > 500K Row Size Run Results: http://pastebin.com/6Tz1GpPP
> > >
> > > I would like your ideas / input on how I can explain this behavior;
> > > let me know if you want additional details.
> > > Thanks in advance.
> > > --Suraj
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
