hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
Date Thu, 09 Jan 2014 04:56:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866315#comment-13866315 ]

stack edited comment on HBASE-10156 at 1/9/14 4:54 AM:
-------------------------------------------------------

I did a comparison of trunk to this patch using [~fenghh]'s little recipe on a little cluster
of four datanodes, with the HLogPE running on a separate namenode machine.  Here is the command:

{code}
$ for i in 1 3 5 10 25 50 100 200; do for j in 1; do
    perf stat ./bin/hbase --config /home/stack/conf_hbase \
      org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation \
      -threads $i -iterations 1000000 -keySize 50 -valueSize 100 \
      &> "/tmp/nopatch2${i}.${j}.txt"
  done; done
{code}

The above runs too long.  Next time I will do 1/10th the iterations, but here are the numbers
for the above:

||Clients||nopatch (elapsed seconds)||withpatch (elapsed seconds)||% diff||
|1|1229|1245|-1|
|3|1360|1350|0|
|5|1094|1057|-2|
|10|1384|1382|0|
|25|1521|1481|-3|
|50|1892|1484|-22|
|100|3009|1555|-48|
|200|5442|1851|-66|
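
For clarity, the % diff column is roughly the relative change in elapsed time, (withpatch - nopatch) / nopatch.  A quick back-of-the-envelope check for the 200-client row:

{code}
$ awk 'BEGIN { nopatch=5442; withpatch=1851; printf "%.0f%%\n", (withpatch - nopatch) / nopatch * 100 }'
-66%
{code}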

The comparison at one to five or ten clients is probably of no value; the sync time predominates.  At
50 clients/handlers, this patch shows a definite improvement.

I've been measuring a bit over the last week or so, and elapsed time is a pretty erratic
measure given it varies by +/-10% or more between runs.

Here is another means of comparison.  Let's take the 50 clients/handlers case.  Here is what perf
stat says about each run.

First no patch:
{code}
 Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation
-threads 50 -iterations 1000000 -keySize 50 -valueSize 100':

    4124147.969029 task-clock                #    2.179 CPUs utilized
       190,346,772 context-switches          #    0.046 M/sec
         9,156,832 CPU-migrations            #    0.002 M/sec
            48,592 page-faults               #    0.012 K/sec
 5,960,399,494,658 cycles                    #    1.445 GHz                     [83.29%]
 4,436,005,557,091 stalled-cycles-frontend   #   74.42% frontend cycles idle    [83.31%]
 2,724,792,625,322 stalled-cycles-backend    #   45.71% backend  cycles idle    [66.74%]
 2,867,985,594,350 instructions              #    0.48  insns per cycle
                                             #    1.55  stalled cycles per insn [83.41%]
   524,067,855,577 branches                  #  127.073 M/sec                   [83.32%]
    17,507,214,729 branch-misses             #    3.34% of all branches         [83.35%]

    1892.292373360 seconds time elapsed
{code}

And with patch:

{code}
 Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation
-threads 50 -iterations 1000000 -keySize 50 -valueSize 100':

    3527447.057674 task-clock                #    2.375 CPUs utilized
       130,085,238 context-switches          #    0.037 M/sec
        20,806,185 CPU-migrations            #    0.006 M/sec
            52,903 page-faults               #    0.015 K/sec
 5,230,740,387,202 cycles                    #    1.483 GHz                     [83.31%]
 3,998,305,194,614 stalled-cycles-frontend   #   76.44% frontend cycles idle    [83.34%]
 2,469,198,968,381 stalled-cycles-backend    #   47.21% backend  cycles idle    [66.69%]
 2,356,538,810,101 instructions              #    0.45  insns per cycle
                                             #    1.70  stalled cycles per insn [83.35%]
   417,045,222,142 branches                  #  118.229 M/sec                   [83.33%]
    15,652,040,814 branch-misses             #    3.75% of all branches         [83.33%]

    1484.963721267 seconds time elapsed
{code}

We use fewer cycles when the patch is in place.
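
Back-of-the-envelope, using the cycle counters from the two 50-client runs above:

{code}
$ awk 'BEGIN { nopatch=5960399494658; withpatch=5230740387202;
       printf "cycles: %.1f%% fewer with the patch\n", (nopatch - withpatch) / nopatch * 100 }'
cycles: 12.2% fewer with the patch
{code}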

It is more apparent for the 200-thread case.  First, no patch:

{code}
 Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation
-threads 200 -iterations 1000000 -keySize 50 -valueSize 100':

   13926656.466201 task-clock                #    2.559 CPUs utilized
       589,974,965 context-switches          #    0.042 M/sec
        38,417,368 CPU-migrations            #    0.003 M/sec
            88,956 page-faults               #    0.006 K/sec
20,510,293,875,760 cycles                    #    1.473 GHz                     [83.30%]
15,492,686,319,443 stalled-cycles-frontend   #   75.54% frontend cycles idle    [83.30%]
 9,339,162,742,305 stalled-cycles-backend    #   45.53% backend  cycles idle    [66.72%]
 9,529,936,978,732 instructions              #    0.46  insns per cycle
                                             #    1.63  stalled cycles per insn [83.38%]
 1,712,735,175,559 branches                  #  122.983 M/sec                   [83.33%]
    61,211,156,300 branch-misses             #    3.57% of all branches         [83.35%]

    5442.635880066 seconds time elapsed
{code}

... and now with the patch:

{code}
 Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation
-threads 200 -iterations 1000000 -keySize 50 -valueSize 100':

    8939605.570123 task-clock                #    4.830 CPUs utilized
       310,224,763 context-switches          #    0.035 M/sec
        97,373,084 CPU-migrations            #    0.011 M/sec
            69,187 page-faults               #    0.008 K/sec
13,396,936,480,713 cycles                    #    1.499 GHz                     [83.30%]
10,246,979,648,214 stalled-cycles-frontend   #   76.49% frontend cycles idle    [83.32%]
 5,721,780,980,395 stalled-cycles-backend    #   42.71% backend  cycles idle    [66.72%]
 6,385,137,954,357 instructions              #    0.48  insns per cycle
                                             #    1.60  stalled cycles per insn [83.38%]
 1,115,071,041,675 branches                  #  124.734 M/sec                   [83.33%]
    36,286,432,631 branch-misses             #    3.25% of all branches         [83.33%]

    1851.022418695 seconds time elapsed
{code}

Fewer cycles, fewer context switches.
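
Again back-of-the-envelope, from the counters in the two 200-thread runs above:

{code}
$ awk 'BEGIN { printf "cycles: %.1f%% fewer, context switches: %.1f%% fewer\n",
       (20510293875760 - 13396936480713) / 20510293875760 * 100,
       (589974965 - 310224763) / 589974965 * 100 }'
cycles: 34.7% fewer, context switches: 47.4% fewer
{code}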

There is a bunch of ugliness in the above -- all those CPU migrations, how high the stalls
per instruction are, and the low throughput -- but I'm saving that for later.

It seems like this patch has OK latency, relatively speaking.  HLogPE has a few metrics.  Here are some
from near the end of the 200-client run.  First, no patch:

{code}
2014-01-08 15:40:12,715 DEBUG [LruStats #0] hfile.LruBlockCache: Total=6.52 MB, free=6.20
GB, max=6.21 GB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0,
cachingHitsRatio=0,evictions=0, evicted=0, evictedPerRun=NaN
2014-01-08 15:40:13,576 INFO  [t29] wal.HLogPerformanceEvaluation: t29 took 5400.756s 185.159ops/s
1/8/14 3:40:13 PM ==============================================================
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation:
  appendMeter:
             count = 58863065272
         mean rate = 10894.75 bytes/ms
     1-minute rate = 11129.26 bytes/ms
     5-minute rate = 10953.71 bytes/ms
    15-minute rate = 10883.10 bytes/ms

  syncMeter:
             count = 30894418
         mean rate = 5.72 syncs/ms
     1-minute rate = 5.68 syncs/ms
     5-minute rate = 5.71 syncs/ms
    15-minute rate = 5.71 syncs/ms


org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos-between-syncs:
  syncHistogram:
               min = -71842.00
               max = 191525149.00
              mean = 174964.59
            stddev = 238420.37
            median = 138823.50
              75% <= 235886.00
              95% <= 413183.25
              98% <= 502136.48
              99% <= 596775.15
            99.9% <= 776253.14
{code}

and with patch:

{code}
1/8/14 1:10:34 AM ==============================================================
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation:
  appendMeter:
             count = 34714495550
         mean rate = 18940.19 bytes/ms
     1-minute rate = 19010.56 bytes/ms
     5-minute rate = 18957.45 bytes/ms
    15-minute rate = 17278.92 bytes/ms

  syncMeter:
             count = 7473900
         mean rate = 4.08 syncs/ms
     1-minute rate = 4.09 syncs/ms
     5-minute rate = 4.08 syncs/ms
    15-minute rate = 3.71 syncs/ms


org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.countPerSync:
  syncCountHistogram:
               min = 0.00
               max = 200.00
              mean = 26.54
            stddev = 9.90
            median = 26.00
              75% <= 32.00
              95% <= 44.00
              98% <= 48.00
              99% <= 56.00
            99.9% <= 99.25


org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos:
  latencyHistogram:
               min = 581216.00
               max = 715683611.00
              mean = 1843910.40
            stddev = 1045928.90
            median = 1764587.50
              75% <= 2041533.75
              95% <= 2477104.15
              98% <= 2919530.58
              99% <= 5164260.06
            99.9% <= 11252801.10


org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos-between-syncs:
  syncHistogram:
               min = 35265.00
               max = 715541537.00
              mean = 1217613.73
            stddev = 806193.57
            median = 1159088.00
              75% <= 1270570.00
              95% <= 1503642.10
              98% <= 1749445.80
              99% <= 3164119.49
            99.9% <= 6123406.00
{code}

Almost double the throughput, and latency is not too bad -- about 2ms per write, with most of
it being sync time.
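
As a quick unit check, converting the with-patch medians above from nanoseconds to milliseconds:

{code}
$ echo "1764587.50 1159088.00" | awk '{ printf "append latency median: %.2f ms, nanos-between-syncs median: %.2f ms\n", $1/1e6, $2/1e6 }'
append latency median: 1.76 ms, nanos-between-syncs median: 1.16 ms
{code}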

Let me work on polishing the patch.



> Fix up the HBASE-8755 slowdown when low contention
> --------------------------------------------------
>
>                 Key: HBASE-10156
>                 URL: https://issues.apache.org/jira/browse/HBASE-10156
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: stack
>            Assignee: stack
>         Attachments: 10156.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt, Disrupting.java
>
>
> HBASE-8755 slows our writes when only a few clients.  Fix.



