hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
Date Fri, 05 Feb 2016 23:28:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135250#comment-15135250
] 

stack commented on HBASE-14790:
-------------------------------

I gave it a try on a little cluster. Not all the metrics are hooked up and it looks like the
patched version did 1/10th less operations. That said, the patch looks to be 2x faster on
the simple WALPE test with default args.  it completes in half the time with half the CS,
half the branch-misses, with a better instruction per cycle count. Patch is looking promising.
Let me know when you'd like a bit of review [~Apache9]

In both cases, did this against a cluster of nine nodes:

{code}
$ perf stat ./hbase/bin/hbase --config ~/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation
{code}


Here is the end of the log from the UNPATCHED run (master branch as of now):

{code}
...
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 1000000
               min = 513398
               max = 2166155
              mean = 721937.32
            stddev = 97572.13
            median = 717876.00
              75% <= 775738.00
              95% <= 869750.00
              98% <= 912876.00
              99% <= 950262.00
            99.9% <= 1314831.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 1000008
               min = 1
               max = 1
              mean = 1.00
            stddev = 0.00
            median = 1.00
              75% <= 1.00
              95% <= 1.00
              98% <= 1.00
              99% <= 1.00
            99.9% <= 1.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 1000008
               min = 453139
               max = 1871512
              mean = 639040.49
            stddev = 88377.41
            median = 633963.00
              75% <= 689163.00
              95% <= 764942.00
              98% <= 805957.00
              99% <= 836714.00
            99.9% <= 1871512.00


     -- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 557000000
         mean rate = 763472.57 events/second
     1-minute rate = 763802.93 events/second
     5-minute rate = 699050.48 events/second
    15-minute rate = 424725.42 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 1000008
         mean rate = 1370.69 events/second
     1-minute rate = 1371.30 events/second
     5-minute rate = 1255.04 events/second
    15-minute rate = 762.53 events/second


2016-02-05 15:08:25,746 INFO  [WALPerfEval.logRoller] regionserver.LogRoller: LogRoller exiting.

 Performance counter stats for './hbase/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':

     322385.687852 task-clock                #    0.438 CPUs utilized
         8,158,772 context-switches          #    0.025 M/sec
           575,922 cpu-migrations            #    0.002 M/sec
         3,064,079 page-faults               #    0.010 M/sec
   474,704,238,311 cycles                    #    1.472 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   204,341,980,679 instructions              #    0.43  insns per cycle
    36,933,173,991 branches                  #  114.562 M/sec
       787,891,661 branch-misses             #    2.13% of all branches

     735.817239034 seconds time elapsed

{code}

Here is the end of the log from the PATCHED run:

{code}
...

-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
             count = 901306
               min = 156502
               max = 364739
              mean = 231465.79
            stddev = 25166.33
            median = 227923.00
              75% <= 246449.00
              95% <= 274560.00
              98% <= 288228.00
              99% <= 307981.00
            99.9% <= 364739.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
             count = 0
               min = 0
               max = 0
              mean = 0.00
            stddev = 0.00
            median = 0.00
              75% <= 0.00
              95% <= 0.00
              98% <= 0.00
              99% <= 0.00
            99.9% <= 0.00

-- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
             count = 502038025
         mean rate = 2283690.25 events/second
     1-minute rate = 2268881.93 events/second
     5-minute rate = 1180418.70 events/second
    15-minute rate = 486667.36 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
             count = 0
         mean rate = 0.00 events/second
     1-minute rate = 0.00 events/second
     5-minute rate = 0.00 events/second
    15-minute rate = 0.00 events/second

 Performance counter stats for './hbase/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':

     256123.078857 task-clock                #    1.030 CPUs utilized
         3,359,633 context-switches          #    0.013 M/sec
            65,083 cpu-migrations            #    0.254 K/sec
         3,144,010 page-faults               #    0.012 M/sec
   412,859,061,607 cycles                    #    1.612 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   199,715,055,614 instructions              #    0.48  insns per cycle
    36,232,946,697 branches                  #  141.467 M/sec
       461,875,115 branch-misses             #    1.27% of all branches

     248.692785230 seconds time elapsed
{code}


> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But
in fact, we do not need most of the features if we only want to log WAL. For example, we do
not need pipeline recovery since we could just close the old logger and open a new one. And
also, we do not need to write multiple blocks since we could also open a new logger if the
old file is too large.
> And the most important thing is that, it is hard to handle all the corner cases to avoid
data loss or data inconsistency(such as HBASE-14004) when using original DFSOutputStream due
to its complicated logic. And the complicated logic also force us to use some magical tricks
to increase performance. For example, we need to use multiple threads to call {{hflush}} when
logging, and now we use 5 threads. But why 5 not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when logging WAL.
For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message