hbase-issues mailing list archives

From "liang xie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7495) parallel scanner seek in StoreScanner's constructor
Date Thu, 17 Jan 2013 07:34:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555951#comment-13555951 ]

liang xie commented on HBASE-7495:
----------------------------------

I did an apples-to-apples comparison this morning. It shows that parallel seek reduces latency
in a specific scenario (described below).
Attached is a preliminary patch, for reference only.

My test env: 10 nodes, each running a DataNode and a RegionServer with 12 x 2 TB SATA disks; hfile.block.cache.size=0 (block cache disabled); HBase 0.94.3; CDH 4.1.1
My test data:
recordcount=1000000000
fieldcount=3
fieldlength=200

hbase(main):002:0> describe 'YCSBTest'
DESCRIPTION                                                                                                  ENABLED
 {NAME => 'YCSBTest', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy',  true
 FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '1',
 VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}


$ ./hdfs dfs -du -s -h hdfs://lgxl-xieliang/
726.8g hdfs://lgxl-xieliang/

100 regions in total; most of those regions have between 0 and 5 store files (numberOfStorefiles in [0, 5]).

My test command: bin/ycsb run hbase -P ./workloads/kaka -threads 1 -p columnfamily=test -p table=YCSBTest -s > log/run.log 2>&1 &

Before each run, I restarted the whole HBase/HDFS cluster and cleared the OS page cache
(echo 1 > /proc/sys/vm/drop_caches).

Serial seek result:

[OVERALL], RunTime(ms), 300027.0
[OVERALL], Throughput(ops/sec), 20.09819116279535
[READ], Operations, 6030
[READ], AverageLatency(us), 49739.97446102819
[READ], MinLatency(us), 2768
[READ], MaxLatency(us), 782892
[READ], 50thPercentileLatency(ms), 45
[READ], 95thPercentileLatency(ms), 90
[READ], 99thPercentileLatency(ms), 124
[READ], Return=0, 6030 

Parallel seek result:

[OVERALL], RunTime(ms), 300016.0
[OVERALL], Throughput(ops/sec), 39.584555490373845
[READ], Operations, 11876
[READ], AverageLatency(us), 25249.878410239136
[READ], MinLatency(us), 3084
[READ], MaxLatency(us), 753547
[READ], 50thPercentileLatency(ms), 22
[READ], 95thPercentileLatency(ms), 43
[READ], 99thPercentileLatency(ms), 67
[READ], Return=0, 11876
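Because both runs used -threads 1, the YCSB client issued one read at a time, so throughput should be close to the reciprocal of the average read latency. The small Java check below recomputes the serial-vs-parallel ratios from the numbers copied out of the two runs above; the class and method names are illustrative only and are not part of YCSB or HBase.

```java
public class YcsbRunComparison {
    // Figures copied verbatim from the two YCSB runs (serial vs. parallel seek).
    static final double SERIAL_THROUGHPUT = 20.09819116279535;    // ops/sec
    static final double PARALLEL_THROUGHPUT = 39.584555490373845; // ops/sec
    static final double SERIAL_AVG_US = 49739.97446102819;        // AverageLatency(us)
    static final double PARALLEL_AVG_US = 25249.878410239136;     // AverageLatency(us)

    // With a single client thread, ops/sec should be roughly 1e6 / average latency in microseconds.
    static double predictedThroughput(double avgLatencyUs) {
        return 1_000_000.0 / avgLatencyUs;
    }

    public static void main(String[] args) {
        System.out.printf("throughput gain:       %.2fx%n", PARALLEL_THROUGHPUT / SERIAL_THROUGHPUT);
        System.out.printf("avg latency reduction: %.2fx%n", SERIAL_AVG_US / PARALLEL_AVG_US);
        System.out.printf("serial:   measured %.2f ops/s, 1e6/latency %.2f ops/s%n",
                SERIAL_THROUGHPUT, predictedThroughput(SERIAL_AVG_US));
        System.out.printf("parallel: measured %.2f ops/s, 1e6/latency %.2f ops/s%n",
                PARALLEL_THROUGHPUT, predictedThroughput(PARALLEL_AVG_US));
    }
}
```

Both ratios come out at about 1.97x, and in each run 1e6 / AverageLatency(us) lands within about 0.1% of the reported throughput. That is consistent with the gain coming from lower per-read latency rather than from extra client concurrency.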

                
> parallel scanner seek in StoreScanner's constructor
> ---------------------------------------------------
>
>                 Key: HBASE-7495
>                 URL: https://issues.apache.org/jira/browse/HBASE-7495
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 0.94.3, 0.96.0
>            Reporter: liang xie
>            Assignee: liang xie
>         Attachments: HBASE-7495.txt
>
>
> It seems there is room for improvement before scanner.next is called:
> {code:title=StoreScanner.java|borderStyle=solid}
>     if (explicitColumnQuery && lazySeekEnabledGlobally) {
>       for (KeyValueScanner scanner : scanners) {
>         scanner.requestSeek(matcher.getStartKey(), false, true);
>       }
>     } else {
>       for (KeyValueScanner scanner : scanners) {
>         scanner.seek(matcher.getStartKey());
>       }
>     }
> {code} 
> We could run scanner.requestSeek or scanner.seek in parallel, instead of serially as now, to reduce latency in special cases.
> Any ideas on this? I'll give it a try if the comments/suggestions are positive :)
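The quoted loop seeks each scanner one after another, so with several store files per region and no block cache, the constructor waits for roughly the sum of the individual disk seeks. The following is a minimal, self-contained sketch of the fan-out-then-wait idea using a plain java.util.concurrent ExecutorService. It is not the attached HBASE-7495 patch: SimpleScanner, seekInParallel and maxConcurrentSeeks are hypothetical names, the scanners are fakes that sleep to imitate a cold-cache seek, and only seek() is modeled (the lazy requestSeek path would need the same treatment).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelSeekSketch {
    // Hypothetical stand-in for HBase's KeyValueScanner; only seek() is modeled.
    interface SimpleScanner {
        boolean seek(String startKey) throws Exception;
    }

    // Serial seek: same shape as the quoted StoreScanner loop, one seek at a time.
    static void seekSerially(List<SimpleScanner> scanners, String startKey) throws Exception {
        for (SimpleScanner scanner : scanners) {
            scanner.seek(startKey);
        }
    }

    // Parallel seek: submit every seek to the pool, then wait for all of them, so the
    // caller only proceeds (e.g. to next()) once every scanner is positioned.
    // On the first failure it rethrows; remaining seeks are left to finish on the pool.
    static void seekInParallel(List<SimpleScanner> scanners, String startKey,
                               ExecutorService pool) throws Exception {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (SimpleScanner scanner : scanners) {
            Callable<Boolean> task = () -> scanner.seek(startKey);
            pending.add(pool.submit(task));
        }
        for (Future<Boolean> f : pending) {
            try {
                f.get(); // blocks until this seek completes
            } catch (ExecutionException e) {
                throw new Exception("seek failed", e.getCause());
            }
        }
    }

    // Demo: fake scanners sleep to imitate a disk seek and record how many seeks
    // were in flight at the same time. Returns that maximum.
    static int maxConcurrentSeeks(boolean parallel, int scannerCount, long seekMillis) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        List<SimpleScanner> scanners = new ArrayList<>();
        for (int i = 0; i < scannerCount; i++) {
            scanners.add(key -> {
                maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(seekMillis); // pretend to read an HFile block from disk
                } finally {
                    inFlight.decrementAndGet();
                }
                return true;
            });
        }
        try {
            if (parallel) {
                ExecutorService pool = Executors.newFixedThreadPool(scannerCount);
                try {
                    seekInParallel(scanners, "row-0", pool);
                } finally {
                    pool.shutdown();
                }
            } else {
                seekSerially(scanners, "row-0");
            }
        } catch (Exception e) {
            throw new IllegalStateException("demo seek failed", e);
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        int serialMax = maxConcurrentSeeks(false, 5, 50);
        long t1 = System.nanoTime();
        int parallelMax = maxConcurrentSeeks(true, 5, 50);
        long t2 = System.nanoTime();
        System.out.println("serial:   max in flight=" + serialMax + ", ~" + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("parallel: max in flight=" + parallelMax + ", ~" + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The serial run should take roughly 5 x 50 ms, while the parallel run should take roughly one seek's worth plus thread-pool overhead. A real implementation would presumably reuse a shared pool rather than create one per scanner construction, and would only pay off when seeks actually hit disk, as in the cold-cache test above.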

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
