hbase-issues mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9000) Linear reseek in Memstore
Date Wed, 30 Oct 2013 05:30:27 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chao Shi updated HBASE-9000:

    Attachment: hbase-9000-port-fb.patch

The attached patch is a port of linear seek code from 0.89-fb branch (with minor changes).
I'm not sure whether 20 is a good default for the maximum number of linear seeks.

Benchmark result:
||operation||trunk (per op)||with patch (per op)||
|reseek to next row|5.92 µs|6.71 µs|
|reseek to next column|3.735 µs|0.569 µs|
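For a quick read of the table, the ratios work out as follows. They are computed only from the figures above:

$$\frac{3.735\ \mu\text{s}}{0.569\ \mu\text{s}} \approx 6.56\times \text{ faster (reseek to next column)}, \qquad \frac{6.71 - 5.92}{5.92} \approx 13.3\% \text{ slower (reseek to next row)}$$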

rows: 100000
columns per row: 10
versions per column: 3
row key size: 8 bytes
qualifier size: 8 bytes
value size: 8 bytes
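One possible reading of these parameters follows. It assumes that each version of each column is a separate memstore entry, that a reseek starts from the first entry of the current column or row, and that the patch's linear limit is 20. Under those assumptions, the number of next() calls needed to reach the target is:

$$\text{next column: } \text{versions} = 3 \le 20, \qquad \text{next row: } \text{columns per row} \times \text{versions} = 10 \times 3 = 30 > 20$$

If that holds, a reseek to the next column finishes inside the linear phase. A reseek to the next row spends all 20 next() calls and then still performs the logarithmic seek, which would fit the slower row result. This is an inference from the parameters, not something the benchmark measured directly.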

bq. In all fairness, we should not divide the runtime by the number of ops. The whole point
of seeking is to reduce the number of ops.
In fact, the cost of next is listed here only for reference (e.g., to tune the limit on linear
seeks) and should not be compared with the cost of a reseek. In our use case, which scans a single
row with a very large offset and a small limit, the cost of a single reseek is the more meaningful
number, because we can multiply it directly by the offset. I understand that in other cases the
total time may matter more.
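As a worked illustration of multiplying the per-reseek cost by the offset, take a hypothetical offset of N = 10,000. Assume each skipped position costs one "reseek to next column", using the per-op figures from the benchmark table:

$$T_{\text{skip}} \approx N \cdot t_{\text{reseek}}: \qquad 10{,}000 \times 3.735\ \mu\text{s} = 37.35\ \text{ms (trunk)}, \qquad 10{,}000 \times 0.569\ \mu\text{s} = 5.69\ \text{ms (with patch)}$$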

In any case, the goal of the benchmark program is to measure the performance gain from linear
seeking by comparing these numbers with and without the patch. The relative improvement is the
same whether or not the runtime is divided by the number of ops.
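The reason the relative improvement is unaffected can be shown in one line. This assumes both runs perform the same number n of reseeks, so each total time is T = n·t:

$$\frac{T_{\text{trunk}} - T_{\text{patch}}}{T_{\text{trunk}}} = \frac{n\,t_{\text{trunk}} - n\,t_{\text{patch}}}{n\,t_{\text{trunk}}} = \frac{t_{\text{trunk}} - t_{\text{patch}}}{t_{\text{trunk}}}$$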

I like [~lhofhansl]'s idea of passing a hint from ScanQueryMatcher, which should also
benefit StoreFileScanner. We could also record some statistics when an HFile is written,
such as the average number of versions or columns, to help determine whether the target of a
"reseek to next row" is actually far enough away to justify a real reseek.
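Here is a minimal sketch of how such a hint and such statistics might be combined. Everything in it is hypothetical. The `nextRow` flag stands in for whatever hint ScanQueryMatcher would pass. `avgColumnsPerRow` and `avgVersionsPerColumn` stand in for the per-HFile statistics proposed above, not existing HBase fields. The rule simply compares the expected number of cells to skip with the linear-seek limit.

```java
/** Hypothetical sketch: choose between linear and logarithmic reseek from a hint plus statistics. */
final class ReseekHint {
    enum Strategy { LINEAR_FIRST, LOGARITHMIC }

    /**
     * nextRow: stand-in for a matcher hint saying whether the scan wants the next row (true)
     *          or only the next column (false).
     * avgColumnsPerRow, avgVersionsPerColumn: stand-ins for statistics recorded at HFile
     *          write time (hypothetical, not existing HBase fields).
     */
    static Strategy choose(boolean nextRow, double avgColumnsPerRow,
                           double avgVersionsPerColumn, int maxLinearSteps) {
        // Expected number of cells between the current position and the reseek target.
        double expectedSkip = nextRow
                ? avgColumnsPerRow * avgVersionsPerColumn
                : avgVersionsPerColumn;
        // Walk linearly only when the target is likely to be within the linear budget.
        return expectedSkip <= maxLinearSteps ? Strategy.LINEAR_FIRST : Strategy.LOGARITHMIC;
    }

    public static void main(String[] args) {
        // Same shape as the benchmark: 10 columns per row, 3 versions, limit of 20.
        System.out.println(choose(false, 10, 3, 20)); // LINEAR_FIRST
        System.out.println(choose(true, 10, 3, 20));  // LOGARITHMIC
    }
}
```

The design choice here is to skip the linear walk entirely when it is expected to fail. That avoids paying both the wasted next() calls and the logarithmic seek.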

> Linear reseek in Memstore
> -------------------------
>                 Key: HBASE-9000
>                 URL: https://issues.apache.org/jira/browse/HBASE-9000
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89-fb
>            Reporter: Shane Hogan
>            Priority: Minor
>             Fix For: 0.89-fb
>         Attachments: hbase-9000-benchmark-program.patch, hbase-9000-port-fb.patch
> This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over
the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to
do this linear seek up to a configurable maximum number of times and, if the seek is not yet
complete, fall back to a logarithmic seek.
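The strategy described in the issue can be sketched on a plain `java.util.concurrent.ConcurrentSkipListSet` of String keys. This is a minimal illustration under simplifying assumptions, not the code in the attached patch. The real MemStoreScanner works on KeyValues and must consult both the kvset and the snapshot, while this sketch uses a single set. The class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch: bounded linear reseek with a logarithmic fallback. */
final class BoundedLinearReseek {

    /** Outcome of one reseek. */
    static final class Result {
        final String key;           // first key >= target, or null if there is none
        final int linearSteps;      // next() calls spent in the linear phase
        final boolean usedFallback; // true if the logarithmic seek was needed

        Result(String key, int linearSteps, boolean usedFallback) {
            this.key = key;
            this.linearSteps = linearSteps;
            this.usedFallback = usedFallback;
        }
    }

    /**
     * Advances to the first key >= target. 'it' must be positioned just after the
     * scanner's current key, and target is assumed to lie after that key.
     */
    static Result reseek(Iterator<String> it, NavigableSet<String> set,
                         String target, int maxLinearSteps) {
        int steps = 0;
        // Linear phase: cheap next() calls, bounded by maxLinearSteps.
        while (steps < maxLinearSteps && it.hasNext()) {
            String k = it.next();
            steps++;
            if (k.compareTo(target) >= 0) {
                return new Result(k, steps, false);
            }
        }
        if (steps < maxLinearSteps) {
            // Iterator ran out within the budget: no key >= target exists.
            return new Result(null, steps, false);
        }
        // Budget exhausted: fall back to one O(log n) skip-list lookup.
        return new Result(set.ceiling(target), steps, true);
    }

    public static void main(String[] args) {
        NavigableSet<String> set = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 100; i++) {
            set.add(String.format("k%03d", i)); // k000 .. k099
        }
        // Near target: reached after 3 next() calls (k001, k002, k003).
        Result near = reseek(set.tailSet("k000", false).iterator(), set, "k003", 20);
        // Far target: 20 next() calls reach only k020, so ceiling() is used.
        Result far = reseek(set.tailSet("k000", false).iterator(), set, "k050", 20);
        System.out.println(near.key + " " + near.linearSteps + " " + near.usedFallback); // k003 3 false
        System.out.println(far.key + " " + far.linearSteps + " " + far.usedFallback);    // k050 20 true
    }
}
```

Bounding the linear phase caps the worst case at `maxLinearSteps` wasted next() calls plus one O(log n) seek. Meanwhile, nearby targets, such as the next column when there are only a few versions, avoid the skip-list descent entirely.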

This message was sent by Atlassian JIRA
