hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10625) Remove unnecessary key compare from AbstractScannerV2.reseekTo
Date Fri, 28 Feb 2014 00:00:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915237#comment-13915237 ]

Lars Hofhansl commented on HBASE-10625:
---------------------------------------

Testing this seems to be hard. I wrote a quick tool that runs through the scenarios and calculates
the mean and standard deviation for each.
Setup: 10m rows, 5 columns (C0-C4), 8 byte row keys, 8 byte values.
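
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing harness that reports the mean and standard deviation over repeated runs. It is not the tool used for this comment; the class name `ScanTiming`, the `Runnable` scenario, the use of seconds, and the choice of the sample (n-1) standard deviation are all assumptions made for illustration.

```java
public class ScanTiming {

    /** Returns {mean, sample standard deviation} of the given samples. */
    static double[] meanAndStdDev(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        double mean = sum / samples.length;

        double squaredDiffs = 0;
        for (double s : samples) {
            squaredDiffs += (s - mean) * (s - mean);
        }
        // Assumption: sample (n-1) standard deviation; 0 for a single run.
        double stdDev = samples.length > 1
            ? Math.sqrt(squaredDiffs / (samples.length - 1))
            : 0.0;
        return new double[] {mean, stdDev};
    }

    /** Runs the (hypothetical) scenario `runs` times and summarizes the wall-clock seconds. */
    static double[] time(Runnable scenario, int runs) {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            scenario.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return meanAndStdDev(seconds);
    }
}
```

Each scenario (no column selection, C0, C1, and so on) would be passed in as its own `Runnable`, for example one that performs a full scan with the chosen columns.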

The results are disappointing (each cell is mean, standard deviation):
||columns||None||C0||C1||C4||C1,C3||C2,C3||C2,C3,C4||
|w/ patch|13.30, 0.12|14.24, 0.17|22.29, 0.09|16.42, 0.03|31.51, 0.27|24.60, 0.02|21.04, 0.05|
|w/o patch|13.72, 0.07|14.47, 0.21|23.12, 0.13|17.30, 0.16|32.16, 0.05|25.00, 0.04|21.11, 0.05|

So the gains are minimal.
I will run the same test at home later.
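
To put a number on "minimal", the relative gain per scenario can be computed from the means in the table above. The sketch below assumes the values are timings where lower is better (the table does not state a unit); the class and method names are made up for illustration. Under that assumption the gains work out to roughly 0.3% to 5%, well short of the 20-30% mentioned in the issue description.

```java
public class RelativeGain {

    /** Percentage by which the patched mean improves on the unpatched mean. */
    static double gainPercent(double withoutPatch, double withPatch) {
        return 100.0 * (withoutPatch - withPatch) / withoutPatch;
    }

    public static void main(String[] args) {
        // Means copied from the table in this comment.
        String[] scenarios = {"None", "C0", "C1", "C4", "C1,C3", "C2,C3", "C2,C3,C4"};
        double[] withPatch = {13.30, 14.24, 22.29, 16.42, 31.51, 24.60, 21.04};
        double[] withoutPatch = {13.72, 14.47, 23.12, 17.30, 32.16, 25.00, 21.11};

        for (int i = 0; i < scenarios.length; i++) {
            System.out.printf("%-10s %.2f%%%n",
                scenarios[i], gainPercent(withoutPatch[i], withPatch[i]));
        }
    }
}
```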

> Remove unnecessary key compare from AbstractScannerV2.reseekTo
> --------------------------------------------------------------
>
>                 Key: HBASE-10625
>                 URL: https://issues.apache.org/jira/browse/HBASE-10625
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 10625-0.94.txt, 10625-trunk.txt
>
>
> In reseekTo we find this
> {code}
> ...
>         compared = compareKey(reader.getComparator(), key, offset, length);
>         if (compared < 1) {
>           // If the required key is less than or equal to current key, then
>           // don't do anything.
>           return compared;
>         } else {
>            ...
>            return loadBlockAndSeekToKey(this.block, this.nextIndexedKey,
>               false, key, offset, length, false);
> ...
> {code}
> loadBlockAndSeekToKey already does the right thing when we pass a key that sorts before the current key. It is less efficient than this early check, but in the vast majority of (all?) cases we pass forward keys (as required by the reseek contract). We're optimizing the wrong thing.
> Without the check, scanning with the ExplicitColumnTracker is 20-30% faster.
> (I tested with rows of 5 short KVs, selecting the 2nd and/or 4th column.)
> I propose simply removing that check.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
