hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9915) Severe performance bug: isSeeked() in EncodedScannerV2 is always false
Date Fri, 08 Nov 2013 01:52:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816911#comment-13816911 ]

Lars Hofhansl commented on HBASE-9915:

Some numbers with Phoenix. 5m rows, 5 long columns, 8 byte rowkeys, FAST_DIFF encoding, table
fully flushed and major compacted, everything in the blockcache.
(some weirdly named columns, this was a preexisting table that I mapped into Phoenix - with

||Query||Without Patch||With Patch||
|select count(*) from "my5"|12.8s|9.7s|
|select count(*) from "my5" where "3" = 1|23.5s|11.8s|
|select count(*) from "my5" where "3" > 1|34.8s|15.6s|
|select avg("3") from "my5"|35.6s|17.4s|
|select avg("0"), avg("3") from "my5"|36.5s|20.2s|
|select avg("0"), avg("3") from "my5" where "4" = 1|31.8s|15.4s|
|select avg("0"), avg("3") from "my5" where "4" > 1|46.4s|25.1s|

Note that Phoenix adds a "fake" column to each row (so each row has a known KV for things
like COUNT) and (almost) always uses the ExplicitColumnTracker.

> Severe performance bug: isSeeked() in EncodedScannerV2 is always false
> ----------------------------------------------------------------------
>                 Key: HBASE-9915
>                 URL: https://issues.apache.org/jira/browse/HBASE-9915
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.96.1, 0.94.14
>         Attachments: 9915-0.94.txt, 9915-trunk-v2.txt, 9915-trunk.txt, profile.png
> While debugging why reseek is so slow I found that it is quite broken for encoded scanners.
> The problem is this:
> AbstractScannerV2.reseekTo(...) calls isSeeked() to check whether the scanner was seeked
> or not. If it was, it checks whether the KV we want to seek to is in the current block; if
> not, it always consults the index blocks again.
> isSeeked() checks the blockBuffer member, which is not used by EncodedScannerV2, and thus
> always returns false, which in turn causes an index lookup for each reseek.
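
The failure mode described in the issue can be illustrated with a small, self-contained
model. This is only a sketch under stated assumptions, not the actual HBase code: the
classes BaseScanner, EncodedScanner and FixedEncodedScanner, the integer keys, the fixed
100-keys-per-block layout and the indexLookups counter are all hypothetical stand-ins.
Overriding isSeeked() in the subclass is one plausible shape for a fix; the attached
patches remain the authoritative change.

```java
import java.nio.ByteBuffer;

class ReseekSketch {
    // Hypothetical stand-in for AbstractScannerV2 (simplified, not the real class).
    static abstract class BaseScanner {
        protected ByteBuffer blockBuffer; // only a non-encoded scanner would populate this
        int indexLookups = 0;             // counts simulated index-block consultations

        boolean isSeeked() { return blockBuffer != null; }

        abstract boolean currentBlockContains(int key);
        abstract void loadBlockFor(int key);

        void seekTo(int key) {
            indexLookups++;               // a full seek consults the index
            loadBlockFor(key);
        }

        void reseekTo(int key) {
            // Fast path: already seeked and the key is in the current block.
            if (isSeeked() && currentBlockContains(key)) {
                return;
            }
            seekTo(key);                  // otherwise fall back to an index lookup
        }
    }

    // Hypothetical stand-in for EncodedScannerV2: it keeps its own block state
    // and never touches blockBuffer, so the inherited isSeeked() is always false.
    static class EncodedScanner extends BaseScanner {
        static final int KEYS_PER_BLOCK = 100; // assumed block layout
        protected int currentBlock = -1;       // -1 means no block loaded yet

        boolean currentBlockContains(int key) { return currentBlock == key / KEYS_PER_BLOCK; }
        void loadBlockFor(int key) { currentBlock = key / KEYS_PER_BLOCK; }
    }

    // One possible fix: report "seeked" based on the state this scanner really uses.
    static class FixedEncodedScanner extends EncodedScanner {
        @Override
        boolean isSeeked() { return currentBlock != -1; }
    }

    // Reseeks to keys 0..keys-1 in order and returns the number of index lookups.
    static int countLookups(BaseScanner scanner, int keys) {
        for (int k = 0; k < keys; k++) {
            scanner.reseekTo(k);
        }
        return scanner.indexLookups;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + countLookups(new EncodedScanner(), 1000));
        System.out.println("fixed: " + countLookups(new FixedEncodedScanner(), 1000));
    }
}
```

In this model, reseeking through 1000 consecutive keys costs 1000 index lookups with the
inherited isSeeked() and 10 (one per block) with the override.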

This message was sent by Atlassian JIRA
