hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9769) Improve Scanner with explicit column list performance when rows are small/medium size
Date Tue, 15 Oct 2013 18:28:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795473#comment-13795473 ]

Vladimir Rodionov commented on HBASE-9769:
------------------------------------------

LarsH:

{code}
Interesting. Thanks for doing the testing/profiling Vladimir!

Generally reseeks are better if they can skip many KVs.

For example, if you have many versions of the same row/col, INCLUDE_NEXT_COL will be better
than issuing many INCLUDEs; the same goes for INCLUDE_NEXT_ROW if there are many columns.

Since the number of columns/versions is not known at scan time (and can in fact vary between
rows), it is hard to always do the right thing. It also depends on how large the KVs are on average.
So replacing INCLUDE_NEXT_XXX with INCLUDE is not always the right idea.

Thinking aloud... We could take the VERSIONS setting of the column family into account as
a guideline for the expected number of versions (but there's no guarantee about how many versions
we'll actually have until we've had a compaction), and replace INCLUDE_NEXT_COL with INCLUDE
if VERSIONS is small (maybe < 10 or so). Maybe that'd be worth a jira...


There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in some cases), and
HBASE-9732 might help in 0.94.13 (avoid memory fences on a volatile on each seek/reseek).

It also would be nice to figure out why reseek is so much more expensive. If the KV we reseek
to is on the same block it should just scan forward, otherwise it'll look in the appropriate
block. It probably is the creation of the fake KV we want to seek to (like firstOnRow, lastOnRow,
etc.), in which case there's not much we can do.

Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet; looks like I should
start doing that.

{code}
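
A minimal sketch of the VERSIONS-based idea floated above, assuming a simplified stand-in for the matcher's return codes. The class name, the enum, the method adjust, and the cutoff of 10 are all hypothetical and are not HBase APIs; the cutoff only mirrors the "maybe < 10 or so" guess and would need profiling to justify.

```java
// Hypothetical sketch: none of these names are HBase's actual classes or methods.
final class ReseekHeuristicSketch {

    /** Simplified stand-in for the match codes named in the comment above. */
    enum MatchCode { INCLUDE, INCLUDE_NEXT_COL, INCLUDE_NEXT_ROW }

    /** Assumed cutoff; the comment only suggests "maybe < 10 or so". */
    static final int SMALL_VERSIONS_THRESHOLD = 10;

    /**
     * Downgrades INCLUDE_NEXT_COL to a plain INCLUDE when the column family's
     * configured VERSIONS is small, on the theory that a reseek would skip too
     * few KVs to pay for itself. All other codes pass through unchanged.
     */
    static MatchCode adjust(MatchCode proposed, int configuredMaxVersions) {
        if (proposed == MatchCode.INCLUDE_NEXT_COL
                && configuredMaxVersions < SMALL_VERSIONS_THRESHOLD) {
            return MatchCode.INCLUDE;
        }
        return proposed;
    }
}
```

For instance, adjust(MatchCode.INCLUDE_NEXT_COL, 3) yields INCLUDE, while the same code with VERSIONS set to 100 is returned unchanged. INCLUDE_NEXT_ROW is left alone here because the comment only proposes the substitution for INCLUDE_NEXT_COL. As noted above, VERSIONS is only a guideline until a compaction has run, so the actual number of stored versions may differ.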

> Improve Scanner with explicit column list performance when rows are small/medium size
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-9769
>                 URL: https://issues.apache.org/jira/browse/HBASE-9769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.98.0, 0.94.12, 0.96.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
