hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12313) Redo the hfile index length optimization so cell-based rather than serialized KV key
Date Tue, 28 Oct 2014 18:09:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187200#comment-14187200
] 

stack commented on HBASE-12313:
-------------------------------

bq. So what happens when a scan is created with start key as 'h' and end key as 'z'. We will
start from the 0th block thinking 'h' is in that block and later fetch the next block where
it starts with 'i'. 

That is correct. When we search the index, block '1' starts with 'i', so if an 'h' exists,
it must be in block '0'.

We do not consult the last key in a block when doing an index lookup.  If we did, in the test
we'd notice that the last key in block '0' was actually 'f', and so we should really be returning
'1' instead of '0' -- but this is a TODO.
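
To make the lookup concrete, here is a minimal Python sketch, not HBase code. The function
names are hypothetical, and the first key of block '0' ('a') and the last key of block '1' ('z')
are assumptions made only for illustration; the thread states only that block '1' starts with
'i' and that the last key in block '0' is 'f'.

```python
from bisect import bisect_right


def find_block(first_keys, key):
    """Return the index of the last block whose first key is <= key, or -1.

    This mirrors the lookup described above: only first keys are consulted.
    """
    return bisect_right(first_keys, key) - 1


def find_block_checking_last_key(first_keys, last_keys, key):
    """Hypothetical variant (the TODO): also consult each block's last key.

    If the key sorts after the last key of the candidate block, it cannot be
    in that block, so the scan may as well start at the next block.
    """
    i = find_block(first_keys, key)
    if i >= 0 and i + 1 < len(first_keys) and key > last_keys[i]:
        return i + 1
    return i


# Assumed layout: block 0 holds 'a'..'f', block 1 holds 'i'..'z'.
FIRST_KEYS = [b"a", b"i"]
LAST_KEYS = [b"f", b"z"]

if __name__ == "__main__":
    print(find_block(FIRST_KEYS, b"h"))
    print(find_block_checking_last_key(FIRST_KEYS, LAST_KEYS, b"h"))
```

With only first keys, a lookup for 'h' lands in block '0' and the scanner then has to move on to
block '1'; the variant that consults last keys goes straight to block '1'.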

Let me just restore the midkey to the way it used to work.  My thinking was that there is no need
for a midkey when there are no savings to be had in index size, but you raise the interesting point
that even with no size savings, we could save a seek in the rare case where a key lookup falls
between the last key of one block and the first key of the next AND it happens to fall after the
calculated midkey (this is for the case where key elements are all the same size; when they are
not, we were doing the old behavior).  It turns out the midkey calc behaves as though we were
consulting the last key in the block (only we aren't), but it works only 50% of the time (when the
key is > midkey).
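
The effect of a midkey on lookups can be sketched as follows. This is an illustrative Python
sketch over plain byte strings, not the Cell-based HBase implementation; `short_separator` and
`find_block` are hypothetical names, and the first key 'a' for block '0' is assumed.

```python
from bisect import bisect_right


def short_separator(last_key: bytes, first_key: bytes) -> bytes:
    """Return a short key k such that last_key < k <= first_key.

    Simplified sketch: bump the first differing byte of last_key when there
    is room, otherwise fall back to a prefix of first_key.
    """
    if not last_key < first_key:
        raise ValueError("last_key must sort before first_key")
    n = min(len(last_key), len(first_key))
    i = 0
    while i < n and last_key[i] == first_key[i]:
        i += 1
    if i < n and last_key[i] + 1 < first_key[i]:
        return last_key[:i] + bytes([last_key[i] + 1])
    return first_key[: i + 1]


def find_block(index_keys, key):
    """Index of the last entry whose key is <= the search key, or -1."""
    return bisect_right(index_keys, key) - 1


# Block 0 ends at 'f', block 1 starts at 'i' (block 0's first key 'a' is assumed).
midkey = short_separator(b"f", b"i")   # b"g"
index_with_first_key = [b"a", b"i"]
index_with_midkey = [b"a", midkey]

if __name__ == "__main__":
    for probe in (b"h", b"fz"):
        print(probe,
              find_block(index_with_first_key, probe),
              find_block(index_with_midkey, probe))
```

A lookup for 'h' falls after the midkey 'g', so the midkey index sends it directly to block '1'.
A lookup for 'fz' also falls in the gap between 'f' and 'i' but sorts before 'g', so it still
lands in block '0' and pays the extra seek.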



> Redo the hfile index length optimization so cell-based rather than serialized KV key
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12313
>                 URL: https://issues.apache.org/jira/browse/HBASE-12313
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: stack
>            Assignee: stack
>         Attachments: 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch, 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
12313v5.txt, 12313v6.txt, 12313v8.txt
>
>
> Trying to remove the API that returns the 'key' of a KV serialized into a byte array is thorny.
> I tried to move over the first and last key serializations and the hfile index entries
to be Cell-based but the patch was turning massive.  Here is a smaller patch that just redoes the
optimization that tries to find 'short' midpoints between the last key of the last block and the
first key of the next block so it is Cell-based rather than byte-array-based (presuming keys
serialized in a certain way).  Adds unit tests, which we didn't have before.
> Also removes CellKey.  Not needed... at least not yet.  It's just a utility for toString.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
