hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9583) add document for getShortMidpointKey
Date Fri, 27 Sep 2013 00:17:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779483#comment-13779483 ]

Jonathan Hsieh commented on HBASE-9583:

An HFile consists of many blocks, each holding a sorted range of Cells.  Each Cell has a
key.  To save IO when reading Cells, the HFile also has an index that maps each block's start
key to the offset at which that block begins.  Prior to this optimization, HBase used the key
of the first Cell in each data block as the index key.
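
To make the lookup concrete, here is a minimal sketch (not HBase code) of an index that uses
the first key of each block as the index key.  BlockIndexSketch, locate, the sample keys and
the offsets are all made up for illustration, and plain Strings stand in for real Cell keys.

```java
import java.util.Arrays;

class BlockIndexSketch {
    private final String[] indexKeys; // one index key per block, sorted ascending
    private final long[] offsets;     // offsets[i] is where block i begins in the file

    BlockIndexSketch(String[] indexKeys, long[] offsets) {
        this.indexKeys = indexKeys;
        this.offsets = offsets;
    }

    /** Offset of the only block that could hold key, or -1 if key sorts before every block. */
    long locate(String key) {
        int pos = Arrays.binarySearch(indexKeys, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        // The candidate block is the one just before the insertion point.
        int block = pos >= 0 ? pos : -pos - 2;
        return block < 0 ? -1L : offsets[block];
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch(
            new String[] {"an apple", "the who"},
            new long[] {0L, 65536L});
        System.out.println(index.locate("the quick brown fox")); // prints 0
        System.out.println(index.locate("the who"));             // prints 65536
    }
}
```

Only the index has to be searched to pick a block, so a point lookup reads at most one data
block.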

In HBASE-7845, we instead generate a new key that is lexicographically larger than the last
key of the previous block and lexicographically smaller than or equal to the start key of the
current block.  While actual keys can potentially be very long, this "fake key" or "virtual
key" can be much shorter.  For example, if the last key of the previous block is "the quick
brown fox" and the start key of the current block is "the who", we could use "the r" as the
virtual key in our HFile index.
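
A rough sketch of how such a key could be computed for plain byte strings follows.  This is
not the getShortMidpointKey implementation (real HBase keys are made of several parts such as
row, family and qualifier); ShortSeparatorSketch and shortSeparator are hypothetical names,
and keys are assumed to compare as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ShortSeparatorSketch {
    /**
     * Returns a key k with left < k <= right under unsigned bytewise order.
     * Hypothetical helper; assumes left sorts strictly before right.
     */
    static byte[] shortSeparator(byte[] left, byte[] right) {
        int minLen = Math.min(left.length, right.length);
        int diff = 0;
        while (diff < minLen && left[diff] == right[diff]) {
            diff++;
        }
        if (diff == minLen) {
            if (left.length >= right.length) {
                throw new IllegalArgumentException("left must sort before right");
            }
            // left is a proper prefix of right: one extra byte of right is enough.
            return Arrays.copyOf(right, minLen + 1);
        }
        int l = left[diff] & 0xff;
        int r = right[diff] & 0xff;
        if (l > r) {
            throw new IllegalArgumentException("left must sort before right");
        }
        // Keep the common prefix and bump the first differing byte of left by one.
        // Because l < r, the result is greater than left and not greater than right.
        byte[] sep = Arrays.copyOf(left, diff + 1);
        sep[diff] = (byte) (l + 1);
        return sep;
    }

    /** Convenience wrapper for ASCII keys only. */
    static String shortSeparator(String left, String right) {
        byte[] sep = shortSeparator(left.getBytes(StandardCharsets.UTF_8),
                                    right.getBytes(StandardCharsets.UTF_8));
        return new String(sep, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints "the r"
        System.out.println(shortSeparator("the quick brown fox", "the who"));
    }
}
```

Bumping the first differing byte keeps the result just above the previous block's last key,
which is the property the second benefit in this comment relies on.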

There are two benefits to this:
  1) shorter keys reduce the HFile index size (allowing us to keep more of the index in
memory), and
  2) using a key closer to the last key of the previous block lets us avoid a potential
extra IO when the target key falls between the "virtual key" and the key of the first Cell
in the target block.
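
Both benefits can be seen in a toy comparison of the two kinds of index.  This is only a
sketch under my own assumptions: IndexKeyComparisonSketch, route, keyChars, the block names
and the sample keys are invented, Strings stand in for Cell keys, and the first block keeps
its real first key because it has no previous block.

```java
import java.util.Map;
import java.util.TreeMap;

class IndexKeyComparisonSketch {
    /** Block an index routes a lookup to: the entry with the greatest index key <= target. */
    static String route(TreeMap<String, String> index, String target) {
        Map.Entry<String, String> entry = index.floorEntry(target);
        return entry == null ? null : entry.getValue();
    }

    /** Total characters stored as index keys, a rough stand-in for index size. */
    static int keyChars(TreeMap<String, String> index) {
        int total = 0;
        for (String key : index.keySet()) {
            total += key.length();
        }
        return total;
    }

    /** Index built from the first key of each block (the older behaviour). */
    static TreeMap<String, String> firstKeyIndex() {
        TreeMap<String, String> index = new TreeMap<>();
        index.put("an apple", "block-1"); // block-1 ends with "the quick brown fox"
        index.put("the who", "block-2");  // block-2 starts with "the who"
        return index;
    }

    /** Same blocks, but block-2 is indexed by the shorter virtual key. */
    static TreeMap<String, String> virtualKeyIndex() {
        TreeMap<String, String> index = new TreeMap<>();
        index.put("an apple", "block-1");
        index.put("the r", "block-2");
        return index;
    }

    public static void main(String[] args) {
        String target = "the sun"; // sorts between "the r" and "the who"
        System.out.println(route(firstKeyIndex(), target));   // prints block-1
        System.out.println(route(virtualKeyIndex(), target)); // prints block-2
        System.out.println(keyChars(firstKeyIndex()) + " vs " + keyChars(virtualKeyIndex()));
    }
}
```

With the first-key index, a seek to "the sun" lands in block-1, finds nothing at or after
the target there, and has to move on to block-2.  With the virtual key it goes straight to
block-2.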

This optimization (implemented by the getShortMidpointKey method) is inspired by LevelDB's
BytewiseComparatorImpl::FindShortestSeparator() and FindShortSuccessor().

> add document for getShortMidpointKey
> ------------------------------------
>                 Key: HBASE-9583
>                 URL: https://issues.apache.org/jira/browse/HBASE-9583
>             Project: HBase
>          Issue Type: Task
>          Components: HFile
>    Affects Versions: 0.98.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HBase-9583.txt, HBase-9583-v2.txt
> add the faked key to documentation http://hbase.apache.org/book.html#hfilev2

