hbase-issues mailing list archives

From "Schubert Zhang (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1978) Change the range/block index scheme from [start,end) to (start, end], and index range/block by endKey, specially in HFile
Date Tue, 20 Mar 2012 01:29:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233094#comment-13233094 ]

Schubert Zhang commented on HBASE-1978:

Refers to HBASE-2600.
> Change the range/block index scheme from [start,end) to (start, end], and index range/block by endKey, specially in HFile
> -------------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-1978
>                 URL: https://issues.apache.org/jira/browse/HBASE-1978
>             Project: HBase
>          Issue Type: New Feature
>          Components: io, master, regionserver
>            Reporter: Schubert Zhang
>         Attachments: HBASE-1978-HFile-v1.patch
> From the code review of HFile (HBASE-1818), we found that HFile allows duplicate keys. However, the old implementation could miss duplicate keys during seek and scan when a run of duplicates spans multiple blocks.
> We provide a patch (HBASE-1841 is step 1 of this fix) to resolve the above issue. The patch modifies HFile.Writer so that it no longer produces a problematic HFile with duplicate keys crossing a block boundary: it only starts a new block when the key being appended differs from the last appended key. However, a risk remains: if a user of HFile.Writer appends many entries with the same key, the result is a very large block that needs a lot of memory and may cause an out-of-memory error.
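A minimal sketch of the block-splitting rule described above, written in Python for brevity (the real HFile.Writer is Java). The class name `BlockWriter` and the entry-count threshold `max_block_entries` are hypothetical stand-ins for the writer and its block size limit. The sketch only illustrates the rule itself: a block is cut when it is full *and* the incoming key differs from the last one. It also reproduces the oversized-block risk.

```python
from typing import List, Optional, Tuple

KeyValue = Tuple[str, str]


class BlockWriter:
    """Hypothetical model of the patched writer's block-splitting rule.

    max_block_entries stands in for the real block size limit (an assumption).
    A new block is started only when the current block is full AND the key
    being appended differs from the last appended key, so a run of duplicate
    keys never crosses a block boundary.
    """

    def __init__(self, max_block_entries: int) -> None:
        self.max_block_entries = max_block_entries
        self.blocks: List[List[KeyValue]] = [[]]
        self.last_key: Optional[str] = None

    def append(self, key: str, value: str) -> None:
        if self.last_key is not None and key < self.last_key:
            raise ValueError("keys must be appended in sorted order")
        current = self.blocks[-1]
        # Cut a block only at a key change, never inside a run of duplicates.
        if len(current) >= self.max_block_entries and key != self.last_key:
            current = []
            self.blocks.append(current)
        current.append((key, value))
        self.last_key = key


w = BlockWriter(max_block_entries=2)
for k, v in [("a", "1"), ("b", "1"), ("b", "2"), ("b", "3"), ("c", "1")]:
    w.append(k, v)
print([[k for k, _ in blk] for blk in w.blocks])  # [['a', 'b', 'b', 'b'], ['c']]
```

Three copies of "b" land in the first block even though the limit is two entries. A long enough run of one key grows a single block without bound.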
> The current HFile block index uses startKey to index a block, i.e. the range/block index scheme is [startKey,endKey).
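The following simplified model (not HBase code) shows why the [startKey,endKey) scheme can miss entries. The names `blocks`, `seek_by_start_key` and `scan_equal` are hypothetical. A size-only split leaves the duplicate key "b" on both sides of a block boundary. Seeking by start key lands on the last block whose start key is <= the search key, so the copy at the end of the previous block is never scanned.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical blocks from a writer that splits purely by size, so the run of
# duplicate key "b" crosses the boundary between block 0 and block 1.
blocks: List[List[Tuple[str, str]]] = [
    [("a", "1"), ("b", "1")],
    [("b", "2"), ("c", "1")],
]
# [startKey, endKey) scheme: each block is indexed by its first key.
start_keys = [blk[0][0] for blk in blocks]  # ["a", "b"]


def seek_by_start_key(key: str) -> int:
    """Return the last block whose start key is <= key (block 0 if none is)."""
    return max(bisect_right(start_keys, key) - 1, 0)


def scan_equal(key: str, first_block: int) -> List[str]:
    """Collect the values stored under `key`, scanning forward from first_block."""
    out: List[str] = []
    for blk in blocks[first_block:]:
        for k, v in blk:
            if k == key:
                out.append(v)
            elif k > key:
                return out
    return out


# ['2']: the ("b", "1") entry at the end of block 0 is skipped.
print(scan_equal("b", seek_by_start_key("b")))
```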
> Refer to section 5.1 of the Google Bigtable paper:
> "The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row."
> The principle behind Bigtable's METADATA table is the same as that of the block index in an SSTable or HFile, so we should use the endKey in HFile's BlockIndex. In my experience with Hypertable, its METADATA row key is also "tableID:endRow".
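Here is a sketch of a lookup keyed by end row, in the spirit of the Bigtable quote above. Several details are assumptions made for illustration: the "tableID:endRow" byte encoding, the `END_OF_TABLE` sentinel for the last tablet's unbounded end row, and the server names. The sketch also assumes that table IDs contain no ':' and that no row key sorts at or above the sentinel.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical sentinel for the last tablet's unbounded end row; assumes no
# real row key sorts at or above it.
END_OF_TABLE = b"\xff\xff"


def meta_key(table_id: bytes, end_row: bytes) -> bytes:
    """Encode a METADATA row key as tableID:endRow (illustrative encoding)."""
    return table_id + b":" + end_row


# METADATA rows sorted by key, each mapping to a hypothetical tablet location.
meta: List[Tuple[bytes, str]] = sorted([
    (meta_key(b"t1", b"g"), "server-A"),           # t1 rows up to and including "g"
    (meta_key(b"t1", b"p"), "server-B"),           # t1 rows in ("g", "p"]
    (meta_key(b"t1", END_OF_TABLE), "server-C"),   # t1 rows after "p"
])
meta_keys = [k for k, _ in meta]


def locate(table_id: bytes, row: bytes) -> Optional[str]:
    """Return the tablet whose end row is the first one >= row, or None."""
    i = bisect_left(meta_keys, meta_key(table_id, row))
    if i == len(meta_keys) or not meta_keys[i].startswith(table_id + b":"):
        return None  # no tablet of this table covers the row
    return meta[i][1]


print(locate(b"t1", b"g"))  # server-A: "g" is the inclusive end row of the first tablet
print(locate(b"t1", b"h"))  # server-B
print(locate(b"t1", b"z"))  # server-C
```

Because each end row is inclusive, a single `bisect_left` for the first key >= the search key finds the owning tablet directly.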
> We propose changing the index scheme in HFile from [startKey,endKey) to (startKey,endKey], and changing the binary search method to match this index scheme.
> This change can resolve the duplicate-key issue described above.
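This simplified model (not HFile's API; all names are illustrative) uses the same hypothetical block layout under the proposed (startKey,endKey] scheme. Each block is indexed by its last key, and the binary search returns the first block whose end key is >= the search key (`bisect_left`). The scan therefore starts early enough to see every copy of a duplicate key.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical layout: the run of duplicate key "b" crosses the boundary
# between block 0 and block 1.
blocks: List[List[Tuple[str, str]]] = [
    [("a", "1"), ("b", "1")],
    [("b", "2"), ("c", "1")],
]
# (startKey, endKey] scheme: each block is indexed by its last key.
end_keys = [blk[-1][0] for blk in blocks]  # ["b", "c"]


def seek_by_end_key(key: str) -> Optional[int]:
    """Return the first block whose end key is >= key, or None if key sorts after every block."""
    i = bisect_left(end_keys, key)
    return i if i < len(blocks) else None


def scan_equal(key: str, first_block: int) -> List[str]:
    """Collect the values stored under `key`, scanning forward from first_block."""
    out: List[str] = []
    for blk in blocks[first_block:]:
        for k, v in blk:
            if k == key:
                out.append(v)
            elif k > key:
                return out
    return out


start = seek_by_end_key("b")
if start is not None:
    print(scan_equal("b", start))  # ['1', '2']: both copies of "b" are found
```

Taking the leftmost match matters. If several consecutive blocks end with the same key, the seek must start at the first of them.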
> Note:
> A complete fix requires modifying many modules in HBase, apparently including HFile, the META schema, some internal code, etc.

This message is automatically generated by JIRA.

