hbase-dev mailing list archives

From "Schubert Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup
Date Wed, 11 Nov 2009 17:35:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776530#action_12776530 ]

Schubert Zhang commented on HBASE-1841:

I am considering the following two steps to fix this issue.

Step-1: Light fix

Modify HFile.Writer so that it never generates a problematic HFile in which duplicate keys
straddle a block boundary.
This light fix avoids the missed reads when there are only a small number of duplicates.
It should be treated as a temporary fix.
I will provide this patch soon.
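
To make the idea behind the light fix concrete, here is a minimal sketch, not the actual patch. It assumes the writer can simply postpone closing a block while the incoming key equals the last key written. The class name DuplicateAwareBlockWriter, the entry-count threshold (standing in for the real block size in bytes), and the use of plain strings as keys are all hypothetical simplifications and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a block writer that never lets a run of equal keys
// straddle a block boundary. Not the real HFile.Writer.
class DuplicateAwareBlockWriter {
    // Assumption: entry count stands in for the byte-size threshold of a block.
    private final int maxEntriesPerBlock;
    private final List<List<String>> blocks = new ArrayList<>();
    private List<String> current = new ArrayList<>();

    DuplicateAwareBlockWriter(int maxEntriesPerBlock) {
        this.maxEntriesPerBlock = maxEntriesPerBlock;
    }

    // Keys must be appended in sorted order, as in an HFile.
    void append(String key) {
        boolean full = current.size() >= maxEntriesPerBlock;
        boolean sameAsLast = !current.isEmpty()
                && current.get(current.size() - 1).equals(key);
        // Close the block only when it is full AND the new key starts a new run.
        if (full && !sameAsLast) {
            blocks.add(current);
            current = new ArrayList<>();
        }
        current.add(key);
    }

    // Flushes the last open block and returns all blocks written so far.
    List<List<String>> finish() {
        if (!current.isEmpty()) {
            blocks.add(current);
            current = new ArrayList<>();
        }
        return blocks;
    }

    public static void main(String[] args) {
        DuplicateAwareBlockWriter w = new DuplicateAwareBlockWriter(2);
        for (String k : new String[] {"a", "b", "b", "b", "c"}) {
            w.append(k);
        }
        System.out.println(w.finish());
    }
}
```

In this sketch a block can grow past the threshold while a run of duplicates lasts, which is why the approach only suits a small number of duplicates.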

Step-2: Complete fix

(1) Modify the block index to point to the last key of each block.
(2) Modify the binary search to return the first matching entry when there are duplicates.
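
The two changes above can be sketched together as follows. This is only an illustration under stated assumptions: the class EndKeyBlockIndex and the method findBlock are hypothetical names, keys are plain strings, and the index is a sorted array holding the last key of each block. The search is a lower-bound binary search, so among equal index entries it returns the first one.

```java
// Hypothetical sketch of an end-key block index; not the real HFile BlockIndex.
class EndKeyBlockIndex {
    // lastKeys[i] is the last key stored in block i; the array is sorted and
    // may contain equal entries when duplicates fill whole blocks.
    // Returns the first block whose last key is >= key, or -1 if key is
    // greater than every key in the file.
    static int findBlock(String[] lastKeys, String key) {
        int lo = 0;
        int hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;   // block mid ends before key, so skip it
            } else {
                hi = mid;       // block mid may hold key; keep looking left
            }
        }
        return lo < lastKeys.length ? lo : -1;
    }

    public static void main(String[] args) {
        // Blocks: [a, b] [b, b] [b, c], so the last keys are b, b, c.
        String[] lastKeys = {"b", "b", "c"};
        System.out.println(findBlock(lastKeys, "b"));
        System.out.println(findBlock(lastKeys, "d"));
    }
}
```

With blocks [a, b] [b, b] [b, c], a lookup of "b" lands on block 0, which holds the earliest "b", so a forward scan from there sees every duplicate.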

In fact, we can refer to section 5.1 of the Google Bigtable paper:

"The METADATA table stores the location of a tablet under a row key that is an encoding of
the tablet's table identifier and its end row."

The principle behind Bigtable's METADATA table is the same as that of the block index in an
SSTable, so we should use the end key in HFile's BlockIndex.

In my experience with Hypertable (I studied the structure of its METADATA table in detail
in 2008), the METADATA key is also "tableID:endRow".

This fix would be complete but requires many code changes. I will try to provide a patch if I have

> If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup
> -----------------------------------------------------------------------------------------------
>                 Key: HBASE-1841
>                 URL: https://issues.apache.org/jira/browse/HBASE-1841
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
> See HBASE-818 for description by Schubert Zhang -- discovered by him doing a code review of hfile.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
