hbase-dev mailing list archives

From "Schubert Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1818) HFile code review and refinement
Date Fri, 11 Sep 2009 21:18:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754354#action_12754354 ]

Schubert Zhang commented on HBASE-1818:

1. Your description of "return null" is good, i.e. null means the meta block does not exist.
     I accept your comment.
     Maybe we could throw a different exception here? If you think that is not necessary, I will remove
this added code.

2.  In HFile.java, I found that the block index is based on the first key in each block (not
the last key).
     We do not treat HFile only as a part of HBase; in fact, we want HFile to be a common file
format that can be used in other applications. I would also like to support duplicate keys
in an HFile, since my application uses HFile directly to store data. But when I checked the
code, I found a risk in adding duplicate keys, e.g.
      - block 1:  firstKey=A,  lastKey=B,  indexKey=A
      - block 2:  firstKey=B, lastKey=C,  indexKey=B
When seeking key=B, we go into block 2 and miss the entry with key=B at the end of block 1.
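
The risk above can be reproduced with a small model. This is only a sketch, not the HFile API: the class FirstKeyIndexSketch, its methods, and the use of plain strings as keys are all hypothetical. It assumes the index lookup picks the last block whose first key is <= the sought key, and that the scan then only moves forward.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a block index keyed by each block's FIRST key.
// Not the HFile API; all names here are illustrative assumptions.
class FirstKeyIndexSketch {

    // Returns the index of the last block whose first key is <= key,
    // or -1 if key sorts before the first key of every block.
    static int locateBlock(String[] firstKeys, String key) {
        int lo = 0, hi = firstKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo - 1;
    }

    // Seeks to key through the first-key index, then scans forward only,
    // counting the entries equal to key that the scan actually sees.
    static int countFromSeek(List<List<String>> blocks, String key) {
        String[] firstKeys = new String[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            firstKeys[i] = blocks.get(i).get(0);
        }
        int start = locateBlock(firstKeys, key);
        if (start < 0) {
            return 0;
        }
        int count = 0;
        for (int b = start; b < blocks.size(); b++) {
            for (String k : blocks.get(b)) {
                int cmp = k.compareTo(key);
                if (cmp == 0) {
                    count++;
                } else if (cmp > 0) {
                    return count; // passed the key, stop scanning
                }
            }
        }
        return count;
    }

    // The two blocks from the example: block 1 = [A, B], block 2 = [B, C].
    static List<List<String>> exampleBlocks() {
        return Arrays.asList(Arrays.asList("A", "B"), Arrays.asList("B", "C"));
    }

    public static void main(String[] args) {
        // The file holds two entries with key=B, but the seek lands in
        // block 2 (index 1), so only one of them is seen.
        System.out.println(countFromSeek(exampleBlocks(), "B")); // prints 1
    }
}
```

In this model the seek lands in block 2, so the forward scan sees one key=B entry although the file holds two.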

Yes, you are right: if the data block index used the last key instead of the first key, it would look like this:
      - block 1:  firstKey=A, lastKey=B, indexKey=B 
      - block 2:  firstKey=B, lastKey=C, indexKey=C
When seeking key=B, we go into block 1. Scanner.seekTo() will find key=B in block 1, scanning from
the firstKey of block 1, and Scanner.next() will not miss the key=B in block 2.
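
The last-key variant can be sketched the same way. Again this is a hypothetical model, not the HFile API: LastKeyIndexSketch and its methods are made-up names, and it assumes the lookup picks the first block whose last key is >= the sought key before scanning forward across block boundaries.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a block index keyed by each block's LAST key.
// Not the HFile API; all names here are illustrative assumptions.
class LastKeyIndexSketch {

    // Returns the index of the first block whose last key is >= key,
    // or -1 if key sorts after the last key of every block.
    static int locateBlock(String[] lastKeys, String key) {
        int lo = 0, hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < lastKeys.length ? lo : -1;
    }

    // Seeks to key through the last-key index, then scans forward across
    // block boundaries, counting the entries equal to key.
    static int countFromSeek(List<List<String>> blocks, String key) {
        String[] lastKeys = new String[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            List<String> block = blocks.get(i);
            lastKeys[i] = block.get(block.size() - 1);
        }
        int start = locateBlock(lastKeys, key);
        if (start < 0) {
            return 0;
        }
        int count = 0;
        for (int b = start; b < blocks.size(); b++) {
            for (String k : blocks.get(b)) {
                int cmp = k.compareTo(key);
                if (cmp == 0) {
                    count++;
                } else if (cmp > 0) {
                    return count; // passed the key, stop scanning
                }
            }
        }
        return count;
    }

    // The two blocks from the example: block 1 = [A, B], block 2 = [B, C].
    static List<List<String>> exampleBlocks() {
        return Arrays.asList(Arrays.asList("A", "B"), Arrays.asList("B", "C"));
    }

    public static void main(String[] args) {
        // The seek lands in block 1 (index 0), so the forward scan sees the
        // key=B entry in block 1 and the one in block 2.
        System.out.println(countFromSeek(exampleBlocks(), "B")); // prints 2
    }
}
```

In this model the seek starts at the earliest block that can contain the key, so a forward scan reaches both duplicates.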

But I double-checked the HFile code, and the block index currently does use firstKey.

> HFile code review and refinement
> --------------------------------
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>         Attachments: HFile-v3.patch
> HFile is a good mimic of Google's SSTable file format. And we want HFile to become a
> common file format of hadoop in the near future.
> We will review the code of HFile and record the comments here, and then provide fixed
> patch after the review.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
