hbase-dev mailing list archives

From "Schubert Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1818) HFile code review and refinement
Date Thu, 10 Sep 2009 03:07:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753403#action_12753403

Schubert Zhang commented on HBASE-1818:

1. Regarding duplicate keys in an HFile, I have the following concern.
    Suppose we allow duplicate keys, and consider this scenario:
    A key="abcd" is appended as the last key/value pair of block1.
    The same key="abcd" is then appended as the first key/value pair of block2. In the block index, key="abcd" will then point to block2.
    If we later scan from key="abcd", the first key="abcd" (the last entry of block1) will be missed.
    Can you confirm whether this scenario is acceptable or required?
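
The scenario in point 1 can be reproduced with a toy model. This is only a sketch: BlockIndexSketch, blockContainingKey and scanFrom are hypothetical names, not HBase's HFile classes, and it assumes the index stores each block's first key and that a seek picks the last block whose first key is <= the search key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of a block index; not the real HFile code.
class BlockIndexSketch {
    // Each block is a sorted run of keys; the index entry for a block is its first key.
    static final List<List<String>> BLOCKS = Arrays.asList(
        Arrays.asList("abca", "abcd"),   // block1: "abcd" is the last entry
        Arrays.asList("abcd", "abce"));  // block2: "abcd" is the first entry

    // Assumed lookup rule: the last block whose first key is <= the search key, or -1.
    static int blockContainingKey(String key) {
        int found = -1;
        for (int i = 0; i < BLOCKS.size(); i++) {
            if (BLOCKS.get(i).get(0).compareTo(key) <= 0) {
                found = i;
            }
        }
        return found;
    }

    // Scans all keys >= key, starting in the block chosen by the index.
    static List<String> scanFrom(String key) {
        List<String> out = new ArrayList<>();
        int start = blockContainingKey(key);
        if (start < 0) {
            start = 0; // key sorts before every block: start at the beginning
        }
        for (int b = start; b < BLOCKS.size(); b++) {
            for (String k : BLOCKS.get(b)) {
                if (k.compareTo(key) >= 0) {
                    out.add(k);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(blockContainingKey("abcd")); // zero-based position of block2
        System.out.println(scanFrom("abcd"));
    }
}
```

Under these assumptions the seek lands on block2, so the scan returns "abcd" once, followed by "abce", although the file contains "abcd" twice.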

2. + if (buf == null)
    + return null;

This check is added only in getMetaBlock(...). That method now has three points where it returns null:
(1) if (trailer.metaIndexCount == 0) {
        return null; // there are no meta blocks
    }
(2) if (block == -1)
        return null;
(3) if (buf == null)   // newly added by me
        return null;
  If we do not check it, the following buf.get(..) may throw an NPE, because the decompress() method
will not throw an exception. Do you mean that an NPE is better than "return null", which is the same as above?
  In fact, it is difficult for me to make this trade-off; maybe I am doing it the C++ way.
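
The three return points in point 2 can be sketched as follows. MetaBlockSketch, findBlock and the decompressFails flag are hypothetical stand-ins for illustration, not the real HFile reader. The sketch assumes decompress() hands back null instead of throwing when it cannot produce a buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the reader discussed above; not the real HFile code.
class MetaBlockSketch {
    final int metaIndexCount;      // number of meta blocks recorded in the trailer
    final boolean decompressFails; // simulates decompress() producing no buffer

    MetaBlockSketch(int metaIndexCount, boolean decompressFails) {
        this.metaIndexCount = metaIndexCount;
        this.decompressFails = decompressFails;
    }

    // Assumed lookup: returns the block position, or -1 when the name is unknown.
    int findBlock(String name) {
        return "known".equals(name) ? 0 : -1;
    }

    // Assumed behaviour: returns null rather than throwing on failure.
    ByteBuffer decompress(int block) {
        return decompressFails ? null : ByteBuffer.wrap(new byte[] {1, 2, 3});
    }

    ByteBuffer getMetaBlock(String name) {
        if (metaIndexCount == 0) {
            return null; // (1) there are no meta blocks
        }
        int block = findBlock(name);
        if (block == -1) {
            return null; // (2) no meta block with this name
        }
        ByteBuffer buf = decompress(block);
        if (buf == null) {
            return null; // (3) without this check, buf.get() below throws an NPE
        }
        buf.get();           // stand-in for the "following buf.get(..)"
        return buf.slice();  // remaining bytes after the consumed one
    }
}
```

With the third check in place, all three failure cases look the same to the caller (a null result). Without it, the third case would surface as a NullPointerException instead.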

3. Regarding buf.compact().
    Yes, you may be right. After more performance testing, my patch does not improve
performance (I don't know whether it could in some other environments). I agree to remove
this modification from my patch to keep the returned block buffer neat (position at

@stack and ryan
Thanks for your tests. I will change the patch according to your comments. To include only bug
If the test fails, please just revert to the old version.
Please give me your comments on my questions above, so that I can act immediately. Thanks.

> HFile code review and refinement
> --------------------------------
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>         Attachments: HFile-v1.patch, HFile-v2.patch
> HFile is a good mimic of Google's SSTable file format. And we want HFile to become a common file format of Hadoop in the near future.
> We will review the code of HFile and record the comments here, and then provide a fixed patch after the review.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
