hbase-dev mailing list archives

From "Schubert Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1818) HFile code review and refinement
Date Wed, 16 Sep 2009 02:31:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755844#action_12755844 ]

Schubert Zhang commented on HBASE-1818:

Thanks, stack, for creating a new issue (HBASE-1841).

Regarding the 2 potential fixes:
    - Fix the binary search algorithm to actually find the lower bound in the face of duplicates.
      I think we may need to change the block index to use the last key of each block?

    - Prevent hfiles like the one indicated above from being created, in this case by extending block 1 beyond the default size until we get a different key.
      In fact, we used this approach in one of our old products, i.e. a new block/index entry only starts at the boundary between different keys. In this case, we should ensure that the number of duplicated keys is not too large (otherwise the block becomes too big).
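
To make the two options concrete, here is a rough Java sketch. It is not taken from the HFile code: the class name, both method names, the use of String keys, and counting entries instead of bytes are all assumptions made for illustration. lowerBound shows a lookup over an index of per-block last keys that returns the first candidate block even when several blocks end with the same key. splitAtKeyBoundaries shows a writer that only closes a block when the next key differs from the block's last key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the HFile code.
class BlockIndexSketch {

    // Option 1: index each block by its LAST key and run a lower-bound search.
    // Returns the index of the first block whose last key is >= key,
    // or lastKeys.length if every block ends before key.
    static int lowerBound(String[] lastKeys, String key) {
        int lo = 0;
        int hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;   // block mid ends before key, look to the right
            } else {
                hi = mid;       // block mid may hold key, keep it as a candidate
            }
        }
        return lo;
    }

    // Option 2: cut sorted keys into blocks of about targetEntries entries,
    // but never start a new block in the middle of a run of equal keys.
    // (Entry count stands in for the byte-based block size here.)
    static List<List<String>> splitAtKeyBoundaries(List<String> sortedKeys, int targetEntries) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            boolean full = current.size() >= targetEntries;
            boolean keyChanged = !current.isEmpty()
                    && !current.get(current.size() - 1).equals(key);
            if (full && keyChanged) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }
}
```

With option 2, a long run of one key keeps growing the current block, which is why the number of duplicated keys has to stay bounded.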

> HFile code review and refinement
> --------------------------------
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>         Attachments: HFile-v3.patch, HFile-v4.patch, HFile-v5.patch
> HFile is a good mimic of Google's SSTable file format. And we want HFile to become a common file format of hadoop in the near future.
> We will review the code of HFile and record the comments here, and then provide fixed patch after the review.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
