hbase-dev mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1818) HFile code review and refinement
Date Tue, 15 Sep 2009 21:23:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755701#action_12755701 ]

ryan rawson commented on HBASE-1818:

In reference to the block index: yes, the scenario with duplicate keys that span a block boundary is a real hole.

It's possible that we could fix these holes with a different write strategy, one that doesn't create
invalid HFiles like the one you theorized above. Another scenario is duplicate key entries in the
index, which could cause problems for the binary search algorithm.
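A write strategy along these lines can be sketched as a writer that lets a block grow past its target size while the incoming key repeats the previous one, so a run of duplicates never straddles a block boundary. This is a minimal Java sketch that assumes keys arrive sorted and that block size is measured in raw key bytes. `DuplicateAwareBlockWriter` and its methods are hypothetical names, not HBase's actual HFile writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not HBase code): groups sorted keys into blocks of
 * roughly targetBlockBytes, but never closes a block while the next key
 * equals the last key written.
 */
class DuplicateAwareBlockWriter {
    private final int targetBlockBytes;
    private final List<List<byte[]>> blocks = new ArrayList<>();
    private List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;
    private byte[] lastKey = null;

    DuplicateAwareBlockWriter(int targetBlockBytes) {
        this.targetBlockBytes = targetBlockBytes;
    }

    /** Appends a key; callers must supply keys in sorted order. */
    void append(byte[] key) {
        boolean sameAsLast = lastKey != null && Arrays.equals(lastKey, key);
        // Only start a new block at a key change; a run of duplicates is
        // allowed to push the current block past the target size.
        if (currentBytes >= targetBlockBytes && !sameAsLast) {
            finishBlock();
        }
        current.add(key);
        currentBytes += key.length;
        lastKey = key;
    }

    private void finishBlock() {
        if (!current.isEmpty()) {
            blocks.add(current);
            current = new ArrayList<>();
            currentBytes = 0;
        }
    }

    /** Flushes the last partial block and returns all blocks. */
    List<List<byte[]>> close() {
        finishBlock();
        return blocks;
    }
}
```

With a 2-byte target and one-byte keys `a, b, b, b, c`, this sketch yields the blocks `[a, b, b, b]` and `[c]`: the first block grows to 4 bytes instead of splitting the run of `b`s. The trade-off is that a long run of a single key can make one block arbitrarily large.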

There are two potential fixes here:
- fix the binary search algorithm to actually find the _lower bound_ in the face of duplicates.
- prevent HFiles like the one indicated above from being created, in this case by extending
block 1 past the default block size until we get a different key.
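The first fix amounts to a lower-bound search over the block index. Below is a minimal Java sketch, assuming the index is a non-empty array holding each block's first key, ordered by unsigned lexicographic byte comparison. `BlockIndexLowerBound`, `lowerBound` and `startBlock` are hypothetical names, not HBase code. The seek starts one block before the lower bound, because that block is the last one whose first key is strictly smaller and may still hold the beginning of a duplicate run.

```java
/**
 * Hypothetical sketch (not HBase code): chooses the block a forward scan
 * should start from when first keys in the block index may repeat.
 */
final class BlockIndexLowerBound {
    // Lexicographic comparison treating bytes as unsigned values.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Returns the first index i with firstKeys[i] >= key, or firstKeys.length if none. */
    static int lowerBound(byte[][] firstKeys, byte[] key) {
        int lo = 0;
        int hi = firstKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys[mid], key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Returns the block to start scanning from so the first occurrence of key
     * is not skipped. Assumes a non-empty index.
     */
    static int startBlock(byte[][] firstKeys, byte[] key) {
        return Math.max(lowerBound(firstKeys, key) - 1, 0);
    }
}
```

For first keys `a, k, k, m`, a search for `k` has lower bound 1 and starts at block 0. Picking the last block whose first key is `<= k` would instead land on block 2 and skip any `k` entries at the end of blocks 0 and 1.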

There might be other solutions too.

> HFile code review and refinement
> --------------------------------
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>         Attachments: HFile-v3.patch, HFile-v4.patch
> HFile is a good mimic of Google's SSTable file format. And we want HFile to become a common file format of Hadoop in the near future.
> We will review the code of HFile and record the comments here, and then provide a fixed patch after the review.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
