hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3857) Change the HFile Format
Date Thu, 05 May 2011 22:21:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029615#comment-13029615 ]

stack commented on HBASE-3857:
------------------------------

Design looks excellent.

A few comments:

+ It looks like it will be self-migrating in that it can read version 1 HFiles.  That's great.
+ You say "Block type, a sequence of bytes equivalent to version 1's "magic records"".  Is this
the case?  The magic was supposed to be a sequence you could search for to pick up the parse again
after hitting a bad patch of corrupted data.  You seem to instead start blocks with a type?
+ How are blocks sized now?  Are we still cutting blocks off at the first KV boundary after we
go past the configured hfile block size -- e.g. 64k -- or is the block cutoff instead
determined by the fill of the bloom filter array or the index?
+ I think I know what the following refers to in the diagram, "Version 2 root index, stored in the data block index section of the file"
-- it's kept in the 'load-on-open section', right?
+ Can we have an example of how root, intermediate and leaf indices interrelate?  What's in the
root, intermediate, and leaf indices?  Are intermediates optional?  At what boundary do they
cut in?  Are leaf indices optional too?  What are these? Indices into the data block?
+ For this description:
"• Offset (long)
   o This offset may point to a data block or to a deeper-level index block.
 • On-disk size (int)
 • Key (a serialized byte array)
   o Key (VInt)
   o Key bytes"

You are using a VInt to specify the key size.  We didn't do that in v1?  You have a good implementation?
(It was costly, IIRC, using Hadoop's.)
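
To make the entry layout under discussion concrete, here is a minimal sketch of one index entry: an 8-byte offset, a 4-byte on-disk size, then the key as a variable-length integer length followed by the key bytes. The function names are hypothetical, and the varint below is a plain 7-bits-per-byte encoding chosen for illustration. It is an assumption, not the encoding used by Hadoop or by the design doc.

```python
import struct


def write_varint(n):
    """Encode a non-negative int, 7 bits per byte, low bits first.

    Illustrative stand-in only; NOT Hadoop's VInt wire format.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_index_entry(offset, on_disk_size, key):
    """Offset (long), on-disk size (int), key length (varint), key bytes."""
    header = struct.pack(">qi", offset, on_disk_size)  # big-endian, 12 bytes
    return header + write_varint(len(key)) + key


def decode_index_entry(buf, pos=0):
    """Return ((offset, on_disk_size, key), position after the entry)."""
    offset, on_disk_size = struct.unpack_from(">qi", buf, pos)
    key_len, pos = read_varint(buf, pos + 12)
    key = bytes(buf[pos:pos + key_len])
    return (offset, on_disk_size, key), pos + key_len
```

With short keys the length prefix costs one byte instead of a fixed four, which is the space saving a VInt buys; the price is the per-byte loop on every read, which is where the decoding cost comes from.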

+ Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?
+ "• entryOffsets: the “secondary index” of offsets of entries in the block, to
facilitate a quick binary search on the key (numEntries-int values)"

Is this worth the bother?  A binary search of an in-memory data structure?  How many entries
are you thinking there will be in these blocks?
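
For reference, a rough sketch of what such a secondary index enables. Entries with variable-length keys cannot be addressed by position, so without a table of offsets a lookup has to walk the block from the start. The layout below (a 2-byte key length followed by the key bytes) and all names are assumptions made for illustration, not the format in the design doc.

```python
import struct


def build_block(keys):
    """Pack sorted keys back to back; return (entry_offsets, payload)."""
    entry_offsets = []
    payload = bytearray()
    for key in keys:
        entry_offsets.append(len(payload))  # where this entry starts
        payload += struct.pack(">H", len(key)) + key
    return entry_offsets, bytes(payload)


def key_at(payload, entry_offsets, i):
    """Read the key of entry i by jumping straight to its offset."""
    start = entry_offsets[i]
    (key_len,) = struct.unpack_from(">H", payload, start)
    return payload[start + 2:start + 2 + key_len]


def find_entry(payload, entry_offsets, search_key):
    """Index of the last entry whose key <= search_key, or -1 if none."""
    lo, hi = 0, len(entry_offsets) - 1
    result = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if key_at(payload, entry_offsets, mid) <= search_key:
            result = mid      # candidate; look for a later one
            lo = mid + 1
        else:
            hi = mid - 1
    return result
```

The offsets cost four bytes per entry in the on-disk form described above, so whether they pay off depends on how many entries a block holds: for a handful of entries a linear scan is likely just as fast.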

 
+1



> Change the HFile Format
> -----------------------
>
>                 Key: HBASE-3857
>                 URL: https://issues.apache.org/jira/browse/HBASE-3857
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Liyin Tang
>            Assignee: Mikhail Bautin
>         Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format of the HFile.
The new format proposal is attached here. Thanks to Mikhail Bautin for the documentation.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
