hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Tue, 29 Apr 2008 17:07:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593051#action_12593051 ]

Owen O'Malley commented on HADOOP-3315:

I like the record count being in the tail. If we are saving the first key of each block, we
should probably also save the last key of the file.
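A minimal Java sketch of such a tail follows. It assumes a made-up layout (record count, block count, the first key of each block, then the file's last key) and uses String keys for readability; `FileTail` and `blockRange` are hypothetical names, not part of any Hadoop API. It illustrates why the last key is worth saving: the first keys alone give every block an upper bound except the final one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical file tail (layout is an assumption, not the HADOOP-3315 format):
 * record count, number of blocks, first key of each block, last key of the file.
 */
public final class FileTail {
    public final long recordCount;
    public final List<String> blockFirstKeys;
    public final String lastKey;

    public FileTail(long recordCount, List<String> blockFirstKeys, String lastKey) {
        this.recordCount = recordCount;
        this.blockFirstKeys = Collections.unmodifiableList(new ArrayList<>(blockFirstKeys));
        this.lastKey = lastKey;
    }

    /** Serializes the tail. A real file would also need a fixed-size pointer at EOF to locate it. */
    public byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(recordCount);
            out.writeInt(blockFirstKeys.size());
            for (String key : blockFirstKeys) {
                out.writeUTF(key);
            }
            out.writeUTF(lastKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Parses bytes produced by toBytes(). */
    public static FileTail fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long count = in.readLong();
            int blocks = in.readInt();
            List<String> keys = new ArrayList<>(blocks);
            for (int i = 0; i < blocks; i++) {
                keys.add(in.readUTF());
            }
            return new FileTail(count, keys, in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {lower, upper} key bounds for block i; only lastKey bounds the final block. */
    public String[] blockRange(int i) {
        boolean isLast = (i == blockFirstKeys.size() - 1);
        String upper = isLast ? lastKey : blockFirstKeys.get(i + 1);
        return new String[] {blockFirstKeys.get(i), upper};
    }
}
```

With the last key stored, a reader can answer "is key k in this file?" for any k past the final block's first key by reading only the tail.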

I would think this is just a new class that people migrate to. For most applications, I would
expect it to be an easy transition.

Srikanth, it is useful to have the magic bytes at the front so that commands like "file" on
Unix can work. It is just a constant 4 bytes; it doesn't really complicate the format at all.
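To show how little the check costs a reader, here is a minimal Java sketch that tests whether a stream starts with a 4-byte magic number. The `MAGIC` value and the `MagicCheck` class are made up for illustration; the new format's actual magic bytes are not specified in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Sketch of a 4-byte magic-number check; MAGIC is a hypothetical value. */
public final class MagicCheck {
    // Made-up magic bytes; not the real format's value.
    static final byte[] MAGIC = {'X', 'F', 'M', 1};

    private MagicCheck() {}

    /** Returns true if the stream begins with MAGIC; consumes up to 4 bytes. */
    public static boolean hasMagic(InputStream in) {
        byte[] head = new byte[MAGIC.length];
        try {
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            return n == MAGIC.length && Arrays.equals(head, MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the prefix is constant, the same four bytes can also be described in a magic(5) entry, which is what lets file(1) identify the format.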

You absolutely do want the record length in the data, because deserialization can be slow: with
the length stored, a reader can skip over records without deserializing them.
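A minimal Java sketch of that skip, assuming each record is stored as a 4-byte length followed by its raw bytes (a fixed-width length is used here only to keep the sketch short; `LengthPrefixed` is a hypothetical name). Reaching record n reads n lengths and never touches the earlier payloads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch: length-prefixed records let a reader skip without deserializing. */
public final class LengthPrefixed {
    private LengthPrefixed() {}

    /** Encodes each record as a 4-byte length followed by its raw bytes. */
    public static byte[] write(byte[]... records) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (byte[] r : records) {
                out.writeInt(r.length);
                out.write(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns the raw bytes of record n, skipping earlier records by length alone. */
    public static byte[] readNth(byte[] data, int n) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < n; i++) {
                int len = in.readInt();
                if (in.skipBytes(len) != len) {
                    throw new EOFException("record " + i + " is truncated");
                }
            }
            byte[] rec = new byte[in.readInt()];
            in.readFully(rec);
            return rec;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `readNth(write(a, b, c), 2)` returns `c` after reading only the lengths of `a` and `b`.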

I absolutely don't think there should be any special code for fixed width types. The cost
of the variable width types is *really* small with the vint encoding.
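To make the size argument concrete, here is a sketch of a variable-length integer encoding in Java. Note that it uses the common LEB128-style scheme (7 payload bits per byte, high bit meaning "more bytes follow"), not Hadoop's own `WritableUtils.writeVLong` layout, which instead stores values in -112..127 in a single byte and otherwise writes a length byte first. `VarInt` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

/** LEB128-style unsigned varint: 7 payload bits per byte, high bit = more bytes follow. */
public final class VarInt {
    private VarInt() {}

    /** Encodes a non-negative value, low-order 7-bit groups first. */
    public static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("unsigned values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes bytes produced by encode(). */
    public static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }
}
```

Values 0 through 127 take one byte and values up to 16383 take two, so typical record lengths cost one or two bytes instead of the four a fixed-width int would always use.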

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
> or decompress. It would be good to have a file format that only needs 

