hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Wed, 04 Jun 2008 00:32:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602132#action_12602132 ]

Owen O'Malley commented on HADOOP-3315:

Srikanth, I don't understand your concern. When the user calls append(long, long), the writer
can decide whether to start a new block or not based on the declared lengths. Then, as the
client calls write(byte[], int, int) on the returned output stream, the bytes can be written
directly to the file stream or to the codec's ByteBuffer. For codecs like lzo, a single write
may be broken into multiple calls to handle the required chunking.

And yes, to make this efficient, you need to be able to get the serialized length of the objects.
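To make the idea concrete, here is a minimal sketch of that flow. The class name, the fixed block threshold, and the plain (uncompressed) buffer standing in for a codec stream are all illustrative assumptions, not the actual TFile API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: the writer uses the lengths declared in
// append(long, long) to decide whether to close the current block
// before accepting the next record.
public class BlockWriter {
    // Assumed block-size threshold; a real writer would make this configurable.
    private static final int MAX_BLOCK_SIZE = 64 * 1024;

    private final OutputStream fileStream;
    // Stands in for the codec's internal buffer; no compression is done here.
    private ByteArrayOutputStream blockBuffer = new ByteArrayOutputStream();

    public BlockWriter(OutputStream fileStream) {
        this.fileStream = fileStream;
    }

    /**
     * Decide, from the declared lengths alone, whether to start a new block,
     * then hand back the stream the caller will write the record into.
     */
    public OutputStream append(long keyLength, long valueLength) throws IOException {
        if (blockBuffer.size() + keyLength + valueLength > MAX_BLOCK_SIZE) {
            flushBlock(); // a codec would finish its compressed stream here
        }
        // The caller now writes the record via write(byte[], int, int);
        // bytes go straight into the current block's buffer (or codec stream).
        return blockBuffer;
    }

    private void flushBlock() throws IOException {
        blockBuffer.writeTo(fileStream); // in reality, the compressed block
        blockBuffer = new ByteArrayOutputStream();
    }

    public void close() throws IOException {
        flushBlock();
        fileStream.close();
    }
}
```

The key point the sketch illustrates is that the block-boundary decision happens entirely inside append(long, long), before any record bytes arrive, which is why knowing serialized lengths up front matters.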

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
> or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
