hadoop-common-dev mailing list archives

From "Srikanth Kakani (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Tue, 03 Jun 2008 23:40:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602114#action_12602114 ]
Srikanth Kakani commented on HADOOP-3315:


There is one complication in exposing append(long keyLength, long valueLength) that we did not
discuss earlier, although it can be handled.

If the key/value pair is at the beginning of a block, we need to copy the key to a byte array
during key.serialize(outputStream). We can do this with a keyValueOutputStream(keyBytes, valueBytes,
outputStream) that captures the first keyBytes of data written into a buffer. This is needed
to generate an index, but it starts getting ugly.
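A minimal sketch of what such a capturing stream might look like. All names here (KeyCapturingOutputStream, capturedKey) are hypothetical illustrations of the idea, not code from the TFile patch:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical: buffers the first keyLength bytes written through it so they
// can later be used to build an index entry, while still forwarding every
// byte to the underlying stream.
public class KeyCapturingOutputStream extends FilterOutputStream {
  private final ByteArrayOutputStream keyBuffer = new ByteArrayOutputStream();
  private long remainingKeyBytes;

  public KeyCapturingOutputStream(OutputStream out, long keyLength) {
    super(out);
    this.remainingKeyBytes = keyLength;
  }

  @Override
  public void write(int b) throws IOException {
    if (remainingKeyBytes > 0) {
      keyBuffer.write(b);
      remainingKeyBytes--;
    }
    out.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    if (remainingKeyBytes > 0) {
      int toCapture = (int) Math.min(remainingKeyBytes, len);
      keyBuffer.write(b, off, toCapture);
      remainingKeyBytes -= toCapture;
    }
    out.write(b, off, len);
  }

  /** The captured key bytes, e.g. for an index entry. */
  public byte[] capturedKey() {
    return keyBuffer.toByteArray();
  }
}
```

The writer would wrap its block stream in this before calling key.serialize(), then read back capturedKey() for the index; the ugliness is that every append now pays for the wrapper even when no index entry is needed.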

I would also suggest that ObjectFile extend TFile; it could then do all of this more neatly
without exposing append(keyLength, valueLength).

Additionally, to make any of this feasible (you mentioned this earlier; I just want to record
it), serializers should also have a getSerializedLength() method.
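For the record, the extra method could be as small as the interface below. This is a hypothetical illustration (LengthAwareSerializer is not the actual Hadoop Serializer API), showing only the length query needed before append(keyLength, valueLength) can be called:

```java
// Hypothetical: a serializer that can report the serialized size of an object
// up front, without writing it out, so append(keyLength, valueLength) can be
// invoked before any bytes hit the stream.
public interface LengthAwareSerializer<T> {
  /** Serialized size in bytes of t, computed without writing it out. */
  long getSerializedLength(T t);
}
```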

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
> or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
