hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Thu, 11 Sep 2008 22:45:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630414#action_12630414 ]

Pete Wyckoff commented on HADOOP-3315:
--------------------------------------

bq. Additionally, protocol buffer's decoding requires you to read byte after byte, while both WritableUtils and my VLong can detect the length of the whole encoding after the first byte.

InputStreams are buffered, and isn't the case you're optimizing for mainly the 2-byte case, where this doesn't help anyway? Even if you don't want to pull in serialization from Thrift or Protocol Buffers, wouldn't using RecordIO's libraries make more sense, per Doug's point?
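
For readers less familiar with the two encodings being compared, here is a rough sketch of the difference. It is written from memory as an illustration only, not taken from the TFile patch or from either library: the first-byte rule follows the convention used by Hadoop's WritableUtils.decodeVIntSize(), and the second method is the generic protocol-buffers-style varint loop that has to be read byte by byte.

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class VIntSketch {

  // WritableUtils-style: the first byte alone tells the reader how many
  // bytes the whole encoding occupies (1-9), so the remaining bytes can be
  // bulk-read or skipped without inspecting each one.
  static int vLongEncodedSize(byte firstByte) {
    if (firstByte >= -112) {
      return 1;                  // small values live entirely in byte one
    } else if (firstByte < -120) {
      return -119 - firstByte;   // negative value, 1-8 payload bytes follow
    }
    return -111 - firstByte;     // positive value, 1-8 payload bytes follow
  }

  // Protobuf-style varint: 7 data bits per byte, high bit means "more
  // bytes follow". The decoder only learns the length once it sees a byte
  // with the continuation bit clear, which is the byte-after-byte reading
  // referred to in the quoted comment.
  static long readVarint(InputStream in) throws IOException {
    long value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("stream ended inside varint");
      }
      value |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;            // continuation bit clear: done
      }
    }
    throw new IOException("varint longer than 64 bits");
  }
}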



> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_TFILE_PREVIEW.patch, HADOOP-3315_TFILE_PREVIEW_WITH_LZO_TESTS.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

