hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Fri, 12 Sep 2008 17:41:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630637#action_12630637 ]

Pete Wyckoff commented on HADOOP-3315:

bq. Admittedly, we cannot claim language neutrality 

If you want this as a top-level goal, I think you could define the TFileHeaders
as a struct in something like Protocol Buffers or Thrift, and then no C++, Python, Perl, ...
implementor would ever have to worry about the data format. This is probably a day or two of
work and should pay off in the long run.

If implementors didn't need to worry about the format and ordering of headers, implementing
the read side of TFiles, which is often the first thing you want, would be much easier.
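To illustrate why order-independent headers make the read side easier, here is a minimal sketch. It does not use Protocol Buffers or Thrift; instead it hand-rolls a simplified tag-length-value encoding (a 2-byte tag and 4-byte length per field, not protobuf's actual varint-based wire format), which is the property such IDL formats provide: each field describes itself, so a reader can accept fields in any order and skip ones it does not recognize. The tag names (TAG_VERSION, TAG_COMPRESSION, TAG_COMPARATOR) and the helpers encode_header/decode_header are hypothetical and are not TFile's real header layout.

```python
import struct

# Hypothetical field tags for illustration only; these are NOT TFile's real header fields.
TAG_VERSION = 1
TAG_COMPRESSION = 2
TAG_COMPARATOR = 3

_RECORD = struct.Struct(">HI")  # big-endian: 2-byte tag, then 4-byte value length


def encode_header(fields):
    """Encode a {tag: bytes} mapping as consecutive (tag, length, value) records."""
    out = bytearray()
    for tag, value in fields.items():
        out += _RECORD.pack(tag, len(value))
        out += value
    return bytes(out)


def decode_header(data, known_tags):
    """Decode records in whatever order they appear, skipping tags this reader does not know."""
    fields = {}
    pos = 0
    while pos < len(data):
        if pos + _RECORD.size > len(data):
            raise ValueError("truncated record header")
        tag, length = _RECORD.unpack_from(data, pos)
        pos += _RECORD.size
        if pos + length > len(data):
            raise ValueError("truncated record value")
        if tag in known_tags:
            fields[tag] = data[pos:pos + length]
        pos += length  # unknown tags are skipped using their length
    return fields


# A writer emits fields in its own order, plus a tag (99) this reader has never seen.
written = encode_header({
    TAG_COMPARATOR: b"memcmp",
    TAG_VERSION: b"\x00\x01",
    99: b"field added by a newer writer",
})
header = decode_header(written, {TAG_VERSION, TAG_COMPRESSION, TAG_COMPARATOR})
```

Because every record carries its own tag and length, a reader in any language only needs this small record layout to parse the header, and fields added by newer writers are skipped rather than breaking older readers. With an IDL such as Protocol Buffers or Thrift, this parsing code would be generated for each language instead of being written by hand.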

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_TFILE_PREVIEW.patch, HADOOP-3315_TFILE_PREVIEW_WITH_LZO_TESTS.patch, TFile Specification Final.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
