hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Wed, 10 Sep 2008 18:29:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629902#action_12629902 ]

Doug Cutting commented on HADOOP-3315:

> There is little to be shared between the two except for the buffer and count definition.

If you instead extended ByteArrayOutputStream, BoundedByteArrayOutputStream would only need
to override the two write() methods, rather than implement all of the methods.
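
A minimal sketch of that suggestion, not the code from the attached patch: the class extends ByteArrayOutputStream and overrides only write(int) and write(byte[], int, int). The limit field and the choice of BufferOverflowException are assumptions made for illustration. Because ByteArrayOutputStream's write methods declare no checked exceptions, the overrides here throw an unchecked exception when the bound would be exceeded.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferOverflowException;

// Hypothetical sketch: a byte-array stream that refuses to grow past a fixed limit.
class BoundedByteArrayOutputStream extends ByteArrayOutputStream {
    private final int limit; // assumed name; maximum number of bytes accepted

    BoundedByteArrayOutputStream(int limit) {
        super(limit); // allocate the full capacity up front so the buffer never reallocates
        this.limit = limit;
    }

    @Override
    public synchronized void write(int b) {
        if (count >= limit) {
            throw new BufferOverflowException();
        }
        super.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        // Compare against the remaining space to avoid integer overflow in count + len.
        if (len > limit - count) {
            throw new BufferOverflowException();
        }
        super.write(b, off, len);
    }
}
```

Everything else (size(), toByteArray(), reset(), writeTo()) is inherited unchanged, and the inherited buf and count fields are reused as is.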

> The new VLong format enlarges the range of integers that can be encoded with 2-4 bytes
[ ... ]

Then it should be named something different.  Why support negative values at all if you don't
use them?  Lucene also defines a VInt format that might be considered.  Personally, I'd prefer
that Hadoop use a single VInt format, and I don't think defining yet another VInt and
String format for TFile is wise.
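
For reference, the Lucene-style VInt mentioned above stores an integer seven bits at a time, low-order group first, with the high bit of each byte set when more bytes follow. The following is a minimal sketch of that scheme; the class and method names are hypothetical, and it is neither Lucene's nor Hadoop's actual code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a Lucene-style variable-length int encoding.
class VIntSketch {
    // Encode value as 1 to 5 bytes: 7 data bits per byte, low-order bits first.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;                     // unsigned shift so the loop terminates
        }
        out.write(value);                     // final byte has the high bit clear
        return out.toByteArray();
    }

    // Decode a value produced by encode().
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                        // high bit clear: last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

In this sketch values 0 to 127 take one byte and 128 to 16383 take two, while any negative int takes the full five bytes, which is the cost of allowing negative values in a format tuned for small non-negative ones.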

> we did not directly use CompressionCodecFactory is because CompressionCodecFactory.getCodec()
expects a path

Then CompressionCodecFactory should be extended, rather than duplicated.

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_TFILE_PREVIEW.patch, HADOOP-3315_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
TFile Specification Final.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
