hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Wed, 24 Sep 2008 17:41:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634214#action_12634214 ]

Jim Kellerman commented on HADOOP-3315:

Doug Cutting - 24/Sep/08 10:04 AM
>> this jira is not supposed to be a replacement for MapFile
> That's unfortunate. I thought that was the point of including indexes in the file, rather
> than in a side file.
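For context on Doug's point: a MapFile is a directory holding two SequenceFiles, a sorted "data" file and a separate "index" file of sampled keys and offsets. The Java sketch below is a toy, hypothetical illustration (the class `InFileIndexSketch` and its byte layout are assumptions, not TFile's actual on-disk format) of the alternative he describes: data blocks, then an index of each block's first key and byte offset, then a fixed 8-byte trailer pointing at the index, all in one file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy layout: [data blocks][index][8-byte trailer]. Not TFile's real format. */
public class InFileIndexSketch {

    /** Writes sorted entries, starting a new block every blockSize records; returns the whole "file". */
    static byte[] write(TreeMap<String, String> sorted, int blockSize) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            List<String> firstKeys = new ArrayList<>();
            List<Long> offsets = new ArrayList<>();
            int n = 0;
            for (Map.Entry<String, String> e : sorted.entrySet()) {
                if (n % blockSize == 0) {          // new block: remember its first key and byte offset
                    firstKeys.add(e.getKey());
                    offsets.add((long) out.size());
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
                n++;
            }
            long indexOffset = out.size();         // the index is appended to the same file as the data
            out.writeInt(firstKeys.size());
            for (int i = 0; i < firstKeys.size(); i++) {
                out.writeUTF(firstKeys.get(i));
                out.writeLong(offsets.get(i));
            }
            out.writeLong(indexOffset);            // fixed-size trailer: where the index starts
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a key using only the single file: trailer, then index, then one block. */
    static String get(byte[] file, String key) {
        try {
            long indexOffset = new DataInputStream(
                new ByteArrayInputStream(file, file.length - 8, 8)).readLong();
            DataInputStream idx = new DataInputStream(
                new ByteArrayInputStream(file, (int) indexOffset, file.length - (int) indexOffset));
            int blocks = idx.readInt();
            String[] firstKeys = new String[blocks];
            long[] offsets = new long[blocks];
            for (int i = 0; i < blocks; i++) {
                firstKeys[i] = idx.readUTF();
                offsets[i] = idx.readLong();
            }
            // Binary search for the last block whose first key is <= the wanted key.
            int lo = 0, hi = blocks - 1, found = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (firstKeys[mid].compareTo(key) <= 0) { found = mid; lo = mid + 1; }
                else { hi = mid - 1; }
            }
            if (found < 0) return null;            // key sorts before every key in the file
            long end = found + 1 < blocks ? offsets[found + 1] : indexOffset;
            DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(file, (int) offsets[found], (int) (end - offsets[found])));
            while (in.available() > 0) {           // scan only the candidate block
                String k = in.readUTF();
                String v = in.readUTF();
                if (k.equals(key)) return v;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader therefore needs only one file handle: it reads the trailer, loads the index, binary-searches for the candidate block, and scans just that block, instead of opening a second index file as MapFile does.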

Owen O'Malley - 26/Apr/08 09:31 AM
>> Is this a format just for compressed sequence files, or for all sequence files?
> The issue is most critical for compressed sequence files, but it would make sense to make the
> compression optional. I would not support value compression.
>> Is this intended as a replacement for MapFile too?
> yes

Ok, I guess my misunderstanding was due to Owen's comment.

Doug Cutting - 24/Sep/08 10:04 AM
> I wonder whether tfile should start out in contrib until it is more full-featured? Without
> support for java comparators or random access it is not yet a replacement for SequenceFile.
> It also doesn't yet have any inputformats, so it cannot be used from mapreduce. Nor does it
> yet have bindings for other programming languages. So my preference is that, until tfile is
> proven to be of general utility to Hadoop applications, it should live in contrib. We don't
> want code in core that's not both widely usable and actually used.


Since TFile is not a replacement for MapFile and does not support reading a file up to its
most recent sync point while that file is still being written, we probably cannot use it for HBase.

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
> or decompress. It would be good to have a file format that only needs
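The "4 codecs" in the issue description presumably refers to how a block-compressed SequenceFile stores each block: key lengths, keys, value lengths and values are kept as four separately compressed buffers, so a reader needs four decompressor streams. The Java sketch below is a simplified, hypothetical illustration of that idea only: the class `FourBufferBlockSketch` is an assumption, it uses `java.util.zip.Deflater`/`Inflater` instead of Hadoop codecs, writes fixed 4-byte lengths where SequenceFile uses variable-length ints, and takes the record count as a parameter rather than reading it from a block header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical toy: one block split into four independently compressed buffers. */
public class FourBufferBlockSketch {

    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!d.finished()) {
            int n = d.deflate(chunk);
            out.write(chunk, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed) {
        Inflater inf = new Inflater();             // one decompressor per buffer
        inf.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(chunk);
                if (n == 0 && inf.needsInput()) throw new IllegalStateException("truncated stream");
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    /** Returns four compressed buffers: [key lengths, keys, value lengths, values]. */
    static byte[][] writeBlock(List<String[]> records) {
        ByteArrayOutputStream keyLens = new ByteArrayOutputStream(), keys = new ByteArrayOutputStream(),
            valLens = new ByteArrayOutputStream(), vals = new ByteArrayOutputStream();
        DataOutputStream kl = new DataOutputStream(keyLens), vl = new DataOutputStream(valLens);
        try {
            for (String[] r : records) {
                byte[] k = r[0].getBytes(StandardCharsets.UTF_8);
                byte[] v = r[1].getBytes(StandardCharsets.UTF_8);
                kl.writeInt(k.length);
                keys.write(k, 0, k.length);
                vl.writeInt(v.length);
                vals.write(v, 0, v.length);
            }
            kl.flush();
            vl.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new byte[][] { deflate(keyLens.toByteArray()), deflate(keys.toByteArray()),
                              deflate(valLens.toByteArray()), deflate(vals.toByteArray()) };
    }

    /** Decompresses all four buffers and reassembles count (key, value) records. */
    static List<String[]> readBlock(byte[][] block, int count) {
        DataInputStream kl = new DataInputStream(new ByteArrayInputStream(inflate(block[0])));
        byte[] keys = inflate(block[1]);
        DataInputStream vl = new DataInputStream(new ByteArrayInputStream(inflate(block[2])));
        byte[] vals = inflate(block[3]);
        List<String[]> out = new ArrayList<>();
        int kPos = 0, vPos = 0;
        try {
            for (int i = 0; i < count; i++) {
                int kLen = kl.readInt(), vLen = vl.readInt();
                out.add(new String[] {
                    new String(keys, kPos, kLen, StandardCharsets.UTF_8),
                    new String(vals, vPos, vLen, StandardCharsets.UTF_8) });
                kPos += kLen;
                vPos += vLen;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Reading even a single record means inflating all four buffers and walking the two length streams in step with the key and value bytes, which is the kind of bookkeeping the description calls too complex.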

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
