hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Wed, 24 Sep 2008 17:05:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634201#action_12634201 ]

Doug Cutting commented on HADOOP-3315:
--------------------------------------

> this jira is not supposed to be a replacement for MapFile

That's unfortunate.  I thought that was the point of including indexes in the file, rather
than in a side file.

I wonder whether TFile should start out in contrib until it is more full-featured.  Without
support for Java comparators or random access it is not yet a replacement for SequenceFile.
It also doesn't yet have any InputFormats, so it cannot be used from MapReduce.  Nor does
it yet have bindings for other programming languages.  So my preference is that, until TFile
is proven to be of general utility to Hadoop applications, it should live in contrib.  We
don't want code in core that's not both widely usable and actually used.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

