hadoop-common-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Sat, 30 May 2009 02:13:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714656#action_12714656 ]

Hong Tang commented on HADOOP-3315:

bq. Looking at the latest patch, I have one question: there are a lot of contained classes
and interfaces in tfile. Why are these all contained in one tfile class, instead of making
tfile a package and having the classes and interfaces contained in there?

Fair question. The code could be refactored to make it easier to maintain. However, I am a bit
hesitant to split it into packages: tfile itself is already a package, and adding more sub-packages
would probably be overkill.

After examining the code, I see a few places where it could be split out:
- Move out interface RawComparable.
- Move out public class ByteArray.
- Move out the exception classes MetaBlockAlreadyExists and MetaBlockDoesNotExist.
- Move out the code that dumps the meta info of TFile (possibly with a wrapper class called
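To make the first three items concrete, here is a minimal sketch of what the extracted types might look like as standalone, package-level types in tfile. The accessor names (buffer(), offset(), size()), the constructors, the exception messages, and the compare helper are assumptions for illustration, not taken from the patch. In the actual refactoring each type would get its own source file; they share one file here only so the sketch compiles as a single unit.

```java
import java.io.IOException;

// Hypothetical sketch of the types proposed for extraction from TFile.
// Each would normally live in its own file in the tfile package.

/** A key exposed as a raw byte range (accessor names are assumed). */
interface RawComparable {
  byte[] buffer(); // backing array that holds the key bytes
  int offset();    // start of the key within buffer()
  int size();      // number of bytes in the key
}

/** A RawComparable that wraps all or part of a byte array without copying. */
final class ByteArray implements RawComparable {
  private final byte[] buffer;
  private final int offset;
  private final int len;

  ByteArray(byte[] buffer) {
    this(buffer, 0, buffer.length);
  }

  ByteArray(byte[] buffer, int offset, int len) {
    // Written as "len > length - offset" so the check cannot overflow.
    if (offset < 0 || len < 0 || len > buffer.length - offset) {
      throw new IndexOutOfBoundsException("invalid range");
    }
    this.buffer = buffer;
    this.offset = offset;
    this.len = len;
  }

  public byte[] buffer() { return buffer; }
  public int offset() { return offset; }
  public int size() { return len; }
}

/** Thrown when a meta block with the same name is added twice. */
class MetaBlockAlreadyExists extends IOException {
  MetaBlockAlreadyExists(String name) {
    super("Meta block already exists: " + name);
  }
}

/** Thrown when a requested meta block is not present in the file. */
class MetaBlockDoesNotExist extends IOException {
  MetaBlockDoesNotExist(String name) {
    super("Meta block does not exist: " + name);
  }
}

public class TFileRefactorSketch {
  /** Lexicographic comparison of two byte ranges, treating bytes as unsigned. */
  static int compare(RawComparable a, RawComparable b) {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int x = a.buffer()[a.offset() + i] & 0xff;
      int y = b.buffer()[b.offset() + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    // On a common prefix, the shorter key sorts first.
    return a.size() - b.size();
  }

  public static void main(String[] args) {
    RawComparable k1 = new ByteArray(new byte[] {'a', 'b', 'c'});
    RawComparable k2 = new ByteArray(new byte[] {'x', 'a', 'b', 'd'}, 1, 3);
    System.out.println(compare(k1, k2) < 0); // prints "true": "abc" sorts before "abd"
  }
}
```

Keeping RawComparable as a small top-level interface would let callers outside TFile supply keys without depending on TFile's internals. Inside Hadoop, the byte comparison would more likely delegate to WritableComparator.compareBytes than hand-roll the loop shown here.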

I will start working on the above; feel free to comment on what more could be done.

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Hong Tang
>             Fix For: 0.21.0
>         Attachments: hadoop-3315-0507.patch, hadoop-3315-0509-2.patch, hadoop-3315-0509.patch,
hadoop-3315-0513.patch, hadoop-3315-0514.patch, HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch,
TFile Specification 20081217.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
