hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Thu, 29 Jan 2009 20:28:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668588#action_12668588
] 

stack commented on HADOOP-3315:
-------------------------------

Hmm.  Looking at doing random accesses and it seems like a bunch of time is spent in inBlockAdvance
advancing sequentially through blocks rather than do something like a binary search to find
desired block location.  Also, as we advance, we create and destroy a bunch of objects such
as the stream to hold the value.  Can you comment on why this is (compression should be on
tfile block boundaries, right so nothing to stop hopping into the midst of a tfile)?  Thanks.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch,
hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message