hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Wed, 24 Sep 2008 19:31:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634262#action_12634262 ]

stack commented on HADOOP-3315:
-------------------------------

To Owen's list I'd suggest adding two items: being amenable to random access, and letting
subclasses access the block index.

@Jim.  MapFile is ill-suited to hbase use.  It needs to be replaced.  The discussions above are
about looking at TFile as a possible foundation for the replacement. For hbase log files, where
we need append, we can just keep on with the SequenceFile we use now.

@Hong

bq. ...is to write your own key appender and value appender classes...

This sounds fine for creating auxiliary index information.  I then suggested that advanceCursorInBlock
be made accessible so I could exploit the auxiliary index at read time, but
on study, I see this is the wrong place.  Where would you suggest I plug in to exploit my extra
index information at read time?

Sorry if this need seems exotic, but I think we can get away with casting it under the
'Extensibility' TFile Design Principle.  In our application, keys are row/column/timestamp.
 If there are millions of columns in a row and we want to skip to the next row, we can't
next-next-next through the keys.  It'll be too slow.  We need to skip ahead to the new row.
The block index won't help in this regard.
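
To make the need concrete, here is a rough sketch of the kind of auxiliary row index I have in mind. It is an in-memory stand-in only, not TFile, MapFile, or hbase code: the class RowSkipSketch and its methods append, seekToNextRow, and keyAt are hypothetical names made up for illustration. It assumes keys are appended in sorted order and that remembering the position of each row's first key is enough to jump over the rest of the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names exist in TFile, MapFile, or hbase. */
class RowSkipSketch {
    // Keys in "row/column/timestamp" form, standing in for the entries of a sorted file.
    private final List<String> keys = new ArrayList<>();
    // Auxiliary index built at write time: row name -> position of that row's first key.
    private final TreeMap<String, Integer> rowStarts = new TreeMap<>();

    /** Appends a key. Assumption: callers append in sorted order; this is not checked. */
    void append(String row, String column, long timestamp) {
        rowStarts.putIfAbsent(row, keys.size());
        keys.add(row + "/" + column + "/" + timestamp);
    }

    /** Returns the position of the first key of the row following {@code row}, or -1 if none. */
    int seekToNextRow(String row) {
        // One sorted-map lookup instead of stepping over every column in the row.
        Map.Entry<String, Integer> next = rowStarts.higherEntry(row);
        return next == null ? -1 : next.getValue();
    }

    String keyAt(int position) {
        return keys.get(position);
    }

    public static void main(String[] args) {
        RowSkipSketch file = new RowSkipSketch();
        // One wide row followed by a second row.
        for (int c = 0; c < 100000; c++) {
            file.append("row1", String.format("col%07d", c), 1L);
        }
        file.append("row2", "colA", 1L);

        int pos = file.seekToNextRow("row1");
        System.out.println(pos + " -> " + file.keyAt(pos));
    }
}
```

The cost of the skip here depends on the number of rows, not on the number of columns in the current row, which is the property we are after. Where such an index would live in the file and how a reader would consult it is exactly the open question.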

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch,
TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 


