hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Mon, 12 May 2008 17:54:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596142#action_12596142 ]

Doug Cutting commented on HADOOP-3315:
--------------------------------------

Latest draft looks much better!
 - Does RO stand for something, or is it short for "row"?
 - The RO entry values can be more compactly represented as differences from the prior entry.  Is this intended?  If so, we should state this.
 - In data blocks, we might use something like <entryLength><keyLength><key><value>.  This would permit one to skip entire entries more quickly.  The valueLength can be computed as entryLength-keyLength.  Do folks think this is worthwhile?
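
To make the proposed entry layout concrete, here is a minimal sketch.  It is not taken from the draft: the class and method names are hypothetical, lengths are assumed to be fixed 4-byte ints, and entryLength is assumed to count only the key and value bytes, so that valueLength = entryLength - keyLength.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of <entryLength><keyLength><key><value>.
// Assumptions: 4-byte int lengths; entryLength = key bytes + value bytes.
public class EntryLayoutSketch {

    // Appends one entry to the buffer in the proposed layout.
    public static void writeEntry(ByteBuffer buf, byte[] key, byte[] value) {
        buf.putInt(key.length + value.length); // entryLength
        buf.putInt(key.length);                // keyLength
        buf.put(key);
        buf.put(value);
    }

    // Skips a whole entry after reading only entryLength:
    // jump over the keyLength field, the key, and the value in one step.
    public static void skipEntry(ByteBuffer buf) {
        int entryLength = buf.getInt();
        buf.position(buf.position() + Integer.BYTES + entryLength);
    }

    // Reads the value of the next entry; valueLength is derived, not stored.
    public static byte[] readValue(ByteBuffer buf) {
        int entryLength = buf.getInt();
        int keyLength = buf.getInt();
        buf.position(buf.position() + keyLength); // step over the key
        byte[] value = new byte[entryLength - keyLength];
        buf.get(value);
        return value;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeEntry(buf, bytes("k1"), bytes("first"));
        writeEntry(buf, bytes("key2"), bytes("second"));
        buf.flip();

        skipEntry(buf); // one length read, one jump
        System.out.println(new String(readValue(buf), StandardCharsets.UTF_8));
    }
}
```

With a <keyLength><key><valueLength><value> layout, skipping an entry would instead need two length reads and two jumps, which is the cost the extra entryLength field is meant to avoid.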

> We should not depend on keys/values being Writables in TFile.

Good point.  So the writer's constructor should take Serializer<K> and Serializer<V> parameters, and the reader's constructor Deserializer<K> and Deserializer<V> parameters.  This will permit us to store, e.g., Thrift or other objects in a TFile.
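
A rough sketch of what that constructor shape could look like follows.  Everything in it is an assumption for illustration: the Serializer and Deserializer interfaces are simplified local stand-ins (byte[] in, byte[] out) rather than Hadoop's own serializer interfaces, and Writer/Reader are hypothetical in-memory classes, not the TFile API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the file format sees only bytes; the caller
// supplies the key/value (de)serializers, so nothing requires Writable.
public class SerializerSketch {

    // Simplified stand-ins, assumed for this example only.
    public interface Serializer<T> { byte[] serialize(T value); }
    public interface Deserializer<T> { T deserialize(byte[] bytes); }

    public static final class Writer<K, V> {
        private final Serializer<K> keySerializer;
        private final Serializer<V> valueSerializer;
        // In-memory stand-in for the file: each entry is {keyBytes, valueBytes}.
        private final List<byte[][]> entries = new ArrayList<>();

        public Writer(Serializer<K> keySerializer, Serializer<V> valueSerializer) {
            this.keySerializer = keySerializer;
            this.valueSerializer = valueSerializer;
        }

        public void append(K key, V value) {
            entries.add(new byte[][] {
                keySerializer.serialize(key), valueSerializer.serialize(value)
            });
        }

        public List<byte[][]> entries() { return entries; }
    }

    public static final class Reader<K, V> {
        private final Deserializer<K> keyDeserializer;
        private final Deserializer<V> valueDeserializer;

        public Reader(Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
            this.keyDeserializer = keyDeserializer;
            this.valueDeserializer = valueDeserializer;
        }

        public K key(byte[][] entry) { return keyDeserializer.deserialize(entry[0]); }
        public V value(byte[][] entry) { return valueDeserializer.deserialize(entry[1]); }
    }

    public static void main(String[] args) {
        Writer<String, Integer> writer = new Writer<>(
            s -> s.getBytes(StandardCharsets.UTF_8),
            i -> ByteBuffer.allocate(Integer.BYTES).putInt(i).array());
        writer.append("answer", 42);

        Reader<String, Integer> reader = new Reader<>(
            b -> new String(b, StandardCharsets.UTF_8),
            b -> ByteBuffer.wrap(b).getInt());
        byte[][] entry = writer.entries().get(0);
        System.out.println(reader.key(entry) + " = " + reader.value(entry));
    }
}
```

Swapping in a Thrift-backed pair of serializers would then only change the constructor arguments, not the writer or reader themselves.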

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

