hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2922) sequencefiles without keys
Date Sun, 02 Mar 2008 07:34:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574195#action_12574195
] 

Owen O'Malley commented on HADOOP-2922:
---------------------------------------

Is this actually a serious issue? The overhead of using NullWritables as your key should be
4 bytes/record without block compression and far less with it. It might make sense to special
case SequenceFiles to not actually encode the NullWritables in each record. Is the extra space
an observed problem or is it just an abstract complaint?
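To make the 4 bytes/record figure concrete: in an uncompressed SequenceFile, each record is framed as a record-length field, a key-length field, the key bytes, and the value bytes. A NullWritable serializes to zero bytes, so the only per-record cost of the dummy key is the 4-byte key-length field. The sketch below is not Hadoop's actual SequenceFile code; it is a minimal stand-alone illustration of that framing, assuming the hypothetical class and method names shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal sketch (NOT Hadoop's real SequenceFile implementation) of the
// uncompressed record framing:
//   <record length: 4 bytes> <key length: 4 bytes> <key bytes> <value bytes>
// With a NullWritable key, the key bytes are empty, so the dummy key's
// only cost is the 4-byte key-length field per record.
public class NullKeyOverhead {

    // Writes n records of the given value, with or without the
    // key-length field, and returns the serialized bytes.
    static byte[] writeRecords(int n, byte[] value, boolean withKeyLength)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < n; i++) {
            int keyLen = 0;                       // NullWritable: zero key bytes
            out.writeInt(keyLen + value.length);  // record length
            if (withKeyLength) {
                out.writeInt(keyLen);             // key length: the 4-byte overhead
            }
            out.write(value);                     // value bytes
        }
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] value = new byte[100];
        int n = 1000;
        int withKey = writeRecords(n, value, true).length;
        int withoutKey = writeRecords(n, value, false).length;
        // Difference is exactly 4 bytes per record.
        System.out.println("overhead per record: " + (withKey - withoutKey) / n);
    }
}
```

Block compression amortizes this further, since key lengths are grouped and compressed together rather than stored verbatim per record.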

> sequencefiles without keys
> --------------------------
>
>                 Key: HADOOP-2922
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2922
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.16.0
>            Reporter: Joydeep Sen Sarma
>
> SequenceFiles are invaluable for storing compressed/binary data, but when we use them
to store serialized records we don't use the key part at all (we just put something dummy there
to satisfy the API). I have heard of other projects using the same tactic (Jaql/Cascading).
> So this is a request for a modified version of SequenceFiles that doesn't incur the
space and compute overhead of processing/storing these dummy keys.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

