hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Sat, 26 Apr 2008 17:19:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592586#action_12592586 ]

Jim Kellerman commented on HADOOP-3315:
---------------------------------------

> Owen O'Malley - 26/Apr/08 09:46 AM
> One other possibility would be to represent key/values as:
{code}
record length (vint)
key
value
{code}
> That would work for all writables, because they all record their sizes internally.
> In theory you could drop the record length, but that would mean that you would have
> to deserialize all of the keys and values as you skip over records.
>
> Thoughts?
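A minimal sketch of how that layout would let a reader skip records without deserializing them. The varint codec below is a simplified unsigned base-128 encoding, not Hadoop's actual WritableUtils VInt format, and all class/method names are illustrative only:

{code}
import java.io.*;

// Illustrative sketch only: records laid out as [record length (vint)][key+value bytes].
// The varint codec is a simplified base-128 encoding, NOT Hadoop's WritableUtils VInt.
public class RecordSkipSketch {

    static void writeVInt(DataOutput out, int v) throws IOException {
        while ((v & ~0x7F) != 0) {        // emit low 7 bits, continuation bit set
            out.writeByte((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.writeByte(v);                 // final byte, high bit clear
    }

    static int readVInt(DataInput in) throws IOException {
        int v = 0, shift = 0, b;
        do {
            b = in.readUnsignedByte();
            v |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v;
    }

    // Skipping a record needs only the length prefix -- no deserialization.
    static void skipRecord(DataInputStream in) throws IOException {
        in.skipBytes(readVInt(in));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String payload : new String[] {"rec-one", "rec-two", "rec-three"}) {
            byte[] bytes = payload.getBytes("UTF-8");
            writeVInt(out, bytes.length); // record length
            out.write(bytes);             // opaque key+value bytes
        }
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        skipRecord(in);                   // jump past "rec-one" without reading it
        byte[] second = new byte[readVInt(in)];
        in.readFully(second);
        System.out.println(new String(second, "UTF-8")); // prints "rec-two"
    }
}
{code}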

Dropping the record length would seriously slow down random reads unless the index is
'complete', i.e., every key/offset pair is represented. If the index is sparse like
MapFile's, you would only get an approximate location of the desired record and would
then have to deserialize every intervening key and value as you seek forward to it.
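To make that cost concrete, here is a toy model of a MapFile-style sparse index: only every Nth key is indexed, so a lookup gets an approximate start position and scans forward from there. Arrays stand in for the on-disk file; none of the names are Hadoop APIs:

{code}
import java.util.TreeMap;

// Toy model of a sparse index like MapFile's: only every 3rd key is indexed,
// so a random read starts at the nearest preceding indexed key and scans
// forward. In-memory arrays stand in for the file; names are illustrative.
public class SparseIndexSketch {

    // Returns the position of target, starting the scan at the nearest
    // preceding indexed key.
    static int find(String[] keys, TreeMap<String, Integer> index, String target) {
        int pos = index.floorEntry(target).getValue(); // approximate location
        while (!keys[pos].equals(target)) {
            pos++;                                     // forward scan, key by key
        }
        return pos;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "c", "e", "g", "i", "k", "m", "o"};
        TreeMap<String, Integer> index = new TreeMap<>();
        for (int i = 0; i < keys.length; i += 3) {     // index every 3rd key
            index.put(keys[i], i);
        }
        // Lookup of "k" lands on indexed key "g" and scans 2 records forward.
        System.out.println("found at " + find(keys, index, "k")); // prints "found at 5"
    }
}
{code}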

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>
> SequenceFile's block compression format is too complex and requires 4 codecs to
> compress or decompress. It would be good to have a file format that only needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

