hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Fri, 09 May 2008 21:49:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595734#action_12595734 ]

Doug Cutting commented on HADOOP-3315:

I think for at least the first version we should assume that the index fits in memory.

As a subsequent enhancement, we might let the user limit the size of the index.  Then, when
writing, if the index gets too big, we can downsample at that point, discarding every other
entry or something similar, to keep the index within a certain bound.  Similarly, when the
file is opened, if the index is larger than available memory, we can downsample further then.
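
The downsampling idea above could be sketched roughly as follows. This is only an illustration, not code from the HADOOP-3315 patch: the class name IndexDownsampler, the methods halve and downsampleTo, and the maxEntries bound are all hypothetical, and the index is assumed to be an in-memory list of entries sorted by key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of index downsampling; not the HADOOP-3315 implementation. */
final class IndexDownsampler {
    private IndexDownsampler() {}

    /** Keeps the entries at positions 0, 2, 4, ... and discards every other one. */
    static <E> List<E> halve(List<E> index) {
        List<E> kept = new ArrayList<>((index.size() + 1) / 2);
        for (int i = 0; i < index.size(); i += 2) {
            kept.add(index.get(i));
        }
        return kept;
    }

    /**
     * Halves the index repeatedly until it holds at most maxEntries entries.
     * Returns the input list itself if it is already within the bound.
     */
    static <E> List<E> downsampleTo(List<E> index, int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        List<E> current = index;
        while (current.size() > maxEntries) {
            // size >= 2 here, so each pass strictly shrinks the list
            current = halve(current);
        }
        return current;
    }
}
```

The same routine would serve both cases mentioned above: the writer could call it when the index outgrows its configured bound, and the reader could call it again with a smaller bound if memory is tight at open time. Because position 0 is always retained, the first index entry survives every pass.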

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
