hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3315) New binary file format
Date Tue, 03 Feb 2009 06:39:59 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-3315:

    Attachment: hfile2.patch

More stripping.  This patch has HFile sort of working again (it's a hackup with ugly byte array
copies that we need to remove).  I was able to do some basic performance comparisons.  If the
buffer size is 4k, then I can random-access 10-byte cells as fast as MapFile.  If cells are
bigger, HFile outperforms MapFile; e.g. if the cell is 100 bytes, HFile is 2x MapFile (these are
extremely coarse tests going against the local filesystem).

Need to do more stripping.  In particular, implement Ryan Rawson's idea of carrying the HFile block
in an NIO ByteBuffer and giving out new ByteBuffer 'views' when a key or value is asked for, rather
than copying byte arrays.
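The "views" idea above can be sketched with the standard java.nio API: `duplicate()` and `slice()` return ByteBuffers that share the block's backing storage, so handing one out costs no array copy. This is only an illustrative sketch of the technique being discussed, not the actual HFile code; the block layout and the `sliceAt` helper below are hypothetical.

```java
import java.nio.ByteBuffer;

public class ByteBufferViewDemo {
    // Given a block buffer, hand out a zero-copy "view" over [offset, offset+length).
    // duplicate() shares the backing content but has independent position/limit,
    // so slicing never disturbs readers of the original block buffer.
    static ByteBuffer sliceAt(ByteBuffer block, int offset, int length) {
        ByteBuffer view = block.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }

    public static void main(String[] args) {
        // Toy block layout (hypothetical): [keyLen][key bytes][valLen][value bytes]
        byte[] key = "row1".getBytes();
        byte[] val = "cell-data".getBytes();
        ByteBuffer block = ByteBuffer.allocate(8 + key.length + val.length);
        block.putInt(key.length).put(key).putInt(val.length).put(val);
        block.flip();

        // Read the key and value as views; no byte arrays are copied out of the block.
        int keyLen = block.getInt(0);
        ByteBuffer keyView = sliceAt(block, 4, keyLen);
        int valOff = 4 + keyLen;
        int valLen = block.getInt(valOff);
        ByteBuffer valView = sliceAt(block, valOff + 4, valLen);

        byte[] out = new byte[valView.remaining()];
        valView.get(out);
        System.out.println(new String(out)); // prints "cell-data"
    }
}
```

The trade-off, of course, is that a view pins the whole block in memory for as long as any key or value sliced from it is referenced.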

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Hong Tang
>             Fix For: 0.21.0
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch,
hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, hfile2.patch, TFile Specification 20081217.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
