hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1823) want InputFormat for bzip2 files
Date Fri, 31 Aug 2007 18:55:18 GMT
want InputFormat for bzip2 files

                 Key: HADOOP-1823
                 URL: https://issues.apache.org/jira/browse/HADOOP-1823
             Project: Hadoop
          Issue Type: New Feature
          Components: mapred
            Reporter: Doug Cutting

Unlike gzip, the bzip2 file format supports splitting.  Compression is by blocks (900k by default)
and each block begins with a synchronization marker (the 48-bit value 0x314159265359, the leading
digits of Pi).  This would permit very large compressed files to be split across multiple map tasks,
which is not currently possible unless using a Hadoop-specific file format.
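
To illustrate the idea, here is a minimal sketch (not Hadoop code, and not a proposed InputFormat) that scans a bzip2 stream for the block marker. It assumes the marker is the 48-bit value 0x314159265359 and that bits are packed most-significant-bit first, so a marker can start at any bit offset rather than only on a byte boundary. The function name `find_block_offsets` is made up for this example, and a match could in principle be a coincidence inside compressed data, so a real reader would still need to validate each candidate block.

```python
import bz2
import random
from typing import List

BLOCK_MAGIC = 0x314159265359   # 48-bit block marker (digits of pi)
MASK_48 = (1 << 48) - 1


def find_block_offsets(data: bytes) -> List[int]:
    """Return the bit offsets at which the 48-bit block marker starts.

    Hypothetical helper for illustration: it slides a 48-bit window over
    the stream one bit at a time, because bzip2 blocks are not byte-aligned.
    """
    offsets = []
    window = 0
    bit_pos = 0  # number of bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


# Build a small multi-block stream: compresslevel=1 selects 100k blocks,
# so 250,000 bytes of poorly compressible input span several blocks.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(250_000))
compressed = bz2.compress(payload, compresslevel=1)

offsets = find_block_offsets(compressed)
print("candidate block starts (bit offsets):", offsets)
```

Each offset found this way is a candidate split point: a map task assigned a byte range could skip forward to the first marker at or after the start of its range and decompress whole blocks from there.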

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.