hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1823) want InputFormat for bzip2 files
Date Wed, 04 Jun 2008 18:34:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602397#action_12602397 ]

Doug Cutting commented on HADOOP-1823:

Has an issue been filed in Ant's bugzilla yet to add this feature there?  We should only fork
the ant code short-term.

> want InputFormat for bzip2 files
> --------------------------------
>                 Key: HADOOP-1823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1823
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: bzip2.jar
> Unlike gzip, the bzip2 file format supports splitting.  Compression is by blocks (900k
> by default) and blocks are separated by a synchronization marker (a 48-bit approximation of
> Pi).  This would permit very large compressed files to be split into multiple map tasks, which
> is not currently possible unless using a Hadoop-specific file format.
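
To illustrate the splitting idea described in the issue, here is a minimal Python sketch (not the attached bzip2.jar and not Hadoop code) that scans a bzip2 stream for the 48-bit block marker 0x314159265359. Blocks are not byte-aligned, so the scan works bit by bit. The function name find_block_bit_offsets is hypothetical, and the sketch assumes the whole stream fits in memory. The marker could in principle also occur by chance inside compressed data, so a real splitter would need to validate each candidate block.

```python
import bz2

# 48-bit block-start marker: the digits of pi (3.14159265359) in BCD.
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_bit_offsets(data):
    """Return the bit offsets at which the block marker appears in data.

    Illustrative only: bits are consumed most-significant first, which is
    how bzip2 packs its bit stream, and a 48-bit sliding window is compared
    against the marker after every bit.
    """
    offsets = []
    window = 0
    bits_read = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            window = ((window << 1) | bit) & MASK_48
            bits_read += 1
            if bits_read >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_read - 48)
    return offsets


if __name__ == "__main__":
    compressed = bz2.compress(b"hello hadoop\n" * 10000)
    # Each offset is a candidate position where a map task could begin reading.
    print(find_block_bit_offsets(compressed))
```

The first block begins right after the 4-byte stream header ("BZh" plus the block-size digit), so bit offset 32 should always appear in the result for a valid stream. Each later offset marks a candidate position at which a separate map task could start decompressing.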

