hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
Date Thu, 27 May 2010 03:33:46 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872077#action_12872077 ]

Greg Roelofs commented on MAPREDUCE-469:

The bzip2 part reportedly is fixed on the trunk (HADOOP-4012); I haven't yet verified this
for myself, but I have no reason to believe it doesn't work.

I'm working on half of the gzip half, i.e., the native-libraries portion.  I appear to have
a working proof of concept, but my testing so far has been extremely minimal.  The java.util.zip
portion could be addressed with something similar to Duncan Loveday's MultiMemberGZIPInputStream
workaround (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425), but the license 
on his actual code is unclear.  (On the other hand, he has an Apache account and apparently
still works at BT, so it might be possible to get that clarified.)
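For reference, RFC 1952 defines a gzip file as a series of members, each with its own header, deflate data and CRC32/ISIZE trailer, so a decoder only has to keep going after the first trailer. Below is a minimal sketch of that loop using java.util.zip.Inflater in raw ("nowrap") mode. It is not Loveday's MultiMemberGZIPInputStream and not Hadoop code; the names MultiMemberGunzip and gunzipAll are hypothetical, and it assumes the whole compressed input fits in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch: decode EVERY member of a concatenated gzip stream
// (RFC 1952), not just the first. Assumes the input fits in memory.
final class MultiMemberGunzip {

    static byte[] gunzipAll(byte[] in) throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        // Keep decoding members until the input is exhausted.
        while (pos < in.length) {
            pos = decodeMember(in, pos, out);
        }
        return out.toByteArray();
    }

    // Decodes one member starting at pos; returns the offset just past its trailer.
    private static int decodeMember(byte[] in, int pos, ByteArrayOutputStream out)
            throws IOException, DataFormatException {
        if (in.length - pos < 18 || (in[pos] & 0xff) != 0x1f
                || (in[pos + 1] & 0xff) != 0x8b || in[pos + 2] != 8) {
            throw new IOException("not a gzip member at offset " + pos);
        }
        int flg = in[pos + 3] & 0xff;
        int p = pos + 10;                                    // fixed 10-byte header
        if ((flg & 0x04) != 0) {                             // FEXTRA: 2-byte length + data
            int xlen = (in[p] & 0xff) | (in[p + 1] & 0xff) << 8;
            p += 2 + xlen;
        }
        if ((flg & 0x08) != 0) { while (in[p++] != 0) { } } // FNAME, zero-terminated
        if ((flg & 0x10) != 0) { while (in[p++] != 0) { } } // FCOMMENT, zero-terminated
        if ((flg & 0x02) != 0) { p += 2; }                   // FHCRC

        Inflater inf = new Inflater(true);                   // raw deflate, no zlib wrapper
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        long size = 0;
        try {
            inf.setInput(in, p, in.length - p);
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished()
                        && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IOException("truncated deflate data");
                }
                crc.update(buf, 0, n);
                out.write(buf, 0, n);
                size += n;
            }
            // Unconsumed input begins with this member's 8-byte trailer.
            p = in.length - inf.getRemaining();
        } finally {
            inf.end();
        }
        if (in.length - p < 8) {
            throw new IOException("truncated trailer");
        }
        long storedCrc = readLE32(in, p);
        long storedSize = readLE32(in, p + 4);               // ISIZE is length mod 2^32
        if (storedCrc != crc.getValue() || storedSize != (size & 0xffffffffL)) {
            throw new IOException("CRC or length mismatch");
        }
        return p + 8;
    }

    private static long readLE32(byte[] b, int off) {
        return (b[off] & 0xffL) | (b[off + 1] & 0xffL) << 8
                | (b[off + 2] & 0xffL) << 16 | (b[off + 3] & 0xffL) << 24;
    }
}
```

The byte-array form only keeps the sketch short; a streaming decoder would run the same per-member loop behind an InputStream.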

Ravi, do you mind if I assign this issue to myself?

> Support concatenated gzip and bzip2 files
> -----------------------------------------
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Ravi Gummadi
> When running MapReduce with concatenated gzip files as input, only the first part is read,
> which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
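To make the reported symptom concrete, the sketch below builds a two-member stream byte-for-byte the way `cat a.gz b.gz > c.gz` would, then decodes it with a reader that stops at the end of the first deflate stream, leaving everything after the first member's 8-byte trailer unread. FirstMemberOnly and its methods are hypothetical illustrations, not Hadoop code, and the reader assumes a bare 10-byte header (FLG = 0), which is what GZIPOutputStream writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical illustration of the symptom: only the first gzip member is decoded.
final class FirstMemberOnly {

    // Builds a multi-member gzip stream, equivalent to concatenating .gz files.
    static byte[] concatenatedGzip(String... parts) throws IOException {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        for (String part : parts) {
            ByteArrayOutputStream one = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(one)) {
                gz.write(part.getBytes(StandardCharsets.UTF_8));
            }
            one.writeTo(cat);                  // append this complete member
        }
        return cat.toByteArray();
    }

    // Inflates only the first member into `out`; returns how many input bytes
    // after that member's 8-byte CRC32/ISIZE trailer were never read.
    // Assumes a bare 10-byte header (FLG == 0).
    static int unreadAfterFirstMember(byte[] gz, ByteArrayOutputStream out)
            throws DataFormatException {
        Inflater inf = new Inflater(true);     // raw deflate, no zlib wrapper
        try {
            inf.setInput(gz, 10, gz.length - 10);
            byte[] buf = new byte[8192];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsInput()) {
                    break;                     // truncated input; stop here
                }
                out.write(buf, 0, n);
            }
            return inf.getRemaining() - 8;     // skip the first member's trailer
        } finally {
            inf.end();
        }
    }
}
```

With two parts, the unread byte count equals the length of the entire second member, which is exactly the data a single-member reader silently drops.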

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
