hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
Date Tue, 22 Jun 2010 04:03:15 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Roelofs updated MAPREDUCE-469:

    Attachment: MR-469.v2.yahoo-0.20.2xx-branch.patch

Expanded test coverage uncovered a bug on Friday, and trunk is broken after today's update, so
this version is against Yahoo's 0.20S+ branch.

Still not quite final: I haven't finished updating the unit test to exercise native and built-in
gzip, as well as built-in bzip2, at multiple buffer sizes, and I've left some debug statements
(mostly commented out) in place in case that turns up anything further.
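For illustration only, a buffer-size sweep of the kind described above might look like the following. It is a sketch that uses the JDK's java.util.zip classes instead of Hadoop's codec and decompressor classes (which a real test would exercise), and it assumes Java 7 or later, whose GZIPInputStream reads past the end of the first member. The class name ConcatGzipBufferSweep and the chosen sizes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical test sketch; JDK classes stand in for Hadoop's codecs.
public class ConcatGzipBufferSweep {
    // Compress one string into a single, complete gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Read everything from 'in' using a caller-chosen read size.
    static String readAll(InputStream in, int chunk) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[chunk];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two members concatenated byte-for-byte, as `cat a.gz b.gz` would produce.
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzipMember("first member\n"));
        cat.write(gzipMember("second member\n"));
        byte[] data = cat.toByteArray();

        String expected = "first member\nsecond member\n";
        // Sweep both the decompressor's internal buffer and the caller's read size.
        for (int size : new int[] {1, 7, 64, 512, 4096}) {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data), size)) {
                String got = readAll(in, size);
                if (!got.equals(expected)) {
                    throw new AssertionError("buffer size " + size + " read: " + got);
                }
            }
        }
        System.out.println("all buffer sizes passed");
    }
}
```

Very small sizes such as 1 and 7 are the interesting cases: they force member headers and trailers to straddle buffer refills, which is where a stateful decompressor is most likely to lose its place.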

Reviewer questions:
 - Currently, the new BuiltInGzipDecompressor class inherits directly from the JDK's Inflater, but
I suspect I should extend BuiltInZlibInflater instead.
 - Is it worthwhile to encapsulate the state label and associated variables into a private
inner class (BuiltInGzipDecompressor.java, first FIXME comment)?

The other FIXMEs are either related to the two items above or else are largely unrelated to
this issue.
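One hypothetical shape for the second reviewer question, shown on a toy gzip header parser rather than on the patch itself: the state label and the variables that are only meaningful alongside it live in a private static inner class, so moving on to the next concatenated member is a single reset() call. All names here (GzipHeaderParser, Label, ParseState) are invented for illustration, and the parser deliberately rejects the optional FEXTRA, FCOMMENT and FHCRC header fields of RFC 1952.

```java
import java.io.IOException;

// Hypothetical example; not the patch's BuiltInGzipDecompressor.
public class GzipHeaderParser {
    private enum Label { FIXED_HEADER, FILENAME, DONE }

    // The state label and the variables whose meaning depends on it, grouped so
    // they are always reset together when the next concatenated member begins.
    private static final class ParseState {
        Label label = Label.FIXED_HEADER;
        int fixedBytesSeen = 0;                    // bytes of the 10-byte fixed header seen so far
        int flags = 0;                             // FLG byte (RFC 1952, section 2.3.1)
        final StringBuilder fileName = new StringBuilder();

        void reset() {
            label = Label.FIXED_HEADER;
            fixedBytesSeen = 0;
            flags = 0;
            fileName.setLength(0);
        }
    }

    private static final int FTEXT = 0x01, FNAME = 0x08;
    private final ParseState state = new ParseState();

    // Feeds one header byte; returns true once the whole member header has been parsed.
    public boolean feed(int b) throws IOException {
        b &= 0xff;
        switch (state.label) {
            case FIXED_HEADER: {
                int i = state.fixedBytesSeen++;
                // ID1, ID2 and CM (8 = deflate) must match exactly.
                if ((i == 0 && b != 0x1f) || (i == 1 && b != 0x8b) || (i == 2 && b != 8)) {
                    throw new IOException("not a gzip member header (byte " + i + ")");
                }
                if (i == 3) {
                    if ((b & ~(FTEXT | FNAME)) != 0) {
                        throw new IOException("sketch only supports FTEXT and FNAME flags");
                    }
                    state.flags = b;
                }
                // MTIME, XFL and OS (bytes 4..9) are skipped.
                if (state.fixedBytesSeen == 10) {
                    state.label = (state.flags & FNAME) != 0 ? Label.FILENAME : Label.DONE;
                }
                break;
            }
            case FILENAME:
                if (b == 0) {
                    state.label = Label.DONE;              // zero-terminated name
                } else {
                    state.fileName.append((char) b);       // ISO 8859-1 per RFC 1952
                }
                break;
            case DONE:
                throw new IOException("header already complete; call nextMember()");
        }
        return state.label == Label.DONE;
    }

    public String fileName() { return state.fileName.toString(); }

    // Called when the next member of a concatenated file starts.
    public void nextMember() { state.reset(); }
}
```

The main argument for the inner class is that per-member state cannot be only partially reset; the cost is one extra level of indirection on each field access.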

> Support concatenated gzip and bzip2 files
> -----------------------------------------
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Greg Roelofs
>         Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c,
>
> When running MapReduce with concatenated gzip files as input, only the first part is read,
> which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
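A minimal, self-contained sketch of the requested behaviour (not Hadoop code): decode every member of a concatenated gzip stream per RFC 1952 by parsing each member's header, inflating its raw deflate data with java.util.zip.Inflater in nowrap mode, verifying the CRC32/ISIZE trailer, and looping while input remains. It assumes the whole input is in memory and that headers carry at most the FTEXT and FNAME flags; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical sketch; not Hadoop's decompressor API.
public class ConcatGunzip {
    private static final int FTEXT = 0x01, FNAME = 0x08;

    // Decodes all gzip members in 'in', not just the first.
    static byte[] gunzipAll(byte[] in) throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < in.length) {
            // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS.
            if (in.length - pos < 10 || (in[pos] & 0xff) != 0x1f
                    || (in[pos + 1] & 0xff) != 0x8b || in[pos + 2] != 8) {
                throw new IOException("bad gzip header at offset " + pos);
            }
            int flags = in[pos + 3] & 0xff;
            if ((flags & ~(FTEXT | FNAME)) != 0) {
                throw new IOException("sketch does not handle flags 0x" + Integer.toHexString(flags));
            }
            pos += 10;
            if ((flags & FNAME) != 0) {              // skip the zero-terminated file name
                while (pos < in.length && in[pos] != 0) pos++;
                pos++;
            }

            // Raw deflate data follows; nowrap=true means no zlib header or checksum.
            Inflater inf = new Inflater(true);
            CRC32 crc = new CRC32();
            long size = 0;
            try {
                inf.setInput(in, pos, in.length - pos);
                byte[] buf = new byte[4096];
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary())) {
                        throw new IOException("truncated deflate data");
                    }
                    out.write(buf, 0, n);
                    crc.update(buf, 0, n);
                    size += n;
                }
                // Bytes the inflater did not consume start the 8-byte trailer.
                pos = in.length - inf.getRemaining();
            } finally {
                inf.end();
            }

            if (in.length - pos < 8) throw new IOException("truncated trailer");
            long storedCrc = le32(in, pos), storedSize = le32(in, pos + 4);
            if (storedCrc != crc.getValue() || storedSize != (size & 0xffffffffL)) {
                throw new IOException("CRC32 or ISIZE mismatch");
            }
            pos += 8;                                // the next member, if any, starts here
        }
        return out.toByteArray();
    }

    // Little-endian unsigned 32-bit read, as RFC 1952 stores CRC32 and ISIZE.
    private static long le32(byte[] b, int off) {
        return (b[off] & 0xffL) | (b[off + 1] & 0xffL) << 8
                | (b[off + 2] & 0xffL) << 16 | (b[off + 3] & 0xffL) << 24;
    }

    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("part one\n"));
        cat.write(gzip("part two\n"));
        String result = new String(gunzipAll(cat.toByteArray()), StandardCharsets.UTF_8);
        System.out.print(result);                    // both parts, not only "part one"
    }
}
```

Stopping after the first trailer instead of looping back to parse another header is exactly the behaviour the report describes: the output would contain only "part one".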

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
