hadoop-common-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6835) Support concatenated gzip and bzip2 files
Date Thu, 01 Jul 2010 01:27:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884120#action_12884120 ]

Greg Roelofs commented on HADOOP-6835:
--------------------------------------

OK, so auto-patch doesn't know about the split projects, and hadoop-mapreduce patches don't
work in hadoop-common.

I tried moving the test case to hadoop-common/src/test/core/org/apache/hadoop/io/compress,
but it depends on too many MR classes to be workable (JobConf, Reporter, RecordReader, FileSplit,
InputSplit, FileInputFormat, TextInputFormat).  So I guess the procedure is (1) reupload the
hadoop-common patch and mark as patch-available [actually, I missed a fix in TestCodec, so
I need to upload a new one anyway]; (2) get that past QA and checked in; (3) open a separate
MR JIRA, attach the test case, and mark it patch-available; and (4) get _that_ past QA and
checked in, too.

If there's an easier way, feel free to enlighten me.  (On a related note, I was told by Reliable
Sources to blame Owen, but he just left town.  Coincidence?)


> Support concatenated gzip and bzip2 files
> -----------------------------------------
>
>                 Key: HADOOP-6835
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6835
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.2
>            Reporter: Tom White
>            Assignee: Greg Roelofs
>             Fix For: 0.22.0
>
>         Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c,
>                      HADOOP-6835.v3.yahoo-0.20.2xx-branch.patch, HADOOP-6835.v4.trunk-hadoop-common.patch,
>                      HADOOP-6835.v4.trunk-hadoop-mapreduce.patch, HADOOP-6835.v4.yahoo-0.20.2xx-branch.patch,
>                      MR-469.v2.yahoo-0.20.2xx-branch.patch
>
>
> When running MapReduce with concatenated gzip files as input, only the first part is read,
> which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
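To make the symptom concrete, here is a minimal standalone Java sketch (assuming Java 9+ for readAllBytes). It does not use any Hadoop API. The class ConcatGzipDemo and its helper methods are hypothetical names chosen for illustration. The sketch builds a two-member gzip stream the same way `cat a.gz b.gz > ab.gz` would, then decodes it twice. The first pass uses java.util.zip.GZIPInputStream, which on current JDKs continues into later members. The second pass uses a reader that stops at the end of the first deflate stream, reproducing the "only the first part is read" behavior described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

public class ConcatGzipDemo {
    // Compress one string into a complete, standalone gzip member
    // (10-byte header, deflate data, 8-byte CRC32/ISIZE trailer).
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Plain byte-level concatenation, equivalent to `cat a.gz b.gz`.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Decode every member: GZIPInputStream moves on to the next header
    // after each trailer instead of reporting end-of-stream.
    static String gunzipAll(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical single-member reader that mimics the reported bug.
    // Assumes a bare 10-byte header (FLG = 0, which GZIPOutputStream writes)
    // and stops as soon as the first raw deflate stream is finished.
    static String gunzipFirstMemberOnly(byte[] data) throws DataFormatException {
        Inflater inf = new Inflater(true); // raw deflate, no zlib wrapper
        inf.setInput(data, 10, data.length - 10);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                break; // truncated or unexpected input; give up
            }
            out.write(buf, 0, n);
        }
        inf.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] both = concat(gzipMember("hello\n"), gzipMember("world\n"));
        System.out.print(gunzipAll(both));             // both members
        System.out.print(gunzipFirstMemberOnly(both)); // first member only
    }
}
```

Per RFC 1952, a gzip file is a series of members, so a decoder that treats the end of the first member's trailer as end-of-file silently drops the rest of the data. That is why the truncation is easy to miss: the first member decodes cleanly and no error is raised.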

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

