hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4652) RAgzip: multiple map tasks for a large gzipped file
Date Fri, 21 Nov 2008 09:36:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649633#action_12649633 ]

Hadoop QA commented on HADOOP-4652:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12394147/HADOOP-4652.path
  against trunk revision 719431.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/console

This message is automatically generated.

> RAgzip: multiple map tasks for a large gzipped file
> ---------------------------------------------------
>
>                 Key: HADOOP-4652
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4652
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io, mapred, native
>    Affects Versions: 0.20.0
>            Reporter: Daehyun Kim
>         Attachments: HADOOP-4652.path
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch, which we call RAgzip, that enables multiple map tasks for one large gzipped file.
> To run multiple map tasks over a gzipped file, change the InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function, which supports random access on a gzipped file.
> Since inflatePrime is available from zlib 1.2.2.4 onward, RAgzip requires zlib 1.2.2.4 or higher. (We tested with zlib 1.2.3.)
> RAgzip requires a preprocessing step that creates an access point (.ap) file, which acts as an index of the gzipped file's chunks.
> The access point (.ap) file is stored in the same directory as the gzipped file.
> For example, for "/user/hadoop/test.gz", the .ap file is created as "/user/hadoop/test.gz.ap".
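
The naming rule and the idea of an access-point index described above can be sketched roughly as follows. This is only an illustration, not the patch's code: the message does not describe the layout of the .ap file, so the class AccessPointSketch, the AccessPoint fields, and the lookup method are all hypothetical. The sketch assumes each access point records an uncompressed offset, a compressed byte offset, and a bit offset of the kind that would be handed to inflatePrime. A real index would also need the preceding window of uncompressed data for each point, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

public class AccessPointSketch {

    /** One entry of a hypothetical access-point index (field names are assumptions). */
    static final class AccessPoint {
        final long uncompressedOffset; // position in the inflated stream
        final long compressedOffset;   // byte position in the .gz file
        final int bitOffset;           // leftover bits (0-7) that inflatePrime would consume

        AccessPoint(long uncompressedOffset, long compressedOffset, int bitOffset) {
            this.uncompressedOffset = uncompressedOffset;
            this.compressedOffset = compressedOffset;
            this.bitOffset = bitOffset;
        }
    }

    /** The .ap file sits next to the gzipped file: append ".ap" to its path. */
    static String apPathFor(String gzPath) {
        return gzPath + ".ap";
    }

    /**
     * Returns the index of the last access point whose uncompressed offset is
     * at or before the target, or -1 if there is none. The list must be sorted
     * by uncompressed offset.
     */
    static int findAccessPoint(List<AccessPoint> points, long target) {
        int lo = 0;
        int hi = points.size() - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (points.get(mid).uncompressedOffset <= target) {
                best = mid;     // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(apPathFor("/user/hadoop/test.gz"));

        // Made-up offsets, for illustration only.
        List<AccessPoint> points = new ArrayList<>();
        points.add(new AccessPoint(0L, 10L, 0));
        points.add(new AccessPoint(1000L, 400L, 3));
        points.add(new AccessPoint(2000L, 790L, 5));

        // A map task whose split starts at uncompressed offset 1500 would
        // begin decompressing from the access point at offset 1000.
        int i = findAccessPoint(points, 1500L);
        System.out.println(points.get(i).compressedOffset + " " + points.get(i).bitOffset);
    }
}
```

With an index like this, each map task could seek to the compressed offset of its nearest preceding access point and discard output until it reaches the start of its own split, instead of inflating the file from the beginning.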


