hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1824) want InputFormat for zip files
Date Tue, 22 Jan 2008 15:50:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561368#action_12561368
] 

Hadoop QA commented on HADOOP-1824:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12373681/ZipInputFormat.patch
against trunk revision r614192.

    @author +1.  The patch does not contain any @author tags.

    javadoc -1.  The javadoc tool appears to have generated  messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 2 new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/console

This message is automatically generated.

> want InputFormat for zip files
> ------------------------------
>
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat.patch
>
>
> HDFS is inefficient with large numbers of small files.  Thus one might pack many small
files into large, compressed, archives.  But, for efficient map-reduce operation, it is desireable
to be able to split inputs into smaller chunks, with one or more small original file per split.
 The zip format, unlike tar, permits enumeration of files in the archive without scanning
the entire archive.  Thus a zip InputFormat could efficiently permit splitting large archives
into splits that contain one or more archived files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message