Message-ID: <22014095.1202299868692.JavaMail.jira@brutus>
Date: Wed, 6 Feb 2008 04:11:08 -0800 (PST)
From: "Ankur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-1824) want InputFormat for zip files

    [ https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566104#action_12566104 ]

Ankur commented on HADOOP-1824:
-------------------------------

Also, I would be thankful if you could recommend other bugs/issues that I can fix to make useful contributions :-)

> want InputFormat for zip files
> ------------------------------
>
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat_fixed.patch
>
>
> HDFS is inefficient with large numbers of small files. Thus one might pack many small files into large, compressed archives. But, for efficient map-reduce operation, it is desirable to be able to split inputs into smaller chunks, with one or more small original files per split. The zip format, unlike tar, permits enumeration of files in the archive without scanning the entire archive. Thus a zip InputFormat could efficiently permit splitting large archives into splits that contain one or more archived files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
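
To illustrate the issue description's point that a zip archive can be enumerated without scanning it, here is a minimal Python sketch. It is not the Hadoop Java InputFormat API and not the attached patch. The names `list_entries`, `plan_splits`, and `build_demo_archive` are hypothetical, and the greedy grouping by compressed size is only one assumed way of forming splits. Python's `zipfile` reads the archive's central directory when it opens the file, so listing entries does not decompress any file data.

```python
import io
import zipfile


def list_entries(archive):
    """Return (name, compressed_size) pairs taken from the zip central directory."""
    with zipfile.ZipFile(archive) as zf:
        # infolist() is built from the central directory; no entry data is read.
        return [(info.filename, info.compress_size)
                for info in zf.infolist() if not info.is_dir()]


def plan_splits(entries, max_split_bytes):
    """Greedily group whole archived files into splits (assumed policy).

    A split is closed when adding the next file would exceed max_split_bytes.
    A file larger than the limit still gets a split of its own.
    """
    splits, current, current_size = [], [], 0
    for name, size in entries:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


def build_demo_archive():
    """Build a small in-memory archive; ZIP_STORED keeps sizes predictable."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, size in [("a.txt", 10), ("b.txt", 10),
                           ("c.txt", 10), ("d.txt", 25)]:
            zf.writestr(name, b"x" * size)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    entries = list_entries(build_demo_archive())
    for i, split in enumerate(plan_splits(entries, max_split_bytes=20)):
        print(i, split)
```

Each split holds only whole archived files, matching the "one or more archived files per split" idea in the description. A real InputFormat would also need to record each entry's offset so that a map task can seek straight to its files.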