hadoop-common-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1824) want InputFormat for zip files
Date Fri, 25 Jan 2008 15:57:36 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562547#action_12562547 ]

Ankur commented on HADOOP-1824:

Some questions.
1. How is a java.io.InputStream passed to and used in native code? The header file represents
it as a jobject. I tried casting that to FILE * and reading from it, but it did not work as expected.

2. Can a native method call return structures that can be converted to Java objects? If so,
how?
   Basically, I want to be able to return an array of C structures where each element holds
the following information:
                 - The path of the entry
                 - The number of the entry
                 - The offset of the entry in the zip file
This info can then be converted to an array of ZipSplit.

I am new to JNI, so things are less than obvious to me. A little help with JNI would be greatly
appreciated.
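
A hedged sketch of the Java side of both questions may help. A jobject is an opaque reference to a Java object, so it cannot be cast to FILE *; native code either calls the stream's read method back through JNI (GetMethodID, CallIntMethod) or is handed bytes that Java has already read. For the second question, native code can build Java objects with FindClass, GetMethodID on "<init>", NewObject and NewObjectArray, so the Java class mainly needs a suitable constructor. The names EntryInfo, NativeZipSketch, ChunkSink and feedChunks are hypothetical and are not taken from the patch; the native side is not shown, and ChunkSink merely stands in for a native method that would accept raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical holder that native code could instantiate via NewObject. */
final class EntryInfo {
    final String path;  // path of the entry inside the archive
    final int number;   // ordinal number of the entry
    final long offset;  // byte offset of the entry in the zip file

    // JNI descriptor of this constructor: (Ljava/lang/String;IJ)V
    EntryInfo(String path, int number, long offset) {
        this.path = path;
        this.number = number;
        this.offset = offset;
    }
}

final class NativeZipSketch {
    /** Stand-in for a native method that accepts a chunk of raw bytes (assumption). */
    interface ChunkSink {
        void accept(byte[] buf, int len);
    }

    /**
     * Reads the stream in Java and hands each chunk to the sink, so the
     * consumer never has to touch the InputStream object itself.
     * Returns the total number of bytes delivered.
     */
    static long feedChunks(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            if (n > 0) {
                sink.accept(buf, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        long total = feedChunks(in, 4, (buf, len) -> System.out.println("chunk of " + len + " bytes"));
        System.out.println("total bytes: " + total);

        EntryInfo info = new EntryInfo("dir/a.txt", 0, 0L);
        System.out.println(info.path + " #" + info.number + " @" + info.offset);
    }
}
```

An array of such objects returned from a native method (declared on the Java side as returning EntryInfo[]) could then be mapped to ZipSplit instances in ordinary Java code.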

> want InputFormat for zip files
> ------------------------------
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat_fixed.patch
> HDFS is inefficient with large numbers of small files.  Thus one might pack many small
files into large, compressed archives.  But, for efficient map-reduce operation, it is desirable
to be able to split inputs into smaller chunks, with one or more small original files per split.
 The zip format, unlike tar, permits enumeration of files in the archive without scanning
the entire archive.  Thus a zip InputFormat could efficiently permit splitting large archives
into splits that contain one or more archived files.
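
The enumeration property described above can be illustrated with java.util.zip.ZipFile, which lists entries from the archive's central directory rather than reading every entry's data. This is only a sketch on a local temporary file, not on HDFS, and it is not the approach taken in the attached patch; ZipListingSketch, listEntries, groupIntoSplits and writeSampleZip are hypothetical names, and each "split" here is simply a list of entry names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipListingSketch {
    /** Lists the names of all non-directory entries in the archive. */
    static List<String> listEntries(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Groups entry names into splits holding at most perSplit files each. */
    static List<List<String>> groupIntoSplits(List<String> names, int perSplit) {
        if (perSplit <= 0) {
            throw new IllegalArgumentException("perSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < names.size(); i += perSplit) {
            int end = Math.min(i + perSplit, names.size());
            splits.add(new ArrayList<>(names.subList(i, end)));
        }
        return splits;
    }

    /** Writes a small sample archive to a temporary file, for demonstration only. */
    static File writeSampleZip(int count) throws IOException {
        File f = File.createTempFile("sketch", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(f))) {
            for (int i = 0; i < count; i++) {
                out.putNextEntry(new ZipEntry("file" + i + ".txt"));
                out.write(("contents " + i).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File zip = writeSampleZip(5);
        List<List<String>> splits = groupIntoSplits(listEntries(zip), 2);
        System.out.println(splits.size() + " splits: " + splits);
    }
}
```

A real InputFormat would additionally need each entry's position in the archive to read it without scanning, which the public ZipEntry API does not expose.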

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
