hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1518) multi file input format for loaders
Date Mon, 26 Jul 2010 20:55:17 GMT
multi file input format for loaders

                 Key: PIG-1518
                 URL: https://issues.apache.org/jira/browse/PIG-1518
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich
            Assignee: Yan Zhou
             Fix For: 0.8.0

We frequently run into situations where Pig needs to process many small input files.
In this case, a separate map task is created for each file, which can be very inefficient.

It would be great to have an umbrella input format that can take multiple files and combine them
into a single split. Ideally, this would work with different data formats.
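As an illustration of the idea only, here is a minimal, hypothetical sketch of the core packing step such an umbrella input format might perform: walk the input files in order and group them into combined splits no larger than a configured size, so that one map task handles several small files. It is written in plain Python and does not use any Hadoop or Pig API; `FileChunk`, `CombinedSplit`, `combine_small_files`, and `max_split_size` are illustrative names, not existing classes or parameters. A real input format would also need to account for things such as block locality and the record reader of each underlying data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileChunk:
    """Hypothetical description of one input file: its path and size in bytes."""
    path: str
    length: int


@dataclass
class CombinedSplit:
    """Hypothetical split covering several whole files, read by a single map task."""
    files: List[FileChunk] = field(default_factory=list)

    @property
    def length(self) -> int:
        # Total number of bytes across all files in this split.
        return sum(f.length for f in self.files)


def combine_small_files(files: List[FileChunk], max_split_size: int) -> List[CombinedSplit]:
    """Greedily pack files, in input order, into splits of at most max_split_size bytes.

    Files are never broken up here: a file larger than max_split_size simply
    ends up alone in its own split.
    """
    splits: List[CombinedSplit] = []
    current = CombinedSplit()
    for f in files:
        # Close the current split if adding this file would exceed the limit.
        if current.files and current.length + f.length > max_split_size:
            splits.append(current)
            current = CombinedSplit()
        current.files.append(f)
    if current.files:
        splits.append(current)
    return splits
```

For example, ten 1 MB files with `max_split_size` set to 4 MB are packed into three splits of 4, 4, and 2 files, so three map tasks are launched instead of ten.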

There are already a couple of input formats that do something similar, MultiFileInputFormat and
CombineFileInputFormat; however, neither works with the new Hadoop 0.20 API.

At a minimum, we want to do a feasibility study for Pig 0.8.0.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
