pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1518) multi file input format for loaders
Date Thu, 09 Sep 2010 22:36:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907805#action_12907805 ]

Olga Natkovich commented on PIG-1518:
-------------------------------------

Hi Justin, thanks for the patch!

I don't think we can commit it to the 0.7 branch because we have already done the official 0.7
release and we can't introduce non-backward-compatible changes to this branch.

However, I think it is great to have the patch on the JIRA so that anybody who is interested
in this patch can apply it to their own tree and run with it. We have done similar things
in the past (with hadoop versions) and it worked fine.

> multi file input format for loaders
> -----------------------------------
>
>                 Key: PIG-1518
>                 URL: https://issues.apache.org/jira/browse/PIG-1518
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: PIG-1518-0.7.0.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch,
PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch
>
>
> We frequently run into the situation where Pig needs to deal with small files in the input.
In this case a separate map is created for each file, which can be very inefficient.
> It would be great to have an umbrella input format that can take multiple files and
use them in a single split. We would like to see this working with different data formats
if possible.
> There are already a couple of input formats doing a similar thing: MultifileInputFormat
as well as CombinedInputFormat; however, neither works with the new Hadoop 20 API.
> We at least want to do a feasibility study for Pig 0.8.0.
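
To illustrate the idea in the description above, here is a minimal sketch of packing several small files into one split. It is not the algorithm used by the PIG-1518 patch, and it does not call any Pig or Hadoop API. The function name combine_splits, the max_split_bytes parameter, and the file names and sizes are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def combine_splits(files: List[Tuple[str, int]], max_split_bytes: int) -> List[List[str]]:
    """Greedily pack (path, size) pairs into splits of at most max_split_bytes.

    Hypothetical helper for illustration only; not part of Pig or Hadoop.
    A file larger than the cap still gets a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current split when adding this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    # Flush the last, partially filled split.
    if current:
        splits.append(current)
    return splits


# Made-up input: five small files, sizes in arbitrary units.
small_files = [("part-0", 10), ("part-1", 20), ("part-2", 40), ("part-3", 50), ("part-4", 5)]
splits = combine_splits(small_files, max_split_bytes=64)
print(splits)
```

With one map per file this input would need five maps; under the assumed cap of 64 the sketch groups the files into three splits. A real input format would also have to consider block locality and per-format record readers, which this sketch ignores.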

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

