hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1518) multi file input format for loaders
Date Fri, 27 Aug 2010 18:08:55 GMT

    [ https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903501#action_12903501
] 

Olga Natkovich commented on PIG-1518:
-------------------------------------

After discussion with Ashutosh and Yan, the agreement is that, in addition to checking interfaces,
we also need to check whether we are taking advantage of the loader properties before deciding
whether to combine.

For instance, even if the loader implements OrderedLoadFunc, we can still combine as long as
there is no merge join in the script.
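
To make the rule concrete, here is a minimal sketch of the check in Java. It is not taken from
the patch: the class name, the method name, and the string labels "ordered" and "merge join" are
all hypothetical, and the rule table holds only the one pairing mentioned above. Refusing to
combine when a merge join is present is my reading of the comment, not something it states
outright.

```java
import java.util.Map;
import java.util.Set;

public class CombineDecision {
    /**
     * Returns true when splits may be combined, i.e. when no property the
     * loader declares is relied on by a feature the script actually uses.
     * All names and labels here are illustrative, not Pig's API.
     */
    static boolean canCombine(Set<String> loaderProperties,
                              Set<String> scriptFeatures,
                              Map<String, Set<String>> featuresNeedingProperty) {
        for (String property : loaderProperties) {
            Set<String> dependents =
                featuresNeedingProperty.getOrDefault(property, Set.of());
            for (String feature : dependents) {
                if (scriptFeatures.contains(feature)) {
                    // The script relies on this loader property: keep splits separate.
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Only the pairing named in the comment; the full list is still to be compiled.
        Map<String, Set<String>> rules = Map.of("ordered", Set.of("merge join"));

        // Ordered loader, no merge join in the script.
        System.out.println(canCombine(Set.of("ordered"), Set.of(), rules)); // true
        // Ordered loader and the script uses a merge join.
        System.out.println(canCombine(Set.of("ordered"), Set.of("merge join"), rules)); // false
    }
}
```

Keeping the rules in a table rather than hard-coding them means the list of valid combinations
can be filled in later without touching the check itself.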

Yan, please compile the list of valid combinations and update the patch. Thanks.

> multi file input format for loaders
> -----------------------------------
>
>                 Key: PIG-1518
>                 URL: https://issues.apache.org/jira/browse/PIG-1518
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch,
>                      PIG-1518.patch, PIG-1518.patch, PIG-1518.patch, PIG-1518.patch
>
>
> We frequently run into the situation where Pig needs to deal with small files in the input.
> In this case a separate map is created for each file, which can be very inefficient.
> It would be great to have an umbrella input format that can take multiple files and
> use them in a single split. We would like to see this working with different data formats
> if possible.
> There are already a couple of input formats doing a similar thing: MultifileInputFormat
> as well as CombinedInputFormat; however, neither works with the new Hadoop 20 API.
> We at least want to do a feasibility study for Pig 0.8.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

