hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-409) PERFORMANCE: Removing Union from map side of query with COGROUP
Date Thu, 04 Sep 2008 00:49:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-409:

        Fix Version/s: types_branch
    Affects Version/s: types_branch

> PERFORMANCE: Removing Union from map side of query with COGROUP
> ---------------------------------------------------------------
>                 Key: PIG-409
>                 URL: https://issues.apache.org/jira/browse/PIG-409
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
> Currently, the map-side code does not know which input of the cogroup it is processing,
so it assumes that it may be processing any of them and puts a union at the end of the pipeline. This is
fairly inefficient.
> A better approach would be to figure out which file is being processed in the configure call. There
seems to be a way to do this with hadoop, but it is not documented and so might not be guaranteed;
we need to follow up with somebody from the hadoop project.
> Another approach is to check, the first time map is called, which input is being processed
and to pick the execution plan that matches that input.
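
The second approach above can be sketched roughly as follows. This is only an
illustration in plain Java, not Pig or hadoop code: the names LazyPlanSelection,
CogroupMapper, MapPlan and demoMapper are hypothetical, and it assumes the path
of the file being processed is available to the mapper on each map call. The
mapper holds one plan per cogroup input and picks one on the first map call, so
no union over all plans is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyPlanSelection {

    /** Hypothetical stand-in for the map-side plan of one cogroup input. */
    interface MapPlan {
        String process(String record);
    }

    /** Hypothetical mapper that selects its plan on the first map call. */
    static class CogroupMapper {
        // Input directory -> index of the matching plan (assumed known at job setup).
        private final Map<String, Integer> inputIndexByDir;
        private final List<MapPlan> plans;
        private MapPlan selected;   // stays null until the first map call
        int selections = 0;         // counts how often a plan was selected

        CogroupMapper(Map<String, Integer> inputIndexByDir, List<MapPlan> plans) {
            this.inputIndexByDir = inputIndexByDir;
            this.plans = plans;
        }

        String map(String inputPath, String record) {
            if (selected == null) {
                // First call only: work out which cogroup input this task reads.
                selected = plans.get(indexFor(inputPath));
                selections++;
            }
            return selected.process(record);
        }

        private int indexFor(String inputPath) {
            for (Map.Entry<String, Integer> e : inputIndexByDir.entrySet()) {
                if (inputPath.startsWith(e.getKey() + "/")) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException("No cogroup input matches " + inputPath);
        }
    }

    /** Builds a two-input example; each plan tags records with its input index. */
    static CogroupMapper demoMapper() {
        Map<String, Integer> dirs = new LinkedHashMap<>();
        dirs.put("/data/users", 0);
        dirs.put("/data/clicks", 1);
        List<MapPlan> plans = new ArrayList<>();
        plans.add(r -> "0:" + r);
        plans.add(r -> "1:" + r);
        return new CogroupMapper(dirs, plans);
    }

    public static void main(String[] args) {
        CogroupMapper mapper = demoMapper();
        System.out.println(mapper.map("/data/users/part-00000", "alice"));
        System.out.println(mapper.map("/data/users/part-00000", "bob"));
    }
}
```

The configure-time variant would do the same lookup once before any record is
seen, provided hadoop exposes the input file name at that point, which is the
open question raised above.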

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
