hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-409) PERFORMANCE: Removing Union from map side of query with COGROUP
Date Thu, 04 Sep 2008 00:43:44 GMT
PERFORMANCE: Removing Union from map side of query with COGROUP
---------------------------------------------------------------

                 Key: PIG-409
                 URL: https://issues.apache.org/jira/browse/PIG-409
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich


Currently, the map-side code does not know which input of the cogroup it is processing, so
it assumes it may be processing any of them and puts a union at the end of the pipeline.
This is fairly inefficient.

A better approach would be to figure out which file is being processed in the configure call.
There seems to be a way to do this with Hadoop, but it is not documented and so might not be
guaranteed. We need to follow up with somebody from the Hadoop project.
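
A rough sketch of the configure-time idea, written as plain Java with no Hadoop dependency.
It assumes the framework exposes the current split's file under a job property; the key
"map.input.file" used here is exactly the kind of undocumented detail mentioned above and
should be treated as an assumption. The class name, the selectInput helper, and the Map
standing in for the job configuration are all hypothetical.

```java
import java.util.Map;

public class ConfigureTimePlanSelector {
    // Assumed property name; not guaranteed by the framework.
    static final String INPUT_FILE_KEY = "map.input.file";

    /**
     * Returns the index of the cogroup input whose path contains the file
     * currently being processed, or -1 if it cannot be determined. A caller
     * would fall back to the existing union-based plan on -1.
     */
    static int selectInput(Map<String, String> jobConf, String[] inputPaths) {
        String file = jobConf.get(INPUT_FILE_KEY);
        if (file == null) {
            return -1;
        }
        for (int i = 0; i < inputPaths.length; i++) {
            String path = inputPaths[i];
            // Match the path itself or anything below it, so that
            // "/data/a" does not accidentally match "/data/ab/part-0".
            if (file.equals(path) || file.startsWith(path + "/")) {
                return i;
            }
        }
        return -1;
    }
}
```

Returning -1 rather than throwing keeps the current behaviour available as a fallback when
the property is missing.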

Another approach is to check which input is being processed the first time map is called and
to pick the execution plan that matches that input.
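
A minimal sketch of the lazy variant, again in plain Java. How the first record identifies
its input is not specified in this issue, so the sketch assumes each candidate plan comes
with a predicate that recognises its own records. Plan, LazyPlanMapper, and the String
record type are hypothetical stand-ins, not Pig classes.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class LazyPlanMapper {
    /** Hypothetical per-input plan: a matcher plus the work to apply. */
    static final class Plan {
        final String name;
        final Predicate<String> matches;
        final Function<String, String> apply;

        Plan(String name, Predicate<String> matches, Function<String, String> apply) {
            this.name = name;
            this.matches = matches;
            this.apply = apply;
        }
    }

    private final List<Plan> candidates;
    private Plan chosen; // stays null until the first map() call

    LazyPlanMapper(List<Plan> candidates) {
        this.candidates = candidates;
    }

    /** Picks a plan on the first call, then reuses it for every later record. */
    String map(String record) {
        if (chosen == null) {
            for (Plan p : candidates) {
                if (p.matches.test(record)) {
                    chosen = p;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("no plan matches record: " + record);
            }
        }
        return chosen.apply.apply(record);
    }

    /** Name of the selected plan, or null if map() has not been called yet. */
    String chosenPlanName() {
        return chosen == null ? null : chosen.name;
    }
}
```

The selection cost is paid once per map task, and every later record goes straight through
the chosen plan without a union.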

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

