hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-627) PERFORMANCE: multi-query optimization
Date Wed, 21 Jan 2009 00:39:59 GMT
PERFORMANCE: multi-query optimization

                 Key: PIG-627
                 URL: https://issues.apache.org/jira/browse/PIG-627
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Olga Natkovich
             Fix For: types_branch

Currently, if your Pig script contains multiple stores and some shared computation, Pig will
execute several independent queries. For instance:

A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';

This script results in a map-only job that generates output1, followed by a map-reduce job
that generates output2. As a result, the data is read, parsed, and filtered twice, which is unnecessary
and costly.
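
To make the cost concrete, here is a rough toy model in Python. It is not Pig code and says nothing about how Pig plans or runs jobs; the sample rows and the names CountingSource, run_independent, and run_shared are all made up for illustration. It only counts how often the input is scanned and filtered when the two stores run as independent queries versus when the filtered relation B is computed once and shared.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the 'data' file: rows of (a, b, c).
DATA = [(3, "x", 1), (7, "x", 2), (9, "y", 3), (6, "y", 4)]


class CountingSource:
    """Wraps the input and counts scans and filter evaluations."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0
        self.filter_calls = 0

    def load(self):
        # Models: A = load 'data' as (a, b, c);
        self.scans += 1
        return list(self.rows)

    def keep(self, row):
        # Models the predicate in: B = filter A by a > 5;
        self.filter_calls += 1
        return row[0] > 5


def group_by_b(rows):
    # Models: C = group B by b;
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def run_independent(source):
    """Each store is treated as its own query, so B is computed twice."""
    # First query: load + filter, then store B.
    output1 = [r for r in source.load() if source.keep(r)]
    # Second query: load + filter again, then group and store C.
    filtered_again = [r for r in source.load() if source.keep(r)]
    output2 = group_by_b(filtered_again)
    return output1, output2


def run_shared(source):
    """B is computed once and feeds both stores."""
    filtered = [r for r in source.load() if source.keep(r)]
    output1 = list(filtered)
    output2 = group_by_b(filtered)
    return output1, output2


if __name__ == "__main__":
    for runner in (run_independent, run_shared):
        src = CountingSource(DATA)
        runner(src)
        print(runner.__name__, "scans:", src.scans, "filter calls:", src.filter_calls)
```

In this model both variants produce the same two outputs, but the independent variant scans and filters the input twice, which is the redundancy described above.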

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
