pig-dev mailing list archives

From "Jie Li (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2395) Pig doesn't do early projection for some scripts
Date Sun, 04 Dec 2011 06:06:40 GMT
Pig doesn't do early projection for some scripts

                 Key: PIG-2395
                 URL: https://issues.apache.org/jira/browse/PIG-2395
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.9.1
         Environment: Linux
            Reporter: Jie Li

For some scripts Pig doesn't do early projection, i.e. dropping unnecessary fields as soon
as possible. This can be observed in two ways: 1) the output doesn't contain an INFO line like "ColumnPruneVisitor
- Columns pruned for xxx: $0, $1"; 2) the job has as much or more local IO (see FILE_BYTES_READ

In the following example, Pig should figure out that A's fields c through n can be dropped before the COGROUP:

A = load '/tmp/A' USING PigStorage('|') as (a,b,c,d,e,f,g,h,i,j,k,l,m,n);
B = load '/tmp/B' USING PigStorage('|') as (a);
COG = cogroup A by a, B by a;
out = foreach COG generate SUM(A.b) as sum;
store out into '/tmp/out' USING PigStorage('|');

Another similar example involves a GROUP operator.

Pig is able, and is expected, to do early projection in most cases, so this inconsistency
hurts performance badly.
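
For the COGROUP script above, the projection Pig is expected to perform can be written out by hand. This is only a sketch of a possible workaround, not part of the original report and not verified against 0.9.1: the alias A1 is made up for illustration, and it assumes that an explicit FOREACH placed before the COGROUP leaves the result unchanged while keeping only the fields a and b.

```pig
-- Load A with its full schema, exactly as in the original script.
A = load '/tmp/A' USING PigStorage('|') as (a,b,c,d,e,f,g,h,i,j,k,l,m,n);
-- Hypothetical manual projection: keep only the key (a) and the summed field (b).
-- A1 is an illustrative alias, not taken from the report.
A1 = foreach A generate a, b;
B = load '/tmp/B' USING PigStorage('|') as (a);
-- Cogroup on the narrowed relation instead of the 14-field one.
COG = cogroup A1 by a, B by a;
out = foreach COG generate SUM(A1.b) as sum;
store out into '/tmp/out' USING PigStorage('|');
```

If the optimizer pruned columns as expected, this rewrite would be redundant; comparing the job counters of the two variants is one way to check whether the extra fields are still being carried through.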


