pig-dev mailing list archives

From "Daniel Dai (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (PIG-2395) Pig doesn't do early projection for some scripts
Date Sun, 04 Dec 2011 06:32:40 GMT

     [ https://issues.apache.org/jira/browse/PIG-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2395.

    Resolution: Duplicate

This is a duplicate of PIG-1324. Fixing it involves significant rework of the column prune optimizer.
> Pig doesn't do early projection for some scripts
> ------------------------------------------------
>                 Key: PIG-2395
>                 URL: https://issues.apache.org/jira/browse/PIG-2395
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.1
>         Environment: Linux
>            Reporter: Jie Li
> For some scripts Pig doesn't do early projection, i.e. dropping unnecessary fields
> as soon as possible. This is observed in two ways: 1) the output doesn't contain an INFO line like
> "ColumnPruneVisitor - Columns pruned for xxx: $0, $1"; 2) the job has as much or more local
> One example where Pig should figure out that A's fields c through n can be dropped before the COGROUP:
> A = load '/tmp/A' USING PigStorage('|') as (a,b,c,d,e,f,g,h,i,j,k,l,m,n);
> B = load '/tmp/B' USING PigStorage('|') as (a);
> COG = cogroup A by a, B by a;
> out = foreach COG generate SUM(A.b) as sum;
> store out into '/tmp/out' USING PigStorage('|');
> Another similar example involves a GROUP operator.
> While Pig is able, and expected, to do early projection in most cases, this inconsistency
> hurts performance badly.
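
A possible workaround, not taken from the original report, is to project by hand before the COGROUP so that only the fields the script actually uses are carried forward. The sketch below assumes the same input paths and schema as the example above; the alias A1 is hypothetical and introduced only for illustration.

```pig
-- Load the wide relation exactly as in the reported script.
A = load '/tmp/A' USING PigStorage('|') as (a,b,c,d,e,f,g,h,i,j,k,l,m,n);
B = load '/tmp/B' USING PigStorage('|') as (a);

-- Manual early projection: keep only the join key (a) and the summed field (b).
-- A1 is a hypothetical alias, not part of the original script.
A1 = foreach A generate a, b;

-- COGROUP on the narrowed relation; the bag is now named A1.
COG = cogroup A1 by a, B by a;
out = foreach COG generate SUM(A1.b) as sum;
store out into '/tmp/out' USING PigStorage('|');
```

Whether this reduces the intermediate data in a given job would need to be checked against the job counters; it is a sketch of the idea, not a confirmed fix.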
