hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (PIG-466) PERFORMANCE: dropping the columns as soon as possible
Date Tue, 11 May 2010 18:41:42 GMT

     [ https://issues.apache.org/jira/browse/PIG-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reopened PIG-466:
----------------------------


PIG-922 partially solves this issue by pushing the column projection down to the loader. However, we can go beyond
that. For example:

{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = filter a by a2==1;
c = order b by a1;
d = foreach c generate a0, a1;
{code}

PIG-922 is able to determine that a3 is not needed anywhere in the script, so it does not load it. Going one step further,
we can determine that a2 is no longer needed after b, so we can insert a foreach right after b that drops a2.
This is not covered by PIG-922 and is part of the new optimizer work.
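The reasoning above is essentially a liveness analysis over the plan: walk backward to find which columns are still needed after each operator, then drop each column at the first point where it becomes dead. Below is a minimal Python sketch of that idea, assuming a hypothetical linear (single-input) pipeline model; the `Op` class, its `kind` strings and `plan_column_drops` are illustrative names only, not Pig's optimizer API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Op:
    """One operator in a linear, single-input pipeline (hypothetical simplified model)."""
    alias: str
    kind: str  # 'load', 'filter', 'order' or 'foreach'
    uses: Set[str] = field(default_factory=set)  # columns read by the operator's expressions
    schema: List[str] = field(default_factory=list)  # output columns; set only for 'load'/'foreach'


def plan_column_drops(pipeline: List[Op]) -> Dict[str, Set[str]]:
    """Return, per alias, the columns that can be projected away right after that operator."""
    # Forward pass: the output schema of each operator.
    # 'filter' and 'order' pass their input columns through unchanged.
    out: Dict[str, List[str]] = {}
    current: List[str] = []
    for op in pipeline:
        if op.kind in ("load", "foreach"):
            current = list(op.schema)
        out[op.alias] = current

    # Backward pass (liveness): which columns are still needed after each operator.
    # Everything the final alias emits is part of the result, so it stays live.
    live: Set[str] = set(out[pipeline[-1].alias])
    live_after: Dict[str, Set[str]] = {}
    for op in reversed(pipeline):
        live_after[op.alias] = live
        if op.kind == "foreach":
            live = set(op.uses)  # a foreach reads only the columns it references
        else:
            live = live | op.uses  # filter/order forward the live columns and read their own

    # Forward pass: drop each column at the first operator after which it is dead.
    drops: Dict[str, Set[str]] = {}
    carried: Set[str] = set()
    for op in pipeline:
        if op.kind in ("load", "foreach"):
            carried = set(op.schema)
        dead = carried - live_after[op.alias]
        drops[op.alias] = dead
        carried = carried - dead  # an inserted projection removes them for later operators
    return drops


# The script from the example above.
script = [
    Op("a", "load", schema=["a0", "a1", "a2", "a3"]),
    Op("b", "filter", uses={"a2"}),
    Op("c", "order", uses={"a1"}),
    Op("d", "foreach", uses={"a0", "a1"}, schema=["a0", "a1"]),
]
print(plan_column_drops(script))
# {'a': {'a3'}, 'b': {'a2'}, 'c': set(), 'd': set()}
```

In this model the set reported for a is the kind of pruning PIG-922 already pushes into the loader, while the set for b is the extra foreach after the filter described above. A real optimizer would also have to handle multi-input operators such as join, cogroup and union, which this linear sketch ignores.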

> PERFORMANCE: dropping the columns as soon as possible
> -----------------------------------------------------
>
>                 Key: PIG-466
>                 URL: https://issues.apache.org/jira/browse/PIG-466
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>
>
> Currently, each operator carries all columns until a foreach is encountered. This can
> cause significant performance degradation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

