hadoop-pig-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1321) Logical Optimizer: Merge cascading foreach
Date Thu, 01 Jul 2010 00:13:50 GMT

    [ https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884105#action_12884105 ]

Xuefu Zhang commented on PIG-1321:

Here is the scope of this type of optimization:

1. two consecutive foreach statements.
2. the second foreach statement has a simple inner plan in which the only statement is a GENERATE
statement. In other words, the second foreach statement must be something like "FOREACH A

Optimization result:
The two foreach statements will be merged into one. The new foreach statement keeps the first
foreach statement's inner plan but gets new expressions for its GENERATE statement. These new
expressions are those of the second foreach's GENERATE statement, with each reference to an
output of the first foreach replaced by the corresponding expression from the first foreach's
GENERATE statement. For instance, suppose we have the following pig script:

A = load 'file.txt' as (a, b, c);
B = foreach A generate a+b as u, c-b as v;
C = foreach B generate $0+5, v;
dump C;

The optimized plan after merge-foreach optimization will be equivalent to the following pig
script:

A = load 'file.txt' as (a, b, c);
C = foreach A generate a+b+5, c-b;
dump C;

Of course, the first foreach can have an arbitrarily complex inner plan, which remains unchanged
in the new foreach statement.
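
The substitution step can be illustrated with a small Python sketch. This is not Pig's
optimizer code: the tuple-based expression model and the names substitute, merge_foreach and
render are hypothetical, chosen only to mirror the example script above. It assumes every
name used in the second foreach is an alias or position defined by the first.

```python
# Hypothetical expression model (not Pig's internal classes):
#   "a"             a column or alias name
#   5               a constant
#   ("$", 0)        a positional reference such as $0
#   ("+", l, r)     a binary operation

def substitute(expr, first_exprs, aliases):
    """Rewrite one expression of the second foreach in terms of the first foreach's input."""
    if isinstance(expr, tuple) and expr[0] == "$":
        return first_exprs[expr[1]]            # positional reference, e.g. $0
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op,
                substitute(left, first_exprs, aliases),
                substitute(right, first_exprs, aliases))
    if isinstance(expr, str) and expr in aliases:
        return first_exprs[aliases[expr]]      # alias reference, e.g. v
    return expr                                # constant: unchanged


def merge_foreach(first, second):
    """first: list of (expression, alias or None); second: list of expressions."""
    first_exprs = [e for e, _ in first]
    aliases = {name: i for i, (_, name) in enumerate(first) if name is not None}
    return [substitute(e, first_exprs, aliases) for e in second]


def render(expr):
    """Print an expression without parentheses; adequate for this example only."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"{render(left)}{op}{render(right)}"
    return str(expr)


# B = foreach A generate a+b as u, c-b as v;
first = [(("+", "a", "b"), "u"), (("-", "c", "b"), "v")]
# C = foreach B generate $0+5, v;
second = [("+", ("$", 0), 5), "v"]

merged = merge_foreach(first, second)
print(", ".join(render(e) for e in merged))    # a+b+5, c-b
```

The sketch covers only the GENERATE expressions; the inner plan of the first foreach would be
carried over unchanged, as described above.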

Patch for this optimization is coming soon...

> Logical Optimizer: Merge cascading foreach
> ------------------------------------------
>                 Key: PIG-1321
>                 URL: https://issues.apache.org/jira/browse/PIG-1321
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Xuefu Zhang
> We can merge consecutive foreach statements.
> Eg:
> b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
> c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
> => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
