hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits
Date Thu, 29 Oct 2009 11:34:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771390#action_12771390 ]

Ankur commented on PIG-1060:
----------------------------

Here's a sample script that illustrates the issue. The sample data isn't very important
here, since both optimization and execution fail.
=== test.pig ===

data = LOAD 'dummy' as (name:chararray, freq:int);

filter1 = FILTER data BY freq < 5;
group1 = GROUP filter1 BY name;
proj1 = FOREACH group1 GENERATE FLATTEN(group), 'string1', SUM(filter1.freq);

filter2 = FILTER data BY freq > 5;
group2 = GROUP filter2 BY name;
proj2 = FOREACH group2 GENERATE FLATTEN(group), 'string2', SUM(filter2.freq);

filter3 = FILTER filter2 BY freq < 10;
group3 = GROUP filter3 BY name;
proj3 = FOREACH group3 GENERATE FLATTEN(group), 'string3', SUM(filter3.freq);

filter4 = FILTER filter3 BY freq > 7;
group4 = GROUP filter4 BY name;
proj4 = FOREACH group4 GENERATE FLATTEN(group), 'string4', SUM(filter4.freq);

M1 = LIMIT proj1 10;
M2 = LIMIT proj2 10;
M3 = LIMIT proj3 10;
M4 = LIMIT proj4 10;

U = UNION M1, M2, M3, M4;

STORE U INTO 'res' USING PigStorage();

The dot output can be dumped via the command "explain -dot -script test.pig;" to visualize the
scenario.
A surprising observation is that, despite turning MultiQuery off using -M, the
MultiQuery optimizer apparently still runs and fails the script.
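
For reference, the result test.pig is meant to produce can be modelled outside Pig. The
Python sketch below is only an illustration of the dataflow (a split of data into filter1
and filter2, then filter3 reading filter2 and filter4 reading filter3, each branch grouped,
summed, limited and finally unioned). It is not Pig code and says nothing about how the
optimizer plans the job. The names sum_by_name and run_script and the sample rows are
made up for this sketch.

```python
from collections import defaultdict


def sum_by_name(rows, label):
    """Model of GROUP ... BY name followed by
    FOREACH ... GENERATE FLATTEN(group), label, SUM(freq)."""
    totals = defaultdict(int)
    for name, freq in rows:
        totals[name] += freq
    return [(name, label, total) for name, total in totals.items()]


def run_script(data, limit=10):
    """Model of test.pig on an in-memory list of (name, freq) rows."""
    # First-level split: two filters read directly from `data`.
    filter1 = [r for r in data if r[1] < 5]
    filter2 = [r for r in data if r[1] > 5]
    # Deeper levels: each filter reads the output of the previous filter.
    filter3 = [r for r in filter2 if r[1] < 10]
    filter4 = [r for r in filter3 if r[1] > 7]

    branches = [
        (filter1, "string1"),
        (filter2, "string2"),
        (filter3, "string3"),
        (filter4, "string4"),
    ]
    result = []
    for rows, label in branches:
        # LIMIT 10 per branch, then UNION. Pig does not guarantee which
        # rows LIMIT keeps or the order of a UNION; this model simply
        # takes the first `limit` groups and appends them.
        result.extend(sum_by_name(rows, label)[:limit])
    return result


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the issue.
    sample = [("a", 3), ("a", 6), ("b", 8), ("b", 12), ("a", 9)]
    for row in sorted(run_script(sample)):
        print(row)
```

Rows with freq equal to 5 match neither filter1 nor filter2, so they drop out of every
branch, both in the script and in this model.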




> MultiQuery optimization throws error for multi-level splits
> -----------------------------------------------------------
>
>                 Key: PIG-1060
>                 URL: https://issues.apache.org/jira/browse/PIG-1060
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Ankur
>
> Consider the following scenario :-
> 1. Multi-level splits in the map plan.
> 2. Each split branch further progressing across a local-global rearrange.
> 3. Output of each of these finally merged via a UNION.
> MultiQuery optimizer throws the following error in such a case:
> "ERROR 2146: Internal Error. Inconsistency in key index found during optimization."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

