hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits
Date Thu, 05 Nov 2009 02:01:32 GMT

    [ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773744#action_12773744 ]

Viraj Bhat commented on PIG-1060:

Hi Ankur and Richard,
I have a script that demonstrates a similar problem, which can be worked around by running
with the -M option (multiquery optimization turned off). The script reproduces the problem
even without the UNION operator, but it still has properties 1 and 2 of the original problem
description.

If you comment out the F alias, the script works fine.


ORGINALDATA = load '/user/viraj/somedata.txt' using PigStorage() as (col1, col2, col3, col4,
col5, col6, col7, col8);

--Check data

A = foreach ORGINALDATA generate col1, col2, col3, col4, col5, col6;

B = group A all;

C = foreach B generate COUNT(A);

store C into '/user/viraj/result1';

D = filter A by (col1 == col2) or (col1 == col3);

E = group D all;

F = foreach E generate COUNT(D);

--try commenting F
store F into '/user/viraj/result2';

G = filter D by (col4 == col5) ;

H = group G all;

I = foreach H generate COUNT(G);

store I into '/user/viraj/result3';

J = filter G by
    (((col6 == 'm') or (col6 == 'M')) and (col6 == 1))
    or (((col6 == 'f') or (col6 == 'F')) and (col6 == 0))
    or ((col6 == '') and (col6 == -1));

K = group J all;

L = foreach K generate COUNT(J);

store L into '/user/viraj/result4';
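
For comparison, here is a minimal sketch of the variant that is reported above to work. It assumes that "commenting out the F alias" means disabling both the F definition and its store; the comment does not say whether E also has to go, so E is left in place here. Everything else is the script above, unchanged.

```pig
-- Sketch only: the script above with the F alias disabled.
-- Assumption: both the F definition and its store are commented out.
ORGINALDATA = load '/user/viraj/somedata.txt' using PigStorage()
    as (col1, col2, col3, col4, col5, col6, col7, col8);

A = foreach ORGINALDATA generate col1, col2, col3, col4, col5, col6;
B = group A all;
C = foreach B generate COUNT(A);
store C into '/user/viraj/result1';

-- First-level split: D is filtered from A.
D = filter A by (col1 == col2) or (col1 == col3);
E = group D all;
-- F = foreach E generate COUNT(D);
-- store F into '/user/viraj/result2';

-- Second-level split: G is filtered from D.
G = filter D by (col4 == col5);
H = group G all;
I = foreach H generate COUNT(G);
store I into '/user/viraj/result3';

-- Third-level split: J is filtered from G.
J = filter G by
    (((col6 == 'm') or (col6 == 'M')) and (col6 == 1))
    or (((col6 == 'f') or (col6 == 'F')) and (col6 == 0))
    or ((col6 == '') and (col6 == -1));
K = group J all;
L = foreach K generate COUNT(J);
store L into '/user/viraj/result4';
```

The other workaround mentioned above leaves the script untouched and turns multiquery optimization off at launch, for example `pig -M multilevel.pig`, where the file name `multilevel.pig` is hypothetical.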


> MultiQuery optimization throws error for multi-level splits
> -----------------------------------------------------------
>                 Key: PIG-1060
>                 URL: https://issues.apache.org/jira/browse/PIG-1060
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Ankur
>            Assignee: Richard Ding
> Consider the following scenario :-
> 1. Multi-level splits in the map plan.
> 2. Each split branch further progressing across a local-global rearrange.
> 3. Output of each of these finally merged via a UNION.
> MultiQuery optimizer throws the following error in such a case:
> "ERROR 2146: Internal Error. Inconsistency in key index found during optimization."

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
