hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (PIG-430) Projections in nested filter and inside foreach do not work
Date Wed, 17 Sep 2008 20:27:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631924#action_12631924 ]

shravanmn edited comment on PIG-430 at 9/17/08 1:26 PM:
-----------------------------------------------------------------------------

I have fixed the part of the problem that concerns the project issue. The issue mentioned in
distinct still remains. The problem is that projects are being introduced into the input of
distinct, which creates a unique case where projection chaining does not work. It is similar
to the case where you assign a nested project to a variable inside a nested block, which was
solved by replacing the nested project with a foreach statement. The solution to the distinct
problem should be similar: the input to distinct should also be allowed to be a nested
project. I made a local change replacing BaseEvalSpec with NestedProject in my code for this,
and it works. However, I don't want to break anything, because I am not fully aware of the
side effects of changing this in the parser. It's better if someone more comfortable with the
parser takes a look at this one.

Also, I think there are some issues with the parsing of nested blocks. I tried the following,
and the parser never terminates the nested block; it just keeps waiting for more input:

A = load 'file';
B = group A by $0;
C = foreach B { C1=distinct "const"; generate C1;};

I have no idea why this is happening, but I tried it because I thought that the input to a
nested distinct shouldn't be a BaseEvalSpec, which can also be FuncEvalSpecs and Constants.
I think we need to change things a bit here.

> Projections in nested filter and inside foreach do not work
> -----------------------------------------------------------
>
>                 Key: PIG-430
>                 URL: https://issues.apache.org/jira/browse/PIG-430
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>         Attachments: 430-1.patch
>
>
> The following queries do not work:
> Nested filter:
> a = load 'studenttab10k' as (name, age, gpa);
> b = filter a by age < 20;
> c = group b by age;
> d = foreach c { cf = filter b by gpa < 3.0; cp = cf.gpa; cd = distinct cp; co = order cd by $0; generate group, flatten(co); }
> store d into 'output';
> Nested Distinct:
> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b { aa = distinct a.age; generate group, COUNT(aa); }
> store c into 'output';
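
For reference, the result that the Nested Distinct query above is meant to produce can be
sketched outside Pig. The Python below is only an illustration of the intended semantics
(group by name, then count the distinct ages in each group). The sample rows and the function
name count_distinct_ages are made up for this example, and it says nothing about how Pig
builds or evaluates the plan.

```python
from collections import defaultdict


def count_distinct_ages(rows):
    """Model of: b = group a by name;
                 c = foreach b { aa = distinct a.age; generate group, COUNT(aa); }
    """
    # "group a by name": collect the rows of each name together.
    # "distinct a.age": a set keeps each age once per group.
    ages_by_name = defaultdict(set)
    for name, age, _gpa in rows:
        ages_by_name[name].add(age)
    # "generate group, COUNT(aa)": one (name, distinct-age count) pair per group.
    return {name: len(ages) for name, ages in ages_by_name.items()}


# Hypothetical sample data in the (name, age, gpa) layout of studenttab10k.
rows = [
    ("alice", 18, 3.5),
    ("alice", 18, 2.9),
    ("alice", 19, 3.1),
    ("bob", 20, 2.0),
]
result = count_distinct_ages(rows)
print(result)
```

A working nested distinct should give the same per-group counts for the same input, whatever
form the projection takes inside the nested block.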

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

