hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-299) Filter operator not included in the main predecessor plan structure
Date Tue, 15 Jul 2008 23:03:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-299:
------------------------------------

    Attachment: nested_project_as_foreach.patch

The nested_project_as_foreach.patch contains the following:

1. Projection statements such as A = $1.$0; B = A.($1, $2); and C = A.$1; are rewritten as
foreach statements with nested plans that project the columns.
2. Unit test cases for testing the rewrite.
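
The shape of the rewrite in item 1 can be pictured with a small sketch. This is not the code in the patch: `ProjectionRewriter` and `rewrite` are hypothetical names, and the sketch works on statement text rather than on logical plan operators. It only shows how a projection such as C = A.$1 maps onto a foreach over A that generates the projected columns.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: NOT Pig's classes and NOT the code in the patch.
class ProjectionRewriter {

    // Turns "alias = input.(columns)" into the equivalent foreach statement text.
    static String rewrite(String alias, String input, List<String> columns) {
        return alias + " = foreach " + input + " generate "
                + String.join(", ", columns) + ";";
    }

    public static void main(String[] args) {
        // A = $1.$0;
        System.out.println(rewrite("A", "$1", Arrays.asList("$0")));
        // B = A.($1, $2);
        System.out.println(rewrite("B", "A", Arrays.asList("$1", "$2")));
        // C = A.$1;
        System.out.println(rewrite("C", "A", Arrays.asList("$1")));
    }
}
```

Run as written, this prints `A = foreach $1 generate $0;`, `B = foreach A generate $1, $2;` and `C = foreach A generate $1;`.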

Unit test cases that still fail are:

    [junit] Running org.apache.pig.test.TestEvalPipeline
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 142.518 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED

    [junit] Running org.apache.pig.test.TestFilterOpNumeric
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 246.872 sec
    [junit] Test org.apache.pig.test.TestFilterOpNumeric FAILED

    [junit] Running org.apache.pig.test.TestStoreOld
    [junit] Tests run: 3, Failures: 0, Errors: 2, Time elapsed: 21.584 sec
    [junit] Test org.apache.pig.test.TestStoreOld FAILED


> Filter operator not included in the main predecessor plan structure
> -------------------------------------------------------------------
>
>                 Key: PIG-299
>                 URL: https://issues.apache.org/jira/browse/PIG-299
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: N/A
>            Reporter: Tyson Condie
>            Assignee: Santhosh Srinivasan
>            Priority: Blocker
>             Fix For: types_branch
>
>         Attachments: nested_project_as_foreach.patch
>
>
> Take the following query, which can be found in the testQuery80() method of TestLogicalPlanBuilder.java:
> a = load 'input1' as (name, age, gpa);
> b = filter a by age < '20';
> c = group b by (name,age);
> d = foreach c {
>             cf = filter b by gpa < '3.0';
>             cp = cf.gpa;
>             cd = distinct cp;
>             co = order cd by gpa;
>             generate group, flatten(co);
>             };
> The filter statement "cf = filter b by gpa < '3.0';" is not accessible via the LogicalPlan::getPredecessor method. Here is the explain plan printout of the inner foreach plan:
> |---SORT Test-Plan-Builder-17 Schema: {gpa: bytearray} Type: bag
>     |   |
>     |   Project Test-Plan-Builder-16 Projections: [0] Overloaded: false FieldSchema: gpa: bytearray cn: 2 Type: bytearray
>     |   Input: Distinct Test-Plan-Builder-1
>     |
>     |---Distinct Test-Plan-Builder-15 Schema: {gpa: bytearray} Type: bag
>         |
>         |---Project Test-Plan-Builder-14 Projections: [2] Overloaded: false FieldSchema: gpa: bytearray cn: 2 Type: bytearray
>             Input: Project Test-Plan-Builder-13 Projections:  [*]  Overloaded: false
>             |
>             |---Project Test-Plan-Builder-13 Projections:  [*]  Overloaded: false FieldSchema: cf: tuple({name: bytearray,age: bytearray,gpa: bytearray}) Type: tuple
>                 Input: Filter Test-Plan-Builder-12OPERATOR PROJECT SCHEMA {name: bytearray,age: bytearray,gpa: bytearray}
> As you can see, the filter is accessible only via the LOProject::getExpression() method; it does not show up as an input operator. Focus on the projection immediately following the filter. If I remove this projection, I get a correct plan. For example, let the inner foreach plan be as follows:
> d = foreach c {
>             cf = filter b by gpa < '3.0';
>             cd = distinct cf;
>             co = order cd by gpa;
>             generate group, flatten(co);
>             };
> Then I get the following (correct) explain plan output:
> |---SORT Test-Plan-Builder-15 Schema: {name: bytearray,age: bytearray,gpa: bytearray} Type: bag
>     |   |
>     |   Project Test-Plan-Builder-14 Projections: [2] Overloaded: false FieldSchema: gpa: bytearray cn: 2 Type: bytearray
>     |   Input: Distinct Test-Plan-Builder-1
>     |
>     |---Distinct Test-Plan-Builder-13 Schema: {name: bytearray,age: bytearray,gpa: bytearray} Type: bag
>         |
>         |---Filter Test-Plan-Builder-12 Schema: {name: bytearray,age: bytearray,gpa: bytearray} Type: bag
>             |   |
>             |   LesserThan Test-Plan-Builder-11 FieldSchema: null Type: Unknown
>             |   |
>             |   |---Project Test-Plan-Builder-9 Projections: [2] Overloaded: false FieldSchema:  Type: Unknown
>             |   |   Input: CoGroup Test-Plan-Builder-7
>             |   |
>             |   |---Const Test-Plan-Builder-10 FieldSchema: chararray Type: chararray
>             |
>             |---Project Test-Plan-Builder-8 Projections: [1] Overloaded: false FieldSchema: b: bag({name: bytearray,age: bytearray,gpa: bytearray}) Type: bag
>                 Input: CoGroup Test-Plan-Builder-7OPERATOR PROJECT SCHEMA {name: bytearray,age: bytearray,gpa: bytearray}
> Alan said that the problem is that we don't generate a foreach operator for the 'cp = cf.gpa' statement. Please let me know if this can be resolved.
> Thanks,
> Tyson
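
The reachability problem described in the report can be illustrated with a toy graph. The sketch below is written under stated assumptions and is not Pig's implementation: `ToyPlan`, `connect`, `getPredecessors`, `reachableFrom`, `reported` and `rewritten` are hypothetical names, operators are plain strings, and the two projections in the printed plan are collapsed into a single node. It also assumes that rewriting the projection as a foreach makes the filter an ordinary predecessor of the new operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: ToyPlan is a hypothetical stand-in, not Pig's LogicalPlan.
class ToyPlan {
    // Operator name -> names of its direct predecessors (input operators).
    private final Map<String, List<String>> predecessors = new HashMap<>();

    // Records that 'from' feeds 'to'.
    void connect(String from, String to) {
        predecessors.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    List<String> getPredecessors(String op) {
        return predecessors.getOrDefault(op, new ArrayList<>());
    }

    // Every operator reachable from 'leaf' by following predecessor edges.
    Set<String> reachableFrom(String leaf) {
        Set<String> seen = new LinkedHashSet<>();
        collect(leaf, seen);
        return seen;
    }

    private void collect(String op, Set<String> seen) {
        if (!seen.add(op)) {
            return; // already visited
        }
        for (String p : getPredecessors(op)) {
            collect(p, seen);
        }
    }

    // As reported: the projection refers to the filter only through its
    // expression, so the plan has no Filter -> Project edge.
    static ToyPlan reported() {
        ToyPlan plan = new ToyPlan();
        plan.connect("Project(cf.gpa)", "Distinct");
        plan.connect("Distinct", "Sort");
        return plan;
    }

    // Assumed shape after the rewrite: the projection is a foreach operator
    // whose input edge from the filter is part of the plan.
    static ToyPlan rewritten() {
        ToyPlan plan = new ToyPlan();
        plan.connect("Filter", "ForEach(cf.gpa)");
        plan.connect("ForEach(cf.gpa)", "Distinct");
        plan.connect("Distinct", "Sort");
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(reported().reachableFrom("Sort").contains("Filter"));  // false
        System.out.println(rewritten().reachableFrom("Sort").contains("Filter")); // true
    }
}
```

In the first plan a walk over predecessor edges from the sort never reaches the filter, which mirrors the missing input operator in the report; in the second the same walk finds it.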

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

