hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-422) cross is broken
Date Wed, 10 Sep 2008 18:15:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shravan Matthur Narayanamurthy updated PIG-422:
-----------------------------------------------

    Status: Patch Available  (was: Open)

This one broke because of the fix that made POUserFunc adhere to trunk behavior: we removed
the Tuple-inside-a-Tuple check. The initial fix used a constant expression that was a Tuple
and relied on POUserFunc to remove the nesting before sending it to GFCross.

So I split the list of objects inside the constant tuple into two constant expressions.
However, that did not work because of our unordered plan structure: the two constants were
accessed in arbitrary order, and GFCross does not work if we pass (1,2) instead of (2,1).

I think we need to be careful about this one. If a UDF is given constant expressions, as in
UDF('2','1'), we create constant expressions and attach them to the UDF as inputs. However,
I am not sure there is any guarantee that the two constant expressions will be pulled in the
same order, since our plan doesn't support ordering.

I was able to fix this one because, luckily, the POUserFunc operator relies on its own inputs
and not on the ones obtained by calling getPredecessors() on the plan. I think most of the
operators that were created earlier did that, since we did not have a handle to the plan the
operator is a part of. So, after connecting all the operators in the plan, I explicitly
initialized the inputs of POUserFunc to the list of constant expressions, created in the
right order. I think we need to take a look at the code and see whether we can hit such
problems elsewhere.
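
To make the ordering issue concrete, here is a minimal, self-contained sketch. It is not
Pig code: the Op and Plan classes, the connect() method and the buildUdf() helper are all
hypothetical stand-ins, and the HashSet-backed edge storage is only an assumption used to
model a plan with no ordering guarantee. It shows why an operator that reads the plan's
predecessors may see its arguments in any order, while an operator that keeps its own
explicitly assigned input list always sees them as (2,1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrderedInputsSketch {

    /** Hypothetical stand-in for an expression operator (not Pig's real class). */
    static class Op {
        final String name;
        // Inputs kept by the operator itself, in the order they were assigned.
        List<Op> inputs = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }
    }

    /** Hypothetical plan that stores edges with no ordering guarantee. */
    static class Plan {
        private final Map<Op, Set<Op>> predecessors = new HashMap<>();

        void connect(Op from, Op to) {
            predecessors.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        }

        // HashSet iteration order is unspecified, so callers must not rely on it.
        Set<Op> getPredecessors(Op op) {
            return predecessors.getOrDefault(op, Collections.<Op>emptySet());
        }
    }

    /** Collects operator names in iteration order. */
    static List<String> names(Iterable<Op> ops) {
        List<String> out = new ArrayList<>();
        for (Op op : ops) {
            out.add(op.name);
        }
        return out;
    }

    /** Builds UDF('2','1') and pins the argument order on the operator itself. */
    static Op buildUdf(Plan plan) {
        Op two = new Op("2");
        Op one = new Op("1");
        Op udf = new Op("UDF");
        plan.connect(two, udf);
        plan.connect(one, udf);
        // After all connections are made, set the inputs explicitly in the intended order.
        udf.inputs = Arrays.asList(two, one);
        return udf;
    }

    public static void main(String[] args) {
        Plan plan = new Plan();
        Op udf = buildUdf(plan);
        System.out.println("plan order (unspecified): " + names(plan.getPredecessors(udf)));
        System.out.println("operator inputs: " + names(udf.inputs));
    }
}
```

In this sketch the first printed line can differ between runs, while the second is always
[2, 1]. Any operator that takes its arguments from the plan rather than from its own input
list would be exposed to the same reordering.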

> cross is broken
> ---------------
>
>                 Key: PIG-422
>                 URL: https://issues.apache.org/jira/browse/PIG-422
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>         Attachments: 422.patch
>
>
> The following script fails:
> a = load 'data1' as (name, age, gpa);
> b = load 'data2' as (name, age, registration, contributions);
> c = filter a by age < 19 and gpa < 1.0;
> d = filter b by age < 19;
> e = cross c, d;
> store e into 'output';
> produces the following stack:
> 0808261638_3210_r_000000java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.Tuple
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:264)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:220)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:231)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:220)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct.getNext(PODistinct.java:76)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:270)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:351)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:123)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:175)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:241)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:217)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:156)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:206)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:176)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:87)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> /Cross
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:123)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:175)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:241)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:217)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:156)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:206)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:176)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:87)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

