pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-582) POProject assumes that when "overloaded" field is true, the type of input is BAG
Date Mon, 29 Dec 2008 21:48:46 GMT
POProject assumes that when "overloaded" field is true, the type of input is BAG

                 Key: PIG-582
                 URL: https://issues.apache.org/jira/browse/PIG-582
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch

POProject.getNext(Tuple) checks whether its "overloaded" field is true and, if so, casts the input
it obtained to a Bag. LogToPhyTranslator sets this field from the value of the overloaded field
in LOProject, and LOProject sets overloaded to true when it has a non-ExpressionOperator
successor. That does not strictly mean that the input received by POProject is a Bag,
and POProject should not assume so. In most real-world scripts, a Project whose overloaded field
is set to true does have input of type BAG (since the input to the project is typically a relation,
which is a BAG). However, here is a contrived example where things go wrong:
a = load 'distinct.input' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b {
        l = distinct group;
        generate l;
};
explain c;
dump c;

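This script fails on execution with the following exception. Here "group" is the chararray
grouping key, so the project feeding the distinct receives a String rather than a Bag, and the
cast in POProject described above fails: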
2008-12-29 13:45:55,537 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher
- Error message from task (reduce) task_200812151518_4209_r_000000
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:272)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct.getNext(PODistinct.java:77)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:368)

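The failure mode can be shown in isolation with a small self-contained Java sketch. This is not
Pig's code and not a proposed patch: the class OverloadedProjectSketch and its two methods are
hypothetical, java.util.List stands in for DataBag, and wrapping a non-bag input in a
single-element list is only an assumption about what a safe fallback might look like. It contrasts
a cast that trusts the flag with one that also checks the runtime type.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the pattern in this report; not Pig's actual classes.
final class OverloadedProjectSketch {

    // Mirrors the reported behaviour: the flag alone decides, so the cast is unchecked.
    static List<?> projectUnchecked(Object input, boolean overloaded) {
        if (overloaded) {
            // Throws ClassCastException when input is not a List (for example a String).
            return (List<?>) input;
        }
        return Collections.singletonList(input);
    }

    // Defensive variant: cast only when the runtime type really is the bag stand-in.
    static List<?> projectChecked(Object input, boolean overloaded) {
        if (overloaded && input instanceof List) {
            return (List<?>) input;
        }
        // Assumed fallback for this sketch: treat the value as a single-element collection.
        return Collections.singletonList(input);
    }

    public static void main(String[] args) {
        Object groupKey = "alice"; // a chararray group key, as in the script above

        System.out.println(projectChecked(groupKey, true));

        try {
            projectUnchecked(groupKey, true);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

The point of the checked variant is only that the decision depends on the actual type of the
input and not on the overloaded flag by itself; how POProject should really handle a non-bag
input is left to the fix for this issue.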

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.