hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-616) Casts to complex types do not work as expected
Date Mon, 12 Jan 2009 19:42:00 GMT
Casts to complex types do not work as expected
----------------------------------------------

                 Key: PIG-616
                 URL: https://issues.apache.org/jira/browse/PIG-616
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
             Fix For: types_branch


When we specify a (complex) type as a column in Pig, the TypeCastInserter inserts the appropriate
cast for the (complex) type. However, in the implementation of POCast.java, when databyte
arrays are converted to the (complex) types, we invoke the bytesToXXX method. 

For complex types, especially tuples and bags, we do not enforce the typing information specified
by the user in the AS clause or with the explicit cast statement. The implementation solely
relies on bytesToXXX to figure out the right type.

An example of a query that fails is given below. Wrt the query, the data is a single column
that is a bag of integers. The user specifies this bag to be a bag of chararray. This conversion
is allowed in pig but the implementation does not perform the actual cast. Instead the bytesToBag
is called on the stream. The resulting type is a bag of integers and not a bag of chararray.
In the subsequent statement the user (correctly) assumes that the conversion has been performed
but in reality it has not been done. At run time when a chararray based operation is performed
we see a ClassCastException.

The notion of a schema has is absent in the physical operators. The schema/fieldSchema in
the logical layer has to be passed on to the physical layer. The schema can be used to perform
additional operations like casting, etc.

{code}

grunt> cat bag.data
{(1)}

grunt> a = load 'bag.data' as (b:{t:(c:chararray)});
grunt> b = foreach a generate flatten(b);
grunt> c = foreach b generate CONCAT('Hello ', $0);
grunt> dump c;

2009-01-12 10:44:44,417 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2009-01-12 10:45:09,439 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Map reduce job failed
2009-01-12 10:45:09,440 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Job failed!
2009-01-12 10:45:09,443 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher
- Error message from task (map) task_200812151518_9681_m_000000java.lang.ClassCastException:
java.lang.Integer cannot be cast to java.lang.String
        at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:37)
        at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:31)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:259)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:271)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
...

2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable
to open iterator for alias c

2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException:
Unable to open iterator for alias c
        at org.apache.pig.PigServer.openIterator(PigServer.java:426)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
        at org.apache.pig.Main.main(Main.java:302)

Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:420)
        ... 5 more

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message