hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's
Date Fri, 06 Nov 2009 00:27:32 GMT

    [ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774153#action_12774153
] 

Santhosh Srinivasan commented on PIG-1065:
------------------------------------------

Answer to Question 1: Pig 1.0 had that syntax and it was retained for backward compatibility.
Paolo suggested that for uniformity, the 'AS' clause for the load statements should be extended
to all relational operators. Gradually, the column aliasing in the foreach should be removed
from the documentation and eventually removed from the language.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> ------------------------------------------------------------------------
>
>                 Key: PIG-1065
>                 URL: https://issues.apache.org/jira/browse/PIG-1065
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a ORDER BY of
this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> ====================
> Schema for u0 unknown.
> ====================
> (1,2)
> (2,3)
> (1)
> (2)
> ====================
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator
for alias u1
>         at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:397)
> ====================
> Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable,
recieved org.apache.pig.impl.io.NullableText
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> ====================
> When I run the same script in local mode I get a different result, as we know that local
mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> ====================
> Schema for u0 unknown
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that this is not
allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message