hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-880) Order by is broken with complex fields
Date Fri, 10 Jul 2009 23:21:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729882#action_12729882 ]

Pradeep Kamath commented on PIG-880:
------------------------------------

The root cause of this issue is that, when interpreting map data, PigStorage returns each value
in the map with the type it deduces from the data: string values are returned as String and
integer values as Integer. However, the logical layer in Pig assumes that map values are of
type ByteArray, since it cannot assume any more specific type.
If one of the sampled values forming the quantile list is null, it is assumed to have the type
of the reduce key of the final order by job. In this case, since the order by key is
smap#'name', that type is taken to be ByteArray, but the values resulting from the map lookup
are actually of type String. This mismatch causes the ArrayStoreException in the issue
description. If the nulls are filtered out, map.collect() fails instead, because Hadoop
expects a bytearray map output key but receives a Text (string).
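
A minimal, self-contained Java sketch of the failure mode, assuming that convertToArray copies
the sampled keys into an array whose element type comes from the declared reduce key type
rather than from the sampled values themselves (Pig's actual code is not reproduced here).
FakeByteArray is a hypothetical stand-in for Pig's DataByteArray, and toTypedArray and
copyFails are illustrative helpers, not Pig methods.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

public class MixedQuantiles {
    // Hypothetical stand-in for Pig's DataByteArray (not the real class).
    static final class FakeByteArray {
        final byte[] bytes;

        FakeByteArray(byte[] bytes) {
            this.bytes = bytes;
        }
    }

    // Assumed shape of the copy in convertToArray: the array's element type is
    // taken from the declared reduce key type, not from the sampled values.
    static Object[] toTypedArray(List<Object> quantiles, Class<?> declaredKeyType) {
        Object[] target = (Object[]) Array.newInstance(declaredKeyType, 0);
        // target is too short, so ArrayList.toArray allocates a new array with the
        // same runtime type (in OpenJDK via Arrays.copyOf and System.arraycopy,
        // the same frames as in the stack trace) and copies the elements into it.
        return quantiles.toArray(target);
    }

    // Returns true if copying the sampled keys throws ArrayStoreException.
    static boolean copyFails(List<Object> quantiles, Class<?> declaredKeyType) {
        try {
            toTypedArray(quantiles, declaredKeyType);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Sampled values of smap#'name' as PigStorage returns them: Strings.
        List<Object> quantiles = new ArrayList<>();
        quantiles.add("alice");
        quantiles.add("bob");

        // Declared key type is bytearray but the values are Strings: the copy fails.
        System.out.println("declared bytearray: fails = "
                + copyFails(quantiles, FakeByteArray.class));
        // Declared and actual types agree: the copy succeeds.
        System.out.println("declared chararray: fails = "
                + copyFails(quantiles, String.class));
    }
}
```

The point of the sketch is only that a typed array chosen from one type cannot hold elements of
another, which is the ArrayStoreException reported from java.lang.System.arraycopy.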

A proposed fix is to change TextDataParser, which PigStorage uses to read map data, so that it
returns the values in the map as ByteArray.
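
A rough Java sketch contrasting the current and proposed treatment of map values. It does not
use TextDataParser or any Pig API: RawBytes is a hypothetical stand-in for DataByteArray, and
parseInferred is a simplified, hypothetical version of the type deduction described above
(integers become Integer, everything else String).

```java
import java.nio.charset.StandardCharsets;

public class MapValueParsing {
    // Hypothetical stand-in for Pig's DataByteArray (not the real class).
    static final class RawBytes {
        final byte[] bytes;

        RawBytes(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Simplified current behaviour: the value's type is guessed from its text.
    static Object parseInferred(String text) {
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            return text;
        }
    }

    // Proposed behaviour: every map value stays raw bytes, matching the
    // bytearray type the logical layer already assumes for map values.
    static Object parseAsBytes(String text) {
        return new RawBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] values = {"alice", "20"};
        for (String v : values) {
            System.out.println(v
                    + " -> current: " + parseInferred(v).getClass().getSimpleName()
                    + ", proposed: " + parseAsBytes(v).getClass().getSimpleName());
        }
    }
}
```

Under the proposed strategy the runtime type of every map value matches the ByteArray type the
logical layer assumes, so sampled order by keys and the declared reduce key type can no longer
disagree; a narrower type would presumably have to come from an explicit cast in the script.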

Thoughts?



> Order by is broken with complex fields
> --------------------------------------
>
>                 Key: PIG-880
>                 URL: https://issues.apache.org/jira/browse/PIG-880
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Olga Natkovich
>             Fix For: 0.4.0
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
>         at java.lang.System.arraycopy(Native Method)
>         at java.util.Arrays.copyOf(Arrays.java:2763)
>         at java.util.ArrayList.toArray(ArrayList.java:305)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
>         ... 5 more
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
>         at org.apache.pig.PigServer.execute(PigServer.java:762)
>         at org.apache.pig.PigServer.access$100(PigServer.java:91)
>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
>         at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

