pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4410) Fix testRankWithEmptyReduce in tez mode
Date Wed, 04 Feb 2015 21:48:35 GMT

    [ https://issues.apache.org/jira/browse/PIG-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306031#comment-14306031 ]

Rohini Palaniswamy commented on PIG-4410:
-----------------------------------------

+1

> Fix testRankWithEmptyReduce in tez mode
> ---------------------------------------
>
>                 Key: PIG-4410
>                 URL: https://issues.apache.org/jira/browse/PIG-4410
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.15.0
>
>         Attachments: PIG-4410-1.patch
>
>
> testRankWithEmptyReduce, added in PIG-4392, failed in tez mode. The reason is that
> POReservoirSample produces more samples than necessary. In particular, if the input of the
> vertex is empty, it produces a fake tuple that does not contain the original data, only a
> marked field plus a rowNum of 0. That causes the WeightedRangePartitioner to fail:
> {code}
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
> 	at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:115)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPigNullableWritable(WeightedRangePartitioner.java:192)
> {code}
> Another issue I found is in GetMemNumRows: I erroneously added the size of the mark tuple,
> which makes the size estimation inaccurate. I put the fix in the same patch.
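
For illustration only, the failure mode described above can be sketched in plain Java. This is not Pig's code and not the contents of PIG-4410-1.patch: the class ReservoirSketch, the MARKER value, and the methods sample, withMarker, isMarker and dataOnly are all hypothetical names, and tuples are modeled as Object[] rather than Pig Tuple objects. The sketch shows that an empty input yields only the mark tuple, and that a consumer would need to skip that tuple before casting fields or estimating sizes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the behavior described in the issue; not Pig's POReservoirSample.
final class ReservoirSketch {
    /** Assumed marker value identifying the trailing mark tuple. */
    static final String MARKER = "__MARK__";

    /** Classic reservoir sampling (Algorithm R): keeps at most k rows of the input. */
    static List<Object[]> sample(List<Object[]> input, int k, Random rnd) {
        List<Object[]> reservoir = new ArrayList<>();
        int seen = 0;
        for (Object[] row : input) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(row);
            } else {
                int j = rnd.nextInt(seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, row);
                }
            }
        }
        return reservoir;
    }

    /** Appends a mark tuple carrying the row count; for empty input it is the only tuple. */
    static List<Object[]> withMarker(List<Object[]> samples, long rowNum) {
        List<Object[]> out = new ArrayList<>(samples);
        out.add(new Object[] {MARKER, rowNum});
        return out;
    }

    static boolean isMarker(Object[] tuple) {
        return tuple.length > 0 && MARKER.equals(tuple[0]);
    }

    /** Drops the mark tuple so it is neither cast as data nor counted in size estimates. */
    static List<Object[]> dataOnly(List<Object[]> tuples) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : tuples) {
            if (!isMarker(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> empty = new ArrayList<>();
        List<Object[]> out = withMarker(sample(empty, 3, new Random(0)), 0L);
        System.out.println(out.size() + " tuple(s), data tuples: " + dataOnly(out).size());
    }
}
```

In this sketch, treating the first field of the mark tuple as an Integer key would raise a ClassCastException of the same kind as in the stack trace above, because that field holds a String.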



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
