pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4392) RANK BY fails when default_parallel is greater than cardinality of field being ranked by
Date Fri, 30 Jan 2015 22:32:35 GMT

     [ https://issues.apache.org/jira/browse/PIG-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4392:
----------------------------
    Attachment: PIG-4392-1.patch

A counter whose value is 0 does not show up in the counter list, so an empty reduce task
has no counter at all. We should pad with 0 in this case.
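
To illustrate the idea (this is not the code in PIG-4392-1.patch), here is a minimal Python
sketch. It assumes rank offsets are derived from per-reducer record counts, and that the
counter list simply omits any reducer whose count is 0. All function names and the choice
of which reducer is empty are hypothetical.

```python
def reported_counters(records_per_reducer):
    """Mimic a counter list that omits zero-valued counters (assumed behavior)."""
    return {task: n for task, n in enumerate(records_per_reducer) if n != 0}


def padded_counters(reported, num_reducers):
    """Fill in 0 for every reducer that reported no counter."""
    return [reported.get(task, 0) for task in range(num_reducers)]


def rank_offsets(counts):
    """Starting offset of each reducer: the sum of the counts of earlier reducers."""
    offsets, total = [], 0
    for n in counts:
        offsets.append(total)
        total += n
    return offsets


# Three distinct keys spread over four reducers, so one reducer is empty.
# Which reducer ends up empty is an arbitrary choice for this example.
records_per_reducer = [1, 1, 0, 1]

reported = reported_counters(records_per_reducer)  # reducer 2 is missing here
counts = padded_counters(reported, len(records_per_reducer))
offsets = rank_offsets(counts)
```

Without the padding step, a lookup for reducer 2 in `reported` would find nothing; with
it, every reducer has a count and the offsets can be computed for all four.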

> RANK BY fails when default_parallel is greater than cardinality of field being ranked by
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-4392
>                 URL: https://issues.apache.org/jira/browse/PIG-4392
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Anthony Hsu
>             Fix For: 0.15.0
>
>         Attachments: PIG-4392-1.patch
>
>
> To reproduce:
> {code:title=input.txt}
> 1 2 3
> 4 5 6
> 7 8 9
> {code}
> {code:title=rank.pig}
> set default_parallel 4;
> d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
> e = rank d by a;
> dump e;
> {code}
> If {{default_parallel}} is set to {{3}}, the script succeeds. So I'm guessing RANK BY
> has issues if the {{default_parallel}} exceeds the cardinality of the field being ranked by.
> I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied) and Hadoop
> 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
