pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-2237) LIMIT generates wrong number of records if pig determines no of reducers as more than 1
Date Wed, 21 Sep 2011 20:51:09 GMT

     [ https://issues.apache.org/jira/browse/PIG-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2237:
----------------------------

    Attachment: PIG-2237-4.patch

PIG-2237-4.patch addresses Dmitriy's review comment.

> LIMIT generates wrong number of records if pig determines no of reducers as more than 1
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-2237
>                 URL: https://issues.apache.org/jira/browse/PIG-2237
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Anitha Raju
>            Assignee: Daniel Dai
>             Fix For: 0.9.1, 0.10
>
>         Attachments: PIG-2237-1.patch, PIG-2237-2.patch, PIG-2237-3.patch, PIG-2237-4.patch
>
>
> Hi,
> For a script
> ========
> A = load 'test.txt' using PigStorage() as (a:int,b:int);
> B = order A by a ;
> C = limit B 2;
> store C into 'op1' using PigStorage();
> ========
> LIMIT and ORDER BY are done in the same MR job if no explicit PARALLEL clause is given.
> In this case, the number of reducers is determined by Pig, and sometimes it is calculated to be greater than 1.
> Since the limit happens on the reduce side, each reduce task applies the limit separately, generating n*2 records, where n is the number of reduce tasks calculated by Pig.
> If the number of reduce tasks is specified explicitly using the PARALLEL keyword on ORDER BY,
> ==========
> B = order A by a PARALLEL 4;
> ==========
> another MR job is created with 1 reduce task, where the limit is done.
> In short, the issue occurs when the number of reducers calculated by Pig is greater than 1 and a limit is involved in the MR job.
> The issue can be replicated by specifying
> ==========
> -Dpig.exec.reducers.bytes.per.reducer
> ==========
> The issue is seen in versions 0.8 and 0.9. It works correctly in 0.7.
> Regards,
> Anitha
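
The n*2 behaviour described in the report can be illustrated with a small toy model. This is only a sketch in Python, not Pig's actual code: the helper names (range_partition, limit_per_reducer, limit_with_final_reducer), the 20-row input and the even range partitioning are all assumptions made for illustration.

```python
# Toy model of the behaviour described in the report; this is NOT Pig's code.
# All function names and the sample data are hypothetical.

def range_partition(sorted_rows, num_reducers):
    """Split globally sorted rows into contiguous chunks, one per reducer."""
    size = -(-len(sorted_rows) // num_reducers)  # ceiling division
    return [sorted_rows[i * size:(i + 1) * size] for i in range(num_reducers)]

def limit_per_reducer(partitions, limit):
    """Reported behaviour: every reduce task applies LIMIT on its own."""
    return [row for part in partitions for row in part[:limit]]

def limit_with_final_reducer(partitions, limit):
    """Expected behaviour: a final single-reducer step applies LIMIT again."""
    return limit_per_reducer(partitions, limit)[:limit]

rows = sorted(range(1, 21))        # 20 values of column a, already ordered
parts = range_partition(rows, 4)   # pretend 4 reducers were chosen
buggy = limit_per_reducer(parts, 2)
fixed = limit_with_final_reducer(parts, 2)

print(len(buggy), buggy)  # 4 reducers * limit 2 = 8 rows
print(len(fixed), fixed)  # exactly 2 rows
```

With 4 reducers and a limit of 2, the per-reducer variant returns 8 rows, while the variant with a final single-reducer step returns the first 2 rows of the ordered data, which is the result the script is expected to produce.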

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
