hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-364) Limit return incorrect records when we use multiple reducer
Date Thu, 11 Sep 2008 05:32:45 GMT

     [ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-364:
---------------------------

    Attachment: PIG-364.patch

This patch takes approach 1: it adds one additional map-reduce operator with 1 reducer
if the requested parallelism > 1. The behavior of limit is now:

1. If the map plan is closed before the POLimit operator, we put POLimit in the reduce plan
and grant the requested parallelism. If the requested parallelism > 1, we close the reduce
plan and add one additional map-reduce operator with 1 reducer.

2. If the map plan is open before the POLimit operator, we put POLimit in the map plan, close
the map plan, add another POLimit to the reduce plan, and set the parallelism of this
map-reduce operator to 1. Although POLimit creates a map-reduce boundary in this case, we do
not associate a parallel option with the limit keyword. I believe providing a parallel option
with limit would confuse users, because it is relatively hard to explain whether this
parallel option will be granted or not.

3. In the limited sort case, we have a POSort with limit<>-1. If the parallelism for
POSort > 1, we add one additional map-reduce operator with 1 reducer.
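
To illustrate why the extra 1-reducer stage restores the limit semantics, here is a minimal
Python sketch. It is not Pig code: the functions limit and run_limit and the sample partitions
are hypothetical names chosen for illustration, and lists stand in for the inputs of parallel
tasks. Each parallel task applies the limit to its own input, and a single final task applies
it again to the merged partial results.

```python
from itertools import islice


def limit(records, k):
    """Pass through at most k records (a stand-in for what a limit operator does)."""
    return list(islice(records, k))


def run_limit(partitions, k):
    """Apply the limit in every parallel task, then once more in one final task."""
    # Stage 1: each of the n parallel tasks keeps at most k of its own records.
    partial = [limit(part, k) for part in partitions]
    # Stage 2: a single task sees all partial outputs (at most n*k records)
    # and applies the limit again, so the final output has at most k records.
    merged = [record for part in partial for record in part]
    return limit(merged, k)


# Hypothetical input: three partitions, as if the job ran with parallelism 3.
partitions = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12]]
result = run_limit(partitions, 3)
print(result)  # [1, 2, 3]
```

The first stage keeps the amount of data reaching the single final task bounded by n*k
records, which is why the extra stage should be cheap relative to the rest of the job.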


> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
>                 Key: PIG-364
>                 URL: https://issues.apache.org/jira/browse/PIG-364
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: types_branch
>
>         Attachments: PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with n reducers,
> we will get up to n*k output records.
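
The n*k behavior described in the issue can be reproduced with a small Python simulation.
This is only a sketch under assumptions: reduce_with_limit and the sample data are
hypothetical, and each inner list stands in for the input of one reducer.

```python
def reduce_with_limit(partitions, k):
    """Simulate n independent reducers that each truncate their input to k records."""
    # No reducer knows what the others emit, so the outputs are simply concatenated.
    return [record for part in partitions for record in part[:k]]


k = 2
# Hypothetical input for n = 3 reducers, each holding more than k records.
partitions = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
output = reduce_with_limit(partitions, k)
print(len(output))  # 6, which is n*k rather than the requested k
```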

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

