hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-364) Limit return incorrect records when we use multiple reducer
Date Fri, 08 Aug 2008 00:58:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620799#action_12620799 ]

Daniel Dai commented on PIG-364:

There seems to be no perfect solution. Here are three possible treatments:
1. If there is a limit in the reducer and the number of reducers > 1, add another map-reduce job
after that with only 1 reducer
    Cons: extra overhead
2. Instead of another map-reduce job, manipulate the output files directly, keeping only the top k records in the output
    Cons: not orthodox, extra overhead (but not as much as 1)
3. If there is a limit in the reducer, change the parallel degree of the reducer to 1
    Cons: cannot take advantage of parallel processing in the reducer
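
To illustrate the problem and treatment 1, here is a minimal sketch in plain Python. It is not Pig code: the function names (limit, run_reducers, final_limit) and the sample data are hypothetical, and each "reducer" is simulated as a list of records. It only shows that applying Limit(k) in each of n reducers can yield up to n*k records, and that one more pass through a single reducer brings the result back down to k.

```python
from itertools import islice


def limit(records, k):
    """Return at most k records from an iterable."""
    return list(islice(records, k))


def run_reducers(partitions, k):
    """Apply Limit(k) independently in each simulated reducer."""
    return [limit(partition, k) for partition in partitions]


def final_limit(reducer_outputs, k):
    """Treatment 1 (sketch): a follow-up pass with a single reducer
    applies Limit(k) once more over the combined reducer outputs."""
    merged = (record for output in reducer_outputs for record in output)
    return limit(merged, k)


# Hypothetical data: n = 3 reducers, 4 records each, k = 2.
partitions = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
k = 2

per_reducer = run_reducers(partitions, k)
total_without_fix = sum(len(output) for output in per_reducer)
result = final_limit(per_reducer, k)

print(total_without_fix)  # number of records without the extra pass (n*k here)
print(len(result))        # number of records after the single-reducer pass
```

The extra pass only has to read at most n*k records, which is why the overhead of treatment 1 is bounded by the limit and the reducer count rather than by the full input size.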

> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>                 Key: PIG-364
>                 URL: https://issues.apache.org/jira/browse/PIG-364
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: types_branch
> Currently we put the Limit(k) operator in the reducer plan. However, with n reducers,
> we will get up to n*k output records.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
