hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism
Date Fri, 24 Sep 2010 21:44:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914663#action_12914663 ]

Thejas M Nair commented on PIG-1642:
------------------------------------

Comments on the patch:
- SampleOptimizer.java expects the sampling MR plan to contain only one integer argument,
which holds the number of reducers that will be used in the job that follows the sampling
job (order-by/skewed-join). We might not remember this assumption if we change the
sampling plan, so it would be safer to throw an error if more than one integer constant is
seen in the plan.
- In the test case, the expected number of reducers is computed dynamically and used for
checking in the first scenario; it can be used in the last scenario as well.
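
A minimal sketch of the check suggested in the first comment, to make the idea concrete. It is
not code from the patch and does not use Pig's classes: the plan's constants are modelled as a
plain list of objects, and the names SamplePlanCheck and findParallelismConstant are
hypothetical. The point is only that a second integer constant is reported as an error instead
of being silently accepted.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for the lookup in the sampling plan; not Pig's actual API.
final class SamplePlanCheck {

    /**
     * Returns the position of the single Integer constant in the list.
     * Throws if there is none, or if there is more than one, because in that
     * case we can no longer tell which constant carries the reducer count.
     */
    static int findParallelismConstant(List<Object> constants) {
        int found = -1;
        for (int i = 0; i < constants.size(); i++) {
            if (constants.get(i) instanceof Integer) {
                if (found != -1) {
                    // Assumption violated: fail loudly rather than pick one.
                    throw new IllegalStateException(
                        "Expected exactly one integer constant in the sampling plan, "
                        + "found another at position " + i);
                }
                found = i;
            }
        }
        if (found == -1) {
            throw new IllegalStateException(
                "No integer constant found in the sampling plan");
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical constants of a sampling plan: one integer among strings.
        List<Object> constants = Arrays.<Object>asList("quantiles-file", 4, "ascending");
        int pos = findParallelismConstant(constants);
        System.out.println("Reducer-count constant is at position " + pos);
    }
}
```

With such a check in place, a later change that adds another integer constant to the sampling
plan fails immediately instead of overwriting the wrong value.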


> Order by doesn't use estimation to determine the parallelism
> ------------------------------------------------------------
>
>                 Key: PIG-1642
>                 URL: https://issues.apache.org/jira/browse/PIG-1642
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers if it isn't
> specified (via PARALLEL or default_parallel). For order by statement, however, it still defaults
> to 1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

