hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-580) PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach
Date Mon, 05 Jan 2009 22:17:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660953#action_12660953 ]

Pradeep Kamath commented on PIG-580:
------------------------------------

A different AlgebraicChecker instance is used for each ForEach inner plan, so the above check
guards against more than one distinct aggregate in the same inner plan. In the script above,
the two distinct aggregates would be in two different inner plans of the ForEach: the
AlgebraicChecker instance dealing with COUNT(Ab) would mark it as "combineable", as would the
(different) AlgebraicChecker instance working with COUNT(Bb). So the script would use the
combiner.

> PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach
> following a group provided there are no non-algebraics in the foreach
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-580
>                 URL: https://issues.apache.org/jira/browse/PIG-580
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-580-v2.patch, PIG-580.patch
>
>
> Currently Pig uses the combiner only when a foreach follows a group and every element in
> the foreach's generate is one of the following:
> 1) simple project of the "group" column
> 2) Algebraic UDF
> The above conditions exclude use of the combiner for distinct aggregates, even though the
> distinct operation itself is combinable (irrespective of whether it feeds an algebraic or
> non-algebraic UDF). So the following foreach should also be combinable:
> {code}
> ..
> b = group a by $0;
> c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); };
> {code}
> The combiner optimizer should cause the distinct to be combined and the final combine
> output should feed the COUNT() and SUM() in the reduce.
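
To illustrate why the distinct itself can be combined, here is a minimal Python sketch (not
Pig's implementation). The function names `combine` and `reduce_distinct` and the sample rows
are hypothetical, chosen only for this example. Each simulated map task removes duplicate
tuples per group before the shuffle; the reduce side removes duplicates again across tasks
and only then applies COUNT and SUM, so the result matches the run without a combiner.

```python
from collections import defaultdict


def combine(records):
    """Hypothetical map-side combiner: keep each tuple once per group key."""
    seen = defaultdict(set)
    for key, row in records:
        seen[key].add(row)
    return [(key, row) for key, rows in seen.items() for row in rows]


def reduce_distinct(map_outputs):
    """Reduce side: merge all map outputs, dedupe again, then aggregate.

    Returns {group: (COUNT(x), SUM(x.$1))} computed over distinct tuples.
    """
    merged = defaultdict(set)
    for output in map_outputs:
        for key, row in output:
            merged[key].add(row)
    return {
        key: (len(rows), sum(row[1] for row in rows))
        for key, rows in merged.items()
    }


# Two hypothetical map tasks; each record is (group key, full tuple), $0 is the key.
map1 = [("k1", ("k1", 5)), ("k1", ("k1", 5)), ("k2", ("k2", 1))]
map2 = [("k1", ("k1", 5)), ("k1", ("k1", 7)), ("k2", ("k2", 1))]

with_combiner = reduce_distinct([combine(map1), combine(map2)])
without_combiner = reduce_distinct([map1, map2])
```

In this sketch the combiner only shrinks what is shuffled (three records become two for the
first task); COUNT and SUM still run once, in the reduce, on the fully deduplicated bag.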

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

