hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-580) Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach
Date Mon, 29 Dec 2008 19:00:44 GMT
Combiner should also be used when there are distinct aggregates in a foreach following a group
provided there are no non-algebraics in the foreach 
---------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-580
                 URL: https://issues.apache.org/jira/browse/PIG-580
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch


Currently Pig uses the combiner only when a foreach follows a group and every element
in the foreach's generate is one of the following:
1) a simple projection of the "group" column
2) an algebraic UDF

These conditions exclude use of the combiner for distinct aggregates, even though the distinct
operation itself is combinable (irrespective of whether it feeds an algebraic or non-algebraic UDF).
So the following foreach should also be combinable:
{code}
..
b = group a by $0;
c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); }
{code}

The combiner optimizer should cause the distinct to be applied in the combiner, and the final
combined output should feed COUNT() and SUM() in the reduce phase.
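
To illustrate why the distinct is combinable, here is a minimal sketch in plain Python (not Pig
code and not the proposed implementation). The function names combine_distinct and
reduce_distinct and the sample records are hypothetical. It assumes each input split is
deduplicated per group key on the map side, and that the reduce side merges the partial
distinct sets before applying COUNT and SUM:

```python
from collections import defaultdict

def combine_distinct(records):
    """Hypothetical combiner step: per group key ($0), keep only distinct tuples."""
    groups = defaultdict(set)
    for rec in records:
        groups[rec[0]].add(rec)
    return groups

def reduce_distinct(partials):
    """Hypothetical reduce step: merge partial distinct sets, then aggregate.

    Returns {group: (COUNT(x), SUM(x.$1))} over the distinct tuples x.
    """
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # set union is associative and commutative
    return {key: (len(tuples), sum(t[1] for t in tuples))
            for key, tuples in merged.items()}

# Two made-up input splits containing duplicate tuples within and across splits.
split_1 = [("k1", 10), ("k1", 10), ("k2", 5)]
split_2 = [("k1", 10), ("k1", 20), ("k2", 5), ("k2", 7)]

# With a combiner: distinct is applied per split first, then merged in the reduce.
combined = reduce_distinct([combine_distinct(split_1), combine_distinct(split_2)])

# Without a combiner: distinct is applied once over all records.
single_pass = reduce_distinct([combine_distinct(split_1 + split_2)])
```

Both paths give the same result because taking a union of per-split distinct sets yields the same
set as a single distinct over all records, so only the deduplicated tuples need to be shuffled
to the reducer.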

