hadoop-pig-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-479) PERFORMANCE: more extensive use of the combier
Date Thu, 13 May 2010 02:01:47 GMT

    [ https://issues.apache.org/jira/browse/PIG-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866941#action_12866941 ]

Scott Carey commented on PIG-479:
---------------------------------

Not just useful: this should probably be added to PigMix as a test, since it happens all the
time for me.  From my perspective, PigMix needs an overhaul with more 'real world' queries
that are ripe for optimization.  Many of the optimizations in the system that made PigMix
fast seem to fail to apply in many real-world situations.  The combiner is a good example.

I have many queries (0.5, 0.6) where a combiner could be used but is not.


For example, I'm not quite sure why this one doesn't use a combiner. It reads ~350x as many
input bytes from HDFS as it writes as reduce output, so a combiner would be very effective:

J = COGROUP
    UV BY (s, d, h, g, p, pa, st) OUTER,
    UC BY (s, d, h, g, p, pa, st) OUTER,
    AT BY (s, d, h, g, p, pa, st) OUTER,
    V BY (s, d, h, g, p, pa, st) OUTER,
    C BY (s, d, h, g, p, pa, st) OUTER;

OUTPUT = FOREACH J GENERATE 
    FLATTEN(group) as (s, d, h, g, p, pa, st), 
    COUNT_STAR(C) as c,
    COUNT_STAR(V) as v, 
    SUM(AT.p1) as p1,
    SUM(AT.p2) as p2,
    SUM(AT.p3) as p3,
    SUM(UC.q) as ucq,
    SUM(UC.r) as ucr,
    SUM(UV.q) as uvq,
    SUM(UV.r) as uvr;
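
To make the expected benefit concrete, here is a minimal Python sketch of map-side partial aggregation for counts and sums. It is only an illustration of the idea, not how Pig implements its combiner; the toy records, keys, and function names are all hypothetical, and only the relation names echo the script above.

```python
from collections import defaultdict

# Hypothetical toy data: one list per map task, records are
# (relation, group_key, value). Keys and values are made up.
MAP_TASK_OUTPUTS = [
    [("C", "k1", None), ("C", "k1", None), ("C", "k1", None),
     ("AT", "k1", 3), ("AT", "k1", 5)],
    [("C", "k1", None), ("C", "k1", None),
     ("AT", "k1", 4), ("AT", "k1", 1), ("C", "k2", None)],
]

def combine(records):
    """Map-side partial aggregation: one (count, sum) pair per (relation, key)."""
    partial = defaultdict(lambda: [0, 0])
    for relation, key, value in records:
        slot = partial[(relation, key)]
        slot[0] += 1            # partial count of records
        slot[1] += value or 0   # partial sum; None contributes nothing
    return {k: tuple(v) for k, v in partial.items()}

def reduce_partials(partials):
    """Reduce side: merge the partial (count, sum) pairs from every map task."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for k, (count, sum_) in partial.items():
            total[k][0] += count
            total[k][1] += sum_
    return {k: tuple(v) for k, v in total.items()}

def reduce_raw(task_outputs):
    """Baseline without a combiner: every raw record reaches the reducer."""
    return combine([rec for task in task_outputs for rec in task])

partials = [combine(task) for task in MAP_TASK_OUTPUTS]
with_combiner = reduce_partials(partials)
without_combiner = reduce_raw(MAP_TASK_OUTPUTS)

# Records crossing the shuffle with and without map-side combining.
shuffled_raw = sum(len(task) for task in MAP_TASK_OUTPUTS)
shuffled_combined = sum(len(p) for p in partials)

print(with_combiner == without_combiner, shuffled_raw, shuffled_combined)
```

Both paths give the same totals, but the combined path ships one record per (relation, key) per map task instead of one per input row. With a ~350x input-to-output ratio, that difference is where the savings would come from.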

> PERFORMANCE: more extensive use of the combier
> ----------------------------------------------
>
>                 Key: PIG-479
>                 URL: https://issues.apache.org/jira/browse/PIG-479
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>
>  On types branch, the combiner is used anytime a foreach includes only simple projections
> and/or algebraic functions.  It would also be useful to invoke the combiner in cases where
> algebraic and non-algebraic operations are mixed, or where expression evaluation is included
> in the foreach.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

