pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4536) LIMIT inside nested foreach should have combiner optimization
Date Wed, 06 May 2015 21:19:59 GMT

     [ https://issues.apache.org/jira/browse/PIG-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4536:
------------------------------------
    Labels: Performance  (was: )

Using the combiner for LIMIT in a nested foreach should also cover the case of ORDER BY
followed by LIMIT.

group_result = FOREACH data_group {
B = ORDER A BY f3 ASC;
C = LIMIT B 1;
GENERATE group, C.f3;
};

The combiner should sort in this case before applying the limit. That can build on
PIG-4449, which will support pushing a limit into a sorted bag.
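
To illustrate why ORDER BY followed by LIMIT can run in a combiner, here is a minimal Python sketch. It is not Pig code and does not reflect how Pig implements this. The function names combine_top_k and reduce_top_k are hypothetical, and it assumes each map task holds the records of one group as a list of tuples with f3 at index 0. Each combiner keeps only the k smallest records, and the reducer applies the same step to the merged partial results.

```python
import heapq
from itertools import chain

def combine_top_k(records, k, key_index):
    """Combiner step: keep only the k records with the smallest sort key."""
    return heapq.nsmallest(k, records, key=lambda r: r[key_index])

def reduce_top_k(partials, k, key_index):
    """Reducer step: merge the per-combiner results and apply the same top-k."""
    return combine_top_k(chain.from_iterable(partials), k, key_index)

# Two hypothetical map outputs for one group; f3 is the first field.
map_outputs = [[(7, "a"), (3, "b"), (9, "c")], [(5, "d"), (1, "e")]]
partials = [combine_top_k(m, 1, 0) for m in map_outputs]
result = reduce_top_k(partials, 1, 0)
```

Because the k smallest records overall are always among the k smallest of each partition, the reducer sees at most k records per combiner instead of the whole bag.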

> LIMIT inside nested foreach should have combiner optimization
> -------------------------------------------------------------
>
>                 Key: PIG-4536
>                 URL: https://issues.apache.org/jira/browse/PIG-4536
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>              Labels: Performance
>
> data_group = GROUP A BY (f1, f2) PARALLEL 100;
> group_result = FOREACH data_group {
> B = LIMIT A.f3 1;
> GENERATE group,  
> SUM(A.f3),
> SUM(A.f4),
> SUM(A.f5),
> SUM(A.f6),
> FLATTEN(B);
> };
> A script like this has combiner optimization turned off, so it consumes a lot of memory
> and is slow. We should implement LIMIT using the combiner in cases like this.
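
As a rough illustration of the request, the following Python sketch (not Pig code; combine and reduce_partials are hypothetical names, and only f3 is modeled) shows how a partial SUM and an unordered LIMIT could be carried through a combiner together. It assumes an unordered LIMIT may return any rows of the bag, so which rows survive is arbitrary.

```python
def combine(records, limit):
    """Combiner step: partial sum of f3 plus at most `limit` f3 values."""
    f3_values = [r[0] for r in records]
    return sum(f3_values), f3_values[:limit]

def reduce_partials(partials, limit):
    """Reducer step: add the partial sums and re-apply the limit."""
    total = sum(p[0] for p in partials)
    limited = [v for p in partials for v in p[1]][:limit]
    return total, limited

# Two hypothetical map outputs for one group; f3 is the first field.
map_outputs = [[(2,), (4,)], [(6,)]]
partials = [combine(m, 1) for m in map_outputs]
total, limited = reduce_partials(partials, 1)
```

Each combiner forwards one number and at most `limit` values, so the reducer no longer has to hold the full bag in memory.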



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
