pig-dev mailing list archives

From Chris Olston <ols...@yahoo-inc.com>
Subject Re: [jira] Updated: (PIG-7) Optimize execution of algebraic functions
Date Thu, 29 Nov 2007 21:32:41 GMT
Awesome!!

On Nov 29, 2007, at 12:25 PM, Alan Gates (JIRA) wrote:

>
>      [ https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Alan Gates updated PIG-7:
> -------------------------
>
>     Patch Info: [Patch Available]
>
> Attaching a patch that implements use of the combiner for algebraic
> functions in limited situations. The optimization is applied only when
> all functions evaluated in a given generate line are algebraic and
> exactly one relation is being grouped (i.e., it is not applied in
> cogroup situations).
>
> Initial, very simple performance tests show a speedup of ~40%
> (13 min -> 7.5 min for 4G on 10 machines) with the following script:
> a = load '/user/pig/tests/data/perf/studenttab200M';
> b = group a by $0;
> c = foreach b generate group, COUNT($1), SUM($1.$2), AVG($1.$2), MIN($1.$1), MAX($1.$2);
> store c into 'bla';
>
>> Optimize execution of algebraic functions
>> -----------------------------------------
>>
>>                 Key: PIG-7
>>                 URL: https://issues.apache.org/jira/browse/PIG-7
>>             Project: Pig
>>          Issue Type: Improvement
>>          Components: impl
>>            Reporter: Olga Natkovich
>>            Assignee: Alan Gates
>>         Attachments: combiner.patch
>>
>>
>> Algebraic functions are those that can be computed incrementally,
>> such as COUNT(X), SUM(X), etc. They can be computed efficiently by
>> doing the first-level computation in the Hadoop combiner. This can
>> give a significant (2-3x) speedup for many aggregation queries.
>> Several users have asked us for this feature, so it is a fairly high
>> priority.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research
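
For readers unfamiliar with the idea, here is a rough sketch of why algebraic functions fit a combiner. It is plain Python, not Pig or Hadoop code, and every name in it (initial, combine, final, run_map_task, run_reduce) is hypothetical rather than taken from the patch. Each map task folds its records into one partial result per group key, and the reduce side merges those partials. AVG is carried as a (count, sum) pair, since averages themselves cannot be merged directly.

```python
# Illustrative only: simulates map-side combining for COUNT/SUM/AVG/MIN/MAX.
# All function names are hypothetical; this is not Pig's or Hadoop's API.

def initial(value):
    """Turn one raw value into a partial: (count, sum, min, max)."""
    return (1, value, value, value)

def combine(p, q):
    """Merge two partials. Associative and commutative, so it can run
    on the map side (combiner) and again on the reduce side."""
    return (p[0] + q[0], p[1] + q[1], min(p[2], q[2]), max(p[3], q[3]))

def final(p):
    """Produce the user-visible aggregates from a fully merged partial."""
    count, total, lo, hi = p
    return {"count": count, "sum": total, "avg": total / count,
            "min": lo, "max": hi}

def run_map_task(records):
    """One map task with a combiner: emit a single partial per key."""
    partials = {}
    for key, value in records:
        part = initial(value)
        partials[key] = combine(partials[key], part) if key in partials else part
    return partials

def run_reduce(map_outputs):
    """Merge the partials from all map tasks, then finalize each group."""
    merged = {}
    for partials in map_outputs:
        for key, part in partials.items():
            merged[key] = combine(merged[key], part) if key in merged else part
    return {key: final(part) for key, part in merged.items()}

# Two input splits, as if handled by two separate map tasks.
splits = [
    [("alice", 3.0), ("bob", 2.0), ("alice", 4.0)],
    [("alice", 2.0), ("bob", 4.0)],
]
map_outputs = [run_map_task(s) for s in splits]
result = run_reduce(map_outputs)
```

In this toy input, five raw records become four partials before the reduce step. The saving grows with the number of records per key in each split, which is where a speedup from the combiner would come from.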


