hadoop-pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: algebraic optimization not invoked for filter following group?
Date Tue, 15 Jun 2010 15:10:23 GMT
For at least simple cases, what's in the pseudo-code should work.  I
hope that someday soon we can start using the new logical optimizer
work (in the experimental package) to build rules for the MR optimizer
(like this combiner stuff) as well, which should be much easier to
code.  But it will be a while before we get there.

I don't think this will automatically make it work for split, because  
I think it will see the split in the plan and that will make it choose  
not to optimize.
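
For illustration only, the split case being asked about might look
something like the sketch below.  The relation names small and large
are made up here, and this is an assumption about what such a script
would be, not something taken from the original question:

    data = load 'foo' using PigStorage() as (a, b, c);
    grouped = group data by a;
    -- hypothetical split on the same algebraic expression as the filter
    split grouped into small if COUNT(data) < 1000,
                       large if COUNT(data) >= 1000;

In a plan like this the split sits directly after the group, which is
the situation I would expect the optimizer to back away from.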

Alan.

On Jun 2, 2010, at 4:18 PM, Dmitriy Ryaboy wrote:

> It looks like right now, the combiner optimization does not kick in
> for a script like this:
>
> data = load 'foo' using PigStorage() as (a, b, c);
> grouped = group data by a;
> filtered = filter grouped by COUNT(data) < 1000;
>
> Looking at the code in CombinerOptimizer, seems like the Filter bit
> is just pseudo-coded in comments. Are there complications there other
> than what is already noted, or is it just the matter of coding up the
> pseudo-code?
>
> On that note -- assuming the optimization was implemented for Filter
> following group, would it automagically start working for Splits, as
> well?
>
> -D

