hadoop-pig-dev mailing list archives

From "Olga Natkovich" <ol...@yahoo-inc.com>
Subject RE: The plan generated for this nested plan is not as per we had discussed
Date Mon, 30 Jun 2008 18:32:26 GMT
Does this mean that distinct and filter will be recomputed several
times?

Olga 

> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com] 
> Sent: Monday, June 30, 2008 11:21 AM
> To: Shravan Narayanamurthy
> Cc: Santhosh Srinivasan; pig-dev@incubator.apache.org
> Subject: Re: The plan generated for this nested plan is not 
> as per we had discussed
> 
> Analysis below.
> 
> Shravan M Narayanamurthy wrote:
> > Hi Guys,
> > I think we need to find a proper set of rules for the project's 
> > schema. The following script kind of covers all the scenarios:
> > A = load 'a';
> > B = group A by $0;
> > C = foreach B {
> >     C1 = filter A by $0>5;
> >     C2 = distinct C1;
> >     C3 = distinct A;
> >     generate group, udf1(*), udf2(C2), udf3(C2.$1), udf4(C3), udf5(C3.$1);
> > }
> >
> > I think we had not thought about the projection in the inner plan of
> > filter. With this constraint, we need a new set of rules. Can you post
> > an algorithm that will work to set the return types of the projects?
> >
> > Thanks & Regards,
> > --Shravan
> >
> > <snip>
> In this case, the foreach should have the following plans:
> 
> 0 - proj(0)
> 
> 1 - proj( * ) -> udf1
> 
> 2 - proj(1) -> filter -> distinct -> proj( * ) -> udf2
> 
> 3 - proj(1) -> filter -> distinct -> proj(1) -> udf3
> 
> 4 - proj(1) -> distinct -> proj( * ) -> udf4
> 
> 5 - proj(1) -> distinct -> proj(1) -> udf5
> 
> In plans 2 and 3, filter will have an inner plan of:
> 
> proj(0) -> gt, const(5) -> gt
> 
> In discussing the scenario, Santhosh and I saw one issue, 
> which is that in plan 1 the proj( * ) will incorrectly try 
> to accumulate a bag for udf1 when it should just pass 
> the tuple through. Santhosh is going to fix that by changing the 
> project to determine whether it has a predecessor, and if so 
> whether that predecessor is a relational operator, instead of 
> looking at its input to see if it's a relational operator.
> 
> I didn't follow your comment on the issue with the project in 
> the filter plan.  It looked fine to me.
> 
> Alan.
> 
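The six inner plans in Alan's reply can be modelled as independent pipelines over one (group, A) input tuple. The Python sketch below assumes each plan is evaluated on its own and that the udfs are identity stand-ins; `filter_gt5`, `distinct`, `column`, `run_plans` and the `CALLS` counter are hypothetical helpers, not Pig's API. Under that assumption filter runs once in each of plans 2 and 3, and distinct once in each of plans 2 to 5, which is the repeated work Olga asks about; whether Pig actually shares that work is not settled in this thread.

```python
from typing import Dict, List

Bag = List[tuple]

# Hypothetical instrumentation: how many times each relational operator runs.
CALLS: Dict[str, int] = {"filter": 0, "distinct": 0}


def filter_gt5(bag: Bag) -> Bag:
    """filter ... by $0 > 5 over the bag that proj(1) extracted."""
    CALLS["filter"] += 1
    return [t for t in bag if t[0] > 5]


def distinct(bag: Bag) -> Bag:
    """Drop duplicate tuples, keeping first-seen order."""
    CALLS["distinct"] += 1
    return list(dict.fromkeys(bag))


def column(bag: Bag, i: int) -> Bag:
    """proj(i) after a relational operator: a bag of 1-tuples holding field i."""
    return [(t[i],) for t in bag]


def run_plans(grp: tuple) -> list:
    """Evaluate plans 0-5 on one (group, A) tuple; udfs are identity stand-ins."""
    a = grp[1]  # proj(1): the bag A
    return [
        grp[0],                               # 0: proj(0)
        grp,                                  # 1: proj(*) passes the tuple to udf1
        distinct(filter_gt5(a)),              # 2: filter -> distinct -> proj(*)
        column(distinct(filter_gt5(a)), 1),   # 3: filter -> distinct -> proj(1)
        distinct(a),                          # 4: distinct -> proj(*)
        column(distinct(a), 1),               # 5: distinct -> proj(1)
    ]
```

For example, `run_plans((7, [(7, "a"), (7, "b"), (7, "a")]))` yields `[(7, "a"), (7, "b")]` for plan 2 and `[("a",), ("b",)]` for plan 3, and leaves `CALLS` at 2 filter calls and 4 distinct calls for that single group.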
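The filter inner plan `proj(0) -> gt, const(5) -> gt` reads as a small expression tree whose root `gt` has two inputs. The sketch below uses hypothetical class names (`Proj`, `Const`, `Gt`, `evaluate`, `run_filter`) and assumes the inner plan is evaluated once per tuple of the bag that proj(1) hands to filter, so proj(0) inside the filter returns a single field of a tuple from A rather than a bag.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Proj:
    """Expression-plan project: reads column `col` of the current tuple."""
    col: int


@dataclass
class Const:
    """Constant leaf, e.g. const(5)."""
    value: Any


@dataclass
class Gt:
    """Comparison node with two inputs: lhs > rhs."""
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Proj, Const, Gt]


def evaluate(node: Expr, tup: tuple) -> Any:
    """Recursively evaluate an expression plan against one tuple."""
    if isinstance(node, Proj):
        return tup[node.col]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Gt):
        return evaluate(node.lhs, tup) > evaluate(node.rhs, tup)
    raise TypeError(f"unknown node {node!r}")


# Inner plan of `filter A by $0 > 5`: proj(0) and const(5) both feed gt.
FILTER_PLAN = Gt(Proj(0), Const(5))


def run_filter(bag: list) -> list:
    """Keep the tuples for which the inner plan evaluates to True."""
    return [t for t in bag if evaluate(FILTER_PLAN, t)]
```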
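One reading of the fix Santhosh is making is that a project decides whether to accumulate a bag by looking at its predecessor, not at its input. In the hypothetical sketch below (`Op`, `Project` and `accumulates_bag` are illustrative names, not Pig classes), a project with no predecessor, such as proj( * ) in plan 1 or the leading proj(1) in plans 2 to 5, passes the tuple or field straight through, while a project fed by a relational operator such as distinct collects that operator's output tuples into a bag.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Op:
    """Hypothetical plan node; `relational` marks operators such as filter or distinct."""
    name: str
    relational: bool = False


@dataclass
class Project:
    """Hypothetical project; columns=None stands for proj( * )."""
    columns: Optional[List[int]]
    predecessor: Optional[Op] = None

    def accumulates_bag(self) -> bool:
        # Decide from the predecessor, not from the input value.
        return self.predecessor is not None and self.predecessor.relational

    def apply(self, inputs: Any) -> Any:
        if self.accumulates_bag():
            # `inputs` is the stream of tuples emitted by the relational operator.
            if self.columns is None:
                return list(inputs)
            return [tuple(t[c] for c in self.columns) for t in inputs]
        # No relational predecessor: `inputs` is a single tuple.
        if self.columns is None:
            return inputs  # proj( * ) passes the tuple through
        if len(self.columns) == 1:
            return inputs[self.columns[0]]
        return tuple(inputs[c] for c in self.columns)
```

With this rule, plan 1's proj( * ) returns the whole (group, A) tuple for udf1 instead of wrapping it in a bag.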
