RE: The plan generated for this nested plan is not as per we had discussed
Mon, 30 Jun 2008 18:32:26 GMT
```
Does this mean that distinct and filter will be recomputed several
times?

Olga

> Analysis below.
>
> Shravan M Narayanamurthy wrote:
> > Hi Guys,
> > I think we need to find a proper set of rules for the project's
> > schema. The following script kind of covers all the scenarios:
> > A = load 'a';
> > B = group A by $0;
> > C = foreach B {
> > C1 = filter A by $0>5;
> > C2 = distinct C1;
> > C3 = distinct A;
> > generate group, udf1(*), udf2(C2), udf3(C2.$1), udf4(C3), udf5(C3.$1);
> > }
> >
> > I think we had not thought about the projection in the inner plan of
> > filter. With this constraint, we need a new set of rules. Can you post
> > an algorithm that will work to set the return types of the projects?
> >
> > Thanks & Regards,
> > --Shravan
> >
> > <snip>
> In this case, the foreach should have the following plans:
>
> 0 - proj(0)
>
> 1 - proj( * ) -> udf1
>
> 2 - proj(1) -> filter -> distinct -> proj( * ) -> udf2
>
> 3 - proj(1) -> filter -> distinct -> proj(1) -> udf3
>
> 4 - proj(1) -> distinct -> proj( * ) -> udf4
>
> 5 - proj(1) -> distinct -> proj(1) -> udf5
>
> In plans 2 and 3, filter will have an inner plan of:
>
> proj(0) -> gt, const(5) -> gt
>
> In discussing the scenario, Santhosh and I saw one issue,
> which is that in plan 1, the proj( * ) will be incorrectly
> trying to accumulate a bag for udf1, when it should just pass
> the tuple.  Santhosh is going to fix that by changing the
> project to determine whether it has a predecessor, and if so
> whether that predecessor is a relational operator, instead of
> looking at its input to see if it's a relational operator.
>
> I didn't follow your comment on the issue with the project in
> the filter plan.  It looked fine to me.
>
> Alan.
>

```
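
A rough sketch of the project fix Alan describes, as I read it: a project decides between passing its input through and accumulating a bag by looking at its predecessor in the plan, not at its input. This is not Pig's actual code. The names `Op`, `RELATIONAL_KINDS` and `project_modes` are made up for illustration, the plans are modeled as simple linear chains, and treating only filter and distinct as relational operators is an assumption that happens to cover this script.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the operator kinds treated as "relational" in this sketch.
RELATIONAL_KINDS = {"filter", "distinct"}


@dataclass
class Op:
    """Hypothetical stand-in for one operator in a linear inner plan."""
    kind: str        # e.g. "proj", "filter", "distinct", "udf"
    arg: str = ""    # e.g. "*", "1", "udf2"


def project_modes(plan: List[Op]) -> List[str]:
    """Return one mode per project in the plan, in plan order.

    "accumulate": the project has a predecessor and that predecessor is a
                  relational operator, so it collects the output into a bag.
    "pass":       no predecessor, or a non-relational one, so the project
                  just hands its input through.
    """
    modes = []
    for i, op in enumerate(plan):
        if op.kind != "proj":
            continue
        has_relational_pred = i > 0 and plan[i - 1].kind in RELATIONAL_KINDS
        modes.append("accumulate" if has_relational_pred else "pass")
    return modes


# Plan 1 from the thread: proj( * ) -> udf1
plan1 = [Op("proj", "*"), Op("udf", "udf1")]

# Plan 2 from the thread: proj(1) -> filter -> distinct -> proj( * ) -> udf2
plan2 = [
    Op("proj", "1"),
    Op("filter"),
    Op("distinct"),
    Op("proj", "*"),
    Op("udf", "udf2"),
]

print(project_modes(plan1))
print(project_modes(plan2))
```

Under this reading, the leading project in every plan has no predecessor and passes its input through, which is the behavior wanted for udf1 in plan 1. Only the projects that follow distinct in plans 2 through 5 accumulate a bag.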
