db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Klebanoff <klebanoff-de...@sbcglobal.net>
Subject Re: INTERSECT and EXCEPT Design
Date Mon, 28 Feb 2005 18:08:33 GMT
RPost wrote:

> >Jack Klebanoff wrote:
>  
> >The syntax is (roughly):
>
> />queryExpression/ INTERSECT [ALL] /queryExpression/
> />queryExpression/ EXCEPT [ALL] /queryExpression/
>
> Although DISTINCT is implicit if ALL is not specified the SQL standard 
> also allows the DISTINCT operator to be specified explicitly. Would it 
> make sense to do that in Derby also?
>
>      INTERSECT [ALL | DISTINCT] and EXCEPT [ALL | DISTINCT]
>
Other relational databases do not all support the "DISTINCT" keyword, so 
applications that use it will not be portable.

That being said, I think that adding the "DISTINCT" keyword would be a 
good project for a neophyte who wants to get his or her feet wet in the 
Derby parser.

> > The architecture of the Derby optimizer makes it difficult to do 
> further optimizations.
>
> Yikes! Tread lightly there hoss, I think he (Jeffrey Lichtman) 
> is listening now.
>
His mail has been helpful in the past. I hope that I can stimulate more.

I don't think that we should spend a lot of effort on optimizing the 
INTERSECT and EXCEPT operators. It isn't used often and the performance 
of the current implementation is always at least decent. We should put 
our optimization efforts into optimizing the performance of other 
operations.

(Please speak up if you think that INTERSECT or EXCEPT will be used often).

> >The UNION and EXCEPT operators have the same precedence. The 
> INTERSECT operator has higher >precedence, so
>
> >t1 UNION ALL t2 UNION t3
>
> > is equivalent to
>
> >(t1 UNION ALL t2) UNION t3
>
> Yes but it is not equivalent to
>
> t1 UNION ALL (t2 UNION t3)
>
> Anyway, I think you meant to compare UNION (or UNION ALL) and 
> INTERSECT here (as you did EXCEPT/INTERSECT below this) to illustrate 
> the higher precedence of INTERSECT.
>
Yes. I included the UNION/UNION ALL example because it is important to 
get the associativity right even in the UNION case. I have updated my 
paper (attached) to say so explicitly, and include an example with UNION 
and EXCEPT.

> >IntersectOrExceptNode uses the ORDER BY columns as the most 
> significant part of the sort key for its inputs. Any >columns not in 
> the ORDER BY list are tacked on to the least significant part of the 
> sort keys of the inputs. This >ensures that the output of the 
> INTERSECT or EXCEPT will be properly ordered without an additional 
> sort step
>
> You sly dog, you.
>
> Is any check made for primary keys or unique indexes on any of the 
> select columns for each of the tables?
>
No. It isn't very easy for the set operators to find out the structure 
of the operands or to deduce whether there may be duplicate rows in 
either of the inputs. If the IntersectOrExceptNode class could know that 
one of the operands would use a unique index it could use that column in 
its sort key, avoiding a sort. However the structure of the optimizer 
requires that orderby lists be generated in the optimization preprocess 
phase, before the main body of the optimizer has started. So it is hard 
to use knowledge about unique keys if we could get it.

Jack

Mime
View raw message