Attached is a patch that implements the SQL INTERSECT and EXCEPT
operators. The INTERSECT operator constructs the intersection of two
tables. The EXCEPT operator finds all rows in one table but not in the
other. The syntax is (roughly):
<query expression> INTERSECT [ALL] <query expression>
<query expression> EXCEPT [ALL] <query expression>
By default these operators remove duplicates, which can occur if there
are duplicates in the inputs. If ALL is specified then duplicates are
not returned. If t1 has m copies of row R and t2 has n copies then t1
INTERSECT ALL t2 returns min(m,n) copies of R, and t1 EXCEPT ALL t2
returns max( 0, mn) copies of R.
The EXCEPT operator has the same precedence as UNION. INTERSECT has
higher precedence.
This follows the SQL92 spec. (At least it follows my understanding of
the spec. Spec lawyers are invited to comment).
The implementation uses sorting. The two input tables are sorted and
then scanned together. The appropriate rows from the left input are output.
The compiler binds INTERSECT and EXCEPT like UNION. Therefore a new
class, org.apache.derby.impl.sql.compile.SetOperatorNode, was carved out
of UnionNode. It mainly contains bind methods. Classes UnionNode and
IntersectOrExceptNode extend SetOperatorNode. Classes IntersectNode and
ExceptNode extend IntersectOrExceptNode. IntersectOrExceptNode does most
of the optimization and code generation work. It puts OrderBy nodes in
front of its inputs.
The generated code creates a SetOpProjectRestrictResultSet that reads
its sorted inputs to produce the required output table.
Jack Klebanoff
