+This paper describes the implementation of the INTERSECT and EXCEPT operators. This paper assumes +basic familiarity with SQL and the language (compiler) portion of Derby. +
+The INTERSECT and EXCEPT operators operate on +table expressions producing the intersection and difference, respectively. The syntax is (roughly): +
+queryExpression INTERSECT [ALL] queryExpression
+queryExpression EXCEPT [ALL] queryExpression
+
+By default these operators remove duplicates, which can occur if there are duplicates in the +inputs. If ALL is specified then duplicates are not removed. If t1 has m copies of row R and t2 has +n copies then t1 INTERSECT ALL t2 returns min(m,n) copies of R, and t1 EXCEPT ALL t2 returns max( 0, +m-n) copies of R. +
+The EXCEPT operator has the same precedence as UNION. INTERSECT has higher precedence. +
+The implementation is spread across several classes, primarily +
+If the left and right inputs have N and M rows respectively the sorts take time O(N*log(N) + +M*log(M)). The final scan takes time O(N + M). So the time for the whole operation is O(N*log(N) + +M*log(M)). +
+Other implementations are possible. +
+Hash tables were rejected because, when the INTERSECT and EXCEPT operations were implemeted, +BackingStoreHashtable did not spill to disk. A hash table implementation could exhaust memory. +
+The current implementation was chosen because it always provides at least decent speed +and memory utilization, and in many, though certainly not all cases, it is the best implementation. +
+We could +have provided several implementations and let the optimizer choose the best, but this does not seem +worthwhile for operations that are seldom used. +
+The INTERSECT and EXCEPT operators are bound much like the UNION operator. The bind methods are all +found in super class SetOperatorNode, which is shared with UnionNode. +
+IntersectOrExceptNode generates OrderByNodes for its inputs at the start of optimization, in the +preprocess phase. Any column ordering can be used for the sorts, as long as the same one is used for +both the left and right inputs. IntersectOrExceptNode tries to be a little clever about picking the +column ordering for the sorts. If the INTERSECT or EXCEPT output must be ordered then +IntersectOrExceptNode uses the ORDER BY columns as the most significant part of the sort key for its +inputs. Any columns not in the ORDER BY list are tacked on to the least significant part of +the sort keys of the inputs. This ensures that the output of the INTERSECT or EXCEPT will be +properly ordered without an additional sort step. +
+The architecture of the Derby optimizer makes it difficult to do further optimizations. SelectNode +processing requires that order by lists be pushed down to them at the start of preprocessing. If an +input to INTERSECT or EXCEPT is a SELECT (a common case) then IntersectOrExceptNode has to decide +whether it needs its inputs ordered before it calls the preprocess method of its inputs. That means +that it must chose its execution plan at the start of the optimization process, not as the result of +the optimization process. +
+Code generation is straighforward. It generates code that invokes the ResultSetFactory.getSetOpResultSet method. +
+The UNION and EXCEPT operators have the same precedence. The INTERSECT operator has higher +precedence, so +
+t1 EXCEPT t2 UNION t3 ++is equivalent to +
+(t1 EXCEPT t2) UNION t3 ++and +
+t1 UNION ALL t2 UNION t3 ++is equivalent to +
+(t1 UNION ALL t2) UNION t3 ++while +
+t1 EXCEPT t2 INTERSECT t3 ++is equivalent to +
+t1 EXCEPT (t2 INTERSECT t3) ++
+Note that the EXCEPT operator is not associative, nor is UNION ALL +associative with UNION, so the correct associativity is important, even when the query expression +uses the same operator. +
+t1 EXCEPT (t2 EXCEPT t3) ++is not equivalant to +
+(t1 EXCEPT t2) EXCEPT t3 ++
+This precedence and associativity is implemented in the structure of queryExpression. The higher +precedence of the INTERSECT operator is implemented by building queryExpression out of +nonJoinQueryTerm. A nonJoinQueryTerm consists of one or more nonJoinQueryPrimaries connected by +INTERSECT operators. A queryExpression consists of one or more nonJoinQueryTerms connected by EXCEPT +or UNION operators. +
+Our parser is a recursive descent parser. Recursive descent parsers want recursion on the
+right. Therefore they do not handle SQL's operator associativity naturally. The natural grammar for
+queryExpression is
+
+queryExpression ::= + nonJoinQueryTerm | + queryExpression UNION [ALL] nonJoinQueryTerm | + queryExpression EXCEPT [ALL] nonJoinQueryTerm ++This captures the correct associativity of the UNION and EXCEPT operators, but we cannot use it +because it has recursion on the left. Therefore our parser uses the following grammar: +
+
+ResultSetNode
+queryExpression(ResultSetNode leftSide, int operatorType) throws StandardException :
+{
+ ResultSetNode term;
+}
+{
+ term = nonJoinQueryTerm(leftSide, operatorType) [ term = unionOrExcept(term) ]
+ {
+ return term;
+ }
+}
+ResultSetNode
+unionOrExcept(ResultSetNode term) throws StandardException :
+{
+ ResultSetNode expression;
+ Token tok = null;
+}
+{
+ [ tok = ]
+ expression = queryExpression(term, (tok != null) ? UNION_ALL_OP : UNION_OP)
+ {
+ return expression;
+ }
+|
+ [ tok = ]
+ expression = queryExpression(term, (tok != null) ? EXCEPT_ALL_OP : EXCEPT_OP)
+ {
+ return expression;
+ }
+}
+
+
+We have right recursion, but each queryExpression has to know whether it is a simple expression or
+the right hand side of a UNION or INTERSECT operation. The nonJoinQueryTerm method uses a similar
+trick to implement SQL standard associativity while using right recursion. INTERSECT associativity matters
+when INTERSECT ALL is mixed with plain INTERSECT.
++The creation of union, intersect, or except +query tree nodes is done in the nonJoinQueryTerm method. + +
+[Back to Derby Papers] +
+ Propchange: incubator/derby/site/trunk/build/site/papers/Intersect-design.html ------------------------------------------------------------------------------ svn:eol-style = native Modified: incubator/derby/site/trunk/build/site/papers/JDBCImplementation.html URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/build/site/papers/JDBCImplementation.html?view=diff&r1=155704&r2=155705 ============================================================================== --- incubator/derby/site/trunk/build/site/papers/JDBCImplementation.html (original) +++ incubator/derby/site/trunk/build/site/papers/JDBCImplementation.html Mon Feb 28 13:56:34 2005 @@ -128,6 +128,9 @@ +@@ -301,7 +308,7 @@
@@ -374,7 +381,7 @@
-Last Updated: February 11, 2005 +Last Updated: February 28, 2005
+This paper describes the implementation of the INTERSECT and EXCEPT operators. This paper assumes +basic familiarity with SQL and the language (compiler) portion of Derby. +
+The INTERSECT and EXCEPT operators operate on +table expressions producing the intersection and difference, respectively. The syntax is (roughly): +
+queryExpression INTERSECT [ALL] queryExpression
+queryExpression EXCEPT [ALL] queryExpression
+
+By default these operators remove duplicates, which can occur if there are duplicates in the +inputs. If ALL is specified then duplicates are not removed. If t1 has m copies of row R and t2 has +n copies then t1 INTERSECT ALL t2 returns min(m,n) copies of R, and t1 EXCEPT ALL t2 returns max( 0, +m-n) copies of R. +
+The EXCEPT operator has the same precedence as UNION. INTERSECT has higher precedence. +
+The implementation is spread across several classes, primarily +
+If the left and right inputs have N and M rows respectively the sorts take time O(N*log(N) + +M*log(M)). The final scan takes time O(N + M). So the time for the whole operation is O(N*log(N) + +M*log(M)). +
+Other implementations are possible. +
+Hash tables were rejected because, when the INTERSECT and EXCEPT operations were implemeted, +BackingStoreHashtable did not spill to disk. A hash table implementation could exhaust memory. +
+The current implementation was chosen because it always provides at least decent speed +and memory utilization, and in many, though certainly not all cases, it is the best implementation. +
+We could +have provided several implementations and let the optimizer choose the best, but this does not seem +worthwhile for operations that are seldom used. +
+The INTERSECT and EXCEPT operators are bound much like the UNION operator. The bind methods are all +found in super class SetOperatorNode, which is shared with UnionNode. +
+IntersectOrExceptNode generates OrderByNodes for its inputs at the start of optimization, in the +preprocess phase. Any column ordering can be used for the sorts, as long as the same one is used for +both the left and right inputs. IntersectOrExceptNode tries to be a little clever about picking the +column ordering for the sorts. If the INTERSECT or EXCEPT output must be ordered then +IntersectOrExceptNode uses the ORDER BY columns as the most significant part of the sort key for its +inputs. Any columns not in the ORDER BY list are tacked on to the least significant part of +the sort keys of the inputs. This ensures that the output of the INTERSECT or EXCEPT will be +properly ordered without an additional sort step. +
+The architecture of the Derby optimizer makes it difficult to do further optimizations. SelectNode +processing requires that order by lists be pushed down to them at the start of preprocessing. If an +input to INTERSECT or EXCEPT is a SELECT (a common case) then IntersectOrExceptNode has to decide +whether it needs its inputs ordered before it calls the preprocess method of its inputs. That means +that it must chose its execution plan at the start of the optimization process, not as the result of +the optimization process. +
+Code generation is straighforward. It generates code that invokes the ResultSetFactory.getSetOpResultSet method. +
+The UNION and EXCEPT operators have the same precedence. The INTERSECT operator has higher +precedence, so +
+t1 EXCEPT t2 UNION t3 ++is equivalent to +
+(t1 EXCEPT t2) UNION t3 ++and +
+t1 UNION ALL t2 UNION t3 ++is equivalent to +
+(t1 UNION ALL t2) UNION t3 ++while +
+t1 EXCEPT t2 INTERSECT t3 ++is equivalent to +
+t1 EXCEPT (t2 INTERSECT t3) ++
+Note that the EXCEPT operator is not associative, nor is UNION ALL +associative with UNION, so the correct associativity is important, even when the query expression +uses the same operator. +
+t1 EXCEPT (t2 EXCEPT t3) ++is not equivalant to +
+(t1 EXCEPT t2) EXCEPT t3 ++
+This precedence and associativity is implemented in the structure of queryExpression. The higher +precedence of the INTERSECT operator is implemented by building queryExpression out of +nonJoinQueryTerm. A nonJoinQueryTerm consists of one or more nonJoinQueryPrimaries connected by +INTERSECT operators. A queryExpression consists of one or more nonJoinQueryTerms connected by EXCEPT +or UNION operators. +
+Our parser is a recursive descent parser. Recursive descent parsers want recursion on the
+right. Therefore they do not handle SQL's operator associativity naturally. The natural grammar for
+queryExpression is
+
+queryExpression ::= + nonJoinQueryTerm | + queryExpression UNION [ALL] nonJoinQueryTerm | + queryExpression EXCEPT [ALL] nonJoinQueryTerm ++This captures the correct associativity of the UNION and EXCEPT operators, but we cannot use it +because it has recursion on the left. Therefore our parser uses the following grammar: +
+
+ResultSetNode
+queryExpression(ResultSetNode leftSide, int operatorType) throws StandardException :
+{
+ ResultSetNode term;
+}
+{
+ term = nonJoinQueryTerm(leftSide, operatorType) [ term = unionOrExcept(term) ]
+ {
+ return term;
+ }
+}
+ResultSetNode
+unionOrExcept(ResultSetNode term) throws StandardException :
+{
+ ResultSetNode expression;
+ Token tok = null;
+}
+{
+ [ tok = ]
+ expression = queryExpression(term, (tok != null) ? UNION_ALL_OP : UNION_OP)
+ {
+ return expression;
+ }
+|
+ [ tok = ]
+ expression = queryExpression(term, (tok != null) ? EXCEPT_ALL_OP : EXCEPT_OP)
+ {
+ return expression;
+ }
+}
+
+
+We have right recursion, but each queryExpression has to know whether it is a simple expression or
+the right hand side of a UNION or INTERSECT operation. The nonJoinQueryTerm method uses a similar
+trick to implement SQL standard associativity while using right recursion. INTERSECT associativity matters
+when INTERSECT ALL is mixed with plain INTERSECT.
++The creation of union, intersect, or except +query tree nodes is done in the nonJoinQueryTerm method. + +
+[Back to Derby Papers] +
+ Propchange: incubator/derby/site/trunk/src/documentation/content/papers/Intersect-design.html ------------------------------------------------------------------------------ svn:eol-style = native Modified: incubator/derby/site/trunk/src/documentation/content/xdocs/papers/index.xml URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/src/documentation/content/xdocs/papers/index.xml?view=diff&r1=155704&r2=155705 ============================================================================== --- incubator/derby/site/trunk/src/documentation/content/xdocs/papers/index.xml (original) +++ incubator/derby/site/trunk/src/documentation/content/xdocs/papers/index.xml Mon Feb 28 13:56:34 2005 @@ -20,6 +20,8 @@ OverviewLast Updated: February 11, 2005
+Last Updated: February 28, 2005
Modified: incubator/derby/site/trunk/src/documentation/content/xdocs/site.xml URL: http://svn.apache.org/viewcvs/incubator/derby/site/trunk/src/documentation/content/xdocs/site.xml?view=diff&r1=155704&r2=155705 ============================================================================== --- incubator/derby/site/trunk/src/documentation/content/xdocs/site.xml (original) +++ incubator/derby/site/trunk/src/documentation/content/xdocs/site.xml Mon Feb 28 13:56:34 2005 @@ -24,12 +24,13 @@