Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Derby Development" <derby-dev@db.apache.org>
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: svn commit: r513839 - in
 /db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile:
 FromBaseTable.java Predicate.java PredicateList.java
Date: Fri, 02 Mar 2007 16:54:06 -0000
To: derby-commits@db.apache.org
From: abrown@apache.org
Message-Id: <20070302165406.4FAFF1A981A@eris.apache.org>

Author: abrown
Date: Fri Mar  2 08:54:05 2007
New Revision: 513839

URL: http://svn.apache.org/viewvc?view=rev&rev=513839
Log:
DERBY-47 (partial): Update the logic for cost-based optimization (CBO) and
modification of access paths (MoAP) to recognize IN-list "probe predicates"
and to handle them appropriately. More specifically this patch adds code to
do the following:

  - During costing, recognize when we're using a probe predicate as a
    start/stop key and adjust the cost accordingly. This means multiplying
    the estimated cost and row count for "column = ?" by the number of values
    in the IN-list (because we are effectively going to evaluate "column = ?"
    N times, where N is the size of the IN-list, and we could return one or
    more rows for each of the N evaluations).  We also want to make sure that
    the resultant row count estimate is not greater than the total number of
    rows in the table.

  - When determining which predicates can be used as start/stop keys for the
    current conglomerate, only consider a probe predicate to be a start/stop
    key if it applies to the _first_ column in the conglomerate. Otherwise the
    probe predicate would end up being generated as a store qualifier, which
    means we would only get rows for which "column = ?" was true when the
    parameter was set to the _first_ value in the IN-list. That means we would
    end up with incorrect results (missing rows).

  - If cost-based optimization is complete and we are modifying access paths
    in preparation for code generation, then take any probe predicates that are
    *not* going to be used as start/stop keys for the chosen conglomerate and
    "revert" them back to the InListOperatorNodes from which they were built.
    Those InListOpNodes will then be generated as normal IN-list restrictions
    on the rows returned from store. If we did not do this reverting then the
    predicates would ultimately be ignored (since they are not valid qualifiers)
    and we would therefore end up with incorrect results (extra rows).

  - If we're modifying access paths and we have chosen to do multi-probing of
    an index then we disable bulk fetching for the target base table. Logically
    this is not a requirement. However, it turns out that bulk fetch can lead
    to poor performance when multi-probing an index if the number of probe
    values is high (several hundred or more) BUT that number is still just a
    small fraction of the total number of rows in the table.  By disabling
    bulk fetch for multi-probing we avoid this performance slowdown.

Modified:
    db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
    db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java
    db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java?view=diff&rev=513839&r1=513838&r2=513839
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java (original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java Fri Mar  2 08:54:05 2007
@@ -1309,6 +1309,11 @@
 			startGap = false;
 			stopGap = false;
 
+			/* If we have a probe predicate that is being used as a start/stop
+			 * key then ssKeySourceInList will hold the InListOperatorNode
+			 * from which the probe predicate was built.
+			 */
+			InListOperatorNode ssKeySourceInList = null;
 			for (int i = 0; i < predListSize; i++)
 			{
 				pred = baseTableRestrictionList.getOptPredicate(i);
@@ -1317,6 +1322,26 @@
 
 				if (startKey || stopKey)
 				{
+					/* A probe predicate is only useful if it can be used as
+					 * as a start/stop key for _first_ column in an index
+					 * (i.e. if the column position is 0).  That said, we only
+					 * allow a single start/stop key per column position in the
+					 * index (see orderUsefulPredicates() in PredicateList.java).
+					 * Those two facts combined mean that we should never have
+					 * more than one probe predicate start/stop key for a given
+					 * conglomerate.
+					 */
+					if (SanityManager.DEBUG)
+					{
+						SanityManager.ASSERT(
+							(ssKeySourceInList == null) ||
+								(((Predicate)pred).getSourceInList() == null),
+							"Found multiple probe predicate start/stop keys" +
+							" for conglomerate '" + cd.getConglomerateName() +
+							"' when at most one was expected.");
+					}
+
+					ssKeySourceInList = ((Predicate)pred).getSourceInList();
 					boolean knownConstant = pred.compareWithKnownConstant(this, true);
 
 					if (startKey)
@@ -1514,6 +1539,37 @@
 				}
 			}
 
+			/* If the start and stop key came from an IN-list "probe predicate"
+			 * then we need to adjust the cost estimate.  The probe predicate
+			 * is of the form "col = ?" and we currently have the estimated
+			 * cost of probing the index a single time for "?".  But with an
+			 * IN-list we don't just probe the index once; we're going to
+			 * probe it once for every value in the IN-list.  And we are going
+			 * to potentially return an additional row (or set of rows) for
+			 * each probe.  To account for this "multi-probing" we take the
+			 * costEstimate and multiply each of its fields by the size of
+			 * the IN-list.
+			 *
+			 * Note: If the IN-list has duplicate values then this simple
+			 * multiplication could give us an elevated cost (because we
+			 * only probe the index for each *non-duplicate* value in the
+			 * IN-list).  But for now, we're saying that's okay.
+			 */
+			if (ssKeySourceInList != null)
+			{
+				int listSize = ssKeySourceInList.getRightOperandList().size();
+				double rc = costEstimate.rowCount() * listSize;
+				double ssrc = costEstimate.singleScanRowCount() * listSize;
+
+				/* If multiplication by listSize returns more rows than are
+				 * in the scan then just use the number of rows in the scan.
+				 */
+				costEstimate.setCost(
+					costEstimate.getEstimatedCost() * listSize,
+					rc > initialRowCount ? initialRowCount : rc,
+					ssrc > initialRowCount ? initialRowCount : ssrc);
+			}
+
 			/*
 			** Figure out whether to do row locking or table locking.
 			**
@@ -2660,6 +2716,26 @@
 											nonStoreRestrictionList,
 											requalificationRestrictionList,
 											getDataDictionary());
+
+		/* Check to see if we are going to do execution-time probing
+		 * of an index using IN-list values.  We can tell by looking
+		 * at the restriction list: if there is an IN-list probe
+		 * predicate that is also a start/stop key then we know that
+		 * we're going to do execution-time probing.  In that case
+		 * we disable bulk fetching to minimize the number of non-
+		 * matching rows that we read from disk.  RESOLVE: Do we
+		 * really need to completely disable bulk fetching here,
+		 * or can we do something else?
+		 */
+		for (int i = 0; i < restrictionList.size(); i++)
+		{
+			Predicate pred = (Predicate)restrictionList.elementAt(i);
+			if ((pred.getSourceInList() != null) && pred.isStartKey())
+			{
+				disableBulkFetch();
+				break;
+			}
+		}
 
 		/*
 		** Consider turning on bulkFetch if it is turned

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java?view=diff&rev=513839&r1=513838&r2=513839
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java (original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java Fri Mar  2 08:54:05 2007
@@ -1325,4 +1325,18 @@
 			return ((BinaryRelationalOperatorNode)relop).getInListOp();
 		return null;
 	}
+
+	/**
+	 * If this predicate is an IN-list "probe predicate" then "revert"
+	 * it back to its original IN-list form.  This turns out to be
+	 * very easy: we just set the left operand of andNode to be the
+	 * original InListOperatorNode (if non-null).
+	 */
+	protected void revertToSourceInList()
+	{
+		InListOperatorNode ilon = getSourceInList();
+		if (ilon != null)
+			andNode.setLeftOperand(ilon);
+		return;
+	}
 }

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java?view=diff&rev=513839&r1=513838&r2=513839
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java (original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java Fri Mar  2 08:54:05 2007
@@ -202,8 +202,8 @@
 			** BinaryComparisonOperators and IsNullNodes.
 			*/
 			RelationalOperator relop = pred.getRelop();
-			boolean isIn = false;
-			InListOperatorNode inNode = null;
+			InListOperatorNode inNode = pred.getSourceInList();
+			boolean isIn = (inNode != null);
 
 			if (relop == null)
 			{
@@ -601,8 +601,9 @@
 			/* if it's "in" operator, we generate dynamic start and stop key
 			 * to improve index scan performance, beetle 3858.
 			 */
-			boolean isIn = false;
-			InListOperatorNode inNode = null;
+			InListOperatorNode inNode = pred.getSourceInList();
+			boolean isIn = (inNode != null);
+			boolean isInListProbePred = isIn;
 
 			if (relop == null)
 			{
@@ -633,6 +634,18 @@
 								(indexCol.getColumnNumber() != baseColumnPositions[indexPosition]) ||
 								inNode.selfReference(indexCol))
 							indexCol = null;
+						else if (isInListProbePred && (indexPosition > 0))
+						{
+							/* If the predicate is an IN-list probe predicate
+							 * then we only consider it to be useful if the
+							 * referenced column is the *first* one in the
+							 * index (i.e. if (indexPosition == 0)).  Otherwise
+							 * the predicate would be treated as a qualifier
+							 * for store, which could lead to incorrect
+							 * results.
+							 */
+							indexCol = null;
+						}
 					}
 				}
 				else
@@ -651,7 +664,19 @@
 			** operand.
 			*/
 			if (indexCol == null)
+			{
+				/* If we're pushing predicates then this is the last time
+				 * we'll get here before code generation.  So if we have
+				 * any IN-list probe predicates that are not useful, we
+				 * need to "revert" them back to their original IN-list
+				 * form so that they can be generated as regular IN-list
+				 * restrictions.
+				 */
+				if (pushPreds && isInListProbePred)
+					pred.revertToSourceInList();
+
 				continue;
+			}
 
 			pred.setIndexPosition(indexPosition);
 
@@ -712,8 +737,9 @@
 			RelationalOperator	relop             = thisPred.getRelop();
 			int                 thisOperator      = -1;
 
-			boolean             isIn              = false;
-			InListOperatorNode  inNode            = null;
+			InListOperatorNode  inNode            = thisPred.getSourceInList();
+			boolean             isIn              = (inNode != null);
+			boolean             isInListProbePred     = isIn;
 
 			if (relop == null)
 			{
@@ -834,7 +860,21 @@
                  * qualifier.  Beetle 4316.
 				 */
 				if (isIn && ! thisPredMarked)
+				{
+					/* If we get here for an IN-list probe pred then we know
+					 * that we are *not* using the probe predicate as a
+					 * start/stop key.  We also know that we're in the middle
+					 * of modifying access paths (because pushPreds is true),
+					 * which means we are preparing to generate code.  Those
+					 * two facts together mean we have to "revert" the
+					 * probe predicate back to its original state so that
+					 * it can be generated as normal IN-list.
+					 */
+					if (isInListProbePred)
+						thisPred.revertToSourceInList();
+
 					continue;
+				}
 
 				/*
 				** Push the predicate down.  They get pushed down in the
@@ -850,9 +890,13 @@
 				 * which can lead to an infinite recursion loop when we get to
 				 * restorePredicates(), and thus to stack overflow
 				 * (beetle 4974).
+				 *
+				 * Note: we don't do this if the predicate is an IN-list
+				 * probe predicate.  In that case we want to push the 
+				 * predicate down to the base table for special handling.
 				 */
 				Predicate predToPush;
-				if (isIn) 
+				if (isIn && !isInListProbePred)
                 {
 					AndNode andCopy = (AndNode) getNodeFactory().getNode(
 										C_NodeTypes.AND_NODE,
@@ -875,12 +919,15 @@
 
 				if (optTable.pushOptPredicate(predToPush))
 				{
-					/* although we generated dynamic start and stop key for "in"
-					 * , we still need this predicate for further restriction
+					/* Although we generated dynamic start and stop keys
+					 * for "in", we still need this predicate for further
+					 * restriction--*unless* we're dealing with a probe
+					 * predicate, in which case the restriction is handled
+					 * via execution-time index probes (for more see
+					 * execute/TableScanResultSet.java).
 					 */
-					if (! isIn)
+					if (!isIn || isInListProbePred)
 						removeOptPredicate(thisPred);
-					// restore origin
 				}
 				else if (SanityManager.DEBUG)
 				{