Return-Path: Delivered-To: apmail-db-derby-commits-archive@www.apache.org Received: (qmail 81164 invoked from network); 2 Mar 2007 16:54:27 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 2 Mar 2007 16:54:27 -0000 Received: (qmail 40392 invoked by uid 500); 2 Mar 2007 16:54:36 -0000 Delivered-To: apmail-db-derby-commits-archive@db.apache.org Received: (qmail 40368 invoked by uid 500); 2 Mar 2007 16:54:36 -0000 Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: Reply-To: "Derby Development" List-Id: Delivered-To: mailing list derby-commits@db.apache.org Received: (qmail 40353 invoked by uid 99); 2 Mar 2007 16:54:36 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Mar 2007 08:54:36 -0800 X-ASF-Spam-Status: No, hits=-99.5 required=10.0 tests=ALL_TRUSTED,NO_REAL_NAME X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO eris.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Mar 2007 08:54:26 -0800 Received: by eris.apache.org (Postfix, from userid 65534) id 4FAFF1A981A; Fri, 2 Mar 2007 08:54:06 -0800 (PST) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: svn commit: r513839 - in /db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile: FromBaseTable.java Predicate.java PredicateList.java Date: Fri, 02 Mar 2007 16:54:06 -0000 To: derby-commits@db.apache.org From: abrown@apache.org X-Mailer: svnmailer-1.1.0 Message-Id: <20070302165406.4FAFF1A981A@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Author: abrown Date: Fri Mar 2 08:54:05 2007 New Revision: 513839 URL: http://svn.apache.org/viewvc?view=rev&rev=513839 Log: DERBY-47 (partial): Update the logic for cost-based optimization (CBO) and modification of access paths (MoAP) to recognize IN-list "probe predicates" and to handle them appropriately. More specifically this patch adds code to do the following: - During costing, recognize when we're using a probe predicate as a start/stop key and adjust the cost accordingly. This means multiplying the estimated cost and row count for "column = ?" by the number of values in the IN-list (because we are effectively going to evaluate "column = ?" N times, where N is the size of the IN-list, and we could return one or more rows for each of the N evaluations). We also want to make sure that the resultant row count estimate is not greater than the total number of rows in the table. - When determining which predicates can be used as start/stop keys for the current conglomerate, only consider a probe predicate to be a start/stop key if it applies to the _first_ column in the conglomerate. Otherwise the probe predicate would end up being generated as a store qualifier, which means we would only get rows for which "column = ?" was true when the parameter was set to the _first_ value in the IN-list. That means we would end up with incorrect results (missing rows). - If cost-based optimization is complete and we are modifying access paths in preparation for code generation, then take any probe predicates that are *not* going to be used as start/stop keys for the chosen conglomerate and "revert" them back to the InListOperatorNodes from which they were built. Those InListOpNodes will then be generated as normal IN-list restrictions on the rows returned from store. If we did not do this reverting then the predicates would ultimately be ignored (since they are not valid qualifiers) and we would therefore end up with incorrect results (extra rows). - If we're modifying access paths and we have chosen to do multi-probing of an index then we disable bulk fetching for the target base table. Logically this is not a requirement. However, it turns out that bulk fetch can lead to poor performance when multi-probing an index if the number of probe values is high (several hundred or more) BUT that number is still just a small fraction of the total number of rows in the table. By disabling bulk fetch for multi-probing we avoid this performance slowdown. Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java?view=diff&rev=513839&r1=513838&r2=513839 ============================================================================== --- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java (original) +++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java Fri Mar 2 08:54:05 2007 @@ -1309,6 +1309,11 @@ startGap = false; stopGap = false; + /* If we have a probe predicate that is being used as a start/stop + * key then ssKeySourceInList will hold the InListOperatorNode + * from which the probe predicate was built. + */ + InListOperatorNode ssKeySourceInList = null; for (int i = 0; i < predListSize; i++) { pred = baseTableRestrictionList.getOptPredicate(i); @@ -1317,6 +1322,26 @@ if (startKey || stopKey) { + /* A probe predicate is only useful if it can be used as + * as a start/stop key for _first_ column in an index + * (i.e. if the column position is 0). That said, we only + * allow a single start/stop key per column position in the + * index (see orderUsefulPredicates() in PredicateList.java). + * Those two facts combined mean that we should never have + * more than one probe predicate start/stop key for a given + * conglomerate. + */ + if (SanityManager.DEBUG) + { + SanityManager.ASSERT( + (ssKeySourceInList == null) || + (((Predicate)pred).getSourceInList() == null), + "Found multiple probe predicate start/stop keys" + + " for conglomerate '" + cd.getConglomerateName() + + "' when at most one was expected."); + } + + ssKeySourceInList = ((Predicate)pred).getSourceInList(); boolean knownConstant = pred.compareWithKnownConstant(this, true); if (startKey) @@ -1514,6 +1539,37 @@ } } + /* If the start and stop key came from an IN-list "probe predicate" + * then we need to adjust the cost estimate. The probe predicate + * is of the form "col = ?" and we currently have the estimated + * cost of probing the index a single time for "?". But with an + * IN-list we don't just probe the index once; we're going to + * probe it once for every value in the IN-list. And we are going + * to potentially return an additional row (or set of rows) for + * each probe. To account for this "multi-probing" we take the + * costEstimate and multiply each of its fields by the size of + * the IN-list. + * + * Note: If the IN-list has duplicate values then this simple + * multiplication could give us an elevated cost (because we + * only probe the index for each *non-duplicate* value in the + * IN-list). But for now, we're saying that's okay. + */ + if (ssKeySourceInList != null) + { + int listSize = ssKeySourceInList.getRightOperandList().size(); + double rc = costEstimate.rowCount() * listSize; + double ssrc = costEstimate.singleScanRowCount() * listSize; + + /* If multiplication by listSize returns more rows than are + * in the scan then just use the number of rows in the scan. + */ + costEstimate.setCost( + costEstimate.getEstimatedCost() * listSize, + rc > initialRowCount ? initialRowCount : rc, + ssrc > initialRowCount ? initialRowCount : ssrc); + } + /* ** Figure out whether to do row locking or table locking. ** @@ -2660,6 +2716,26 @@ nonStoreRestrictionList, requalificationRestrictionList, getDataDictionary()); + + /* Check to see if we are going to do execution-time probing + * of an index using IN-list values. We can tell by looking + * at the restriction list: if there is an IN-list probe + * predicate that is also a start/stop key then we know that + * we're going to do execution-time probing. In that case + * we disable bulk fetching to minimize the number of non- + * matching rows that we read from disk. RESOLVE: Do we + * really need to completely disable bulk fetching here, + * or can we do something else? + */ + for (int i = 0; i < restrictionList.size(); i++) + { + Predicate pred = (Predicate)restrictionList.elementAt(i); + if ((pred.getSourceInList() != null) && pred.isStartKey()) + { + disableBulkFetch(); + break; + } + } /* ** Consider turning on bulkFetch if it is turned Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java?view=diff&rev=513839&r1=513838&r2=513839 ============================================================================== --- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java (original) +++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/Predicate.java Fri Mar 2 08:54:05 2007 @@ -1325,4 +1325,18 @@ return ((BinaryRelationalOperatorNode)relop).getInListOp(); return null; } + + /** + * If this predicate is an IN-list "probe predicate" then "revert" + * it back to its original IN-list form. This turns out to be + * very easy: we just set the left operand of andNode to be the + * original InListOperatorNode (if non-null). + */ + protected void revertToSourceInList() + { + InListOperatorNode ilon = getSourceInList(); + if (ilon != null) + andNode.setLeftOperand(ilon); + return; + } } Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java?view=diff&rev=513839&r1=513838&r2=513839 ============================================================================== --- db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java (original) +++ db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/PredicateList.java Fri Mar 2 08:54:05 2007 @@ -202,8 +202,8 @@ ** BinaryComparisonOperators and IsNullNodes. */ RelationalOperator relop = pred.getRelop(); - boolean isIn = false; - InListOperatorNode inNode = null; + InListOperatorNode inNode = pred.getSourceInList(); + boolean isIn = (inNode != null); if (relop == null) { @@ -601,8 +601,9 @@ /* if it's "in" operator, we generate dynamic start and stop key * to improve index scan performance, beetle 3858. */ - boolean isIn = false; - InListOperatorNode inNode = null; + InListOperatorNode inNode = pred.getSourceInList(); + boolean isIn = (inNode != null); + boolean isInListProbePred = isIn; if (relop == null) { @@ -633,6 +634,18 @@ (indexCol.getColumnNumber() != baseColumnPositions[indexPosition]) || inNode.selfReference(indexCol)) indexCol = null; + else if (isInListProbePred && (indexPosition > 0)) + { + /* If the predicate is an IN-list probe predicate + * then we only consider it to be useful if the + * referenced column is the *first* one in the + * index (i.e. if (indexPosition == 0)). Otherwise + * the predicate would be treated as a qualifier + * for store, which could lead to incorrect + * results. + */ + indexCol = null; + } } } else @@ -651,7 +664,19 @@ ** operand. */ if (indexCol == null) + { + /* If we're pushing predicates then this is the last time + * we'll get here before code generation. So if we have + * any IN-list probe predicates that are not useful, we + * need to "revert" them back to their original IN-list + * form so that they can be generated as regular IN-list + * restrictions. + */ + if (pushPreds && isInListProbePred) + pred.revertToSourceInList(); + continue; + } pred.setIndexPosition(indexPosition); @@ -712,8 +737,9 @@ RelationalOperator relop = thisPred.getRelop(); int thisOperator = -1; - boolean isIn = false; - InListOperatorNode inNode = null; + InListOperatorNode inNode = thisPred.getSourceInList(); + boolean isIn = (inNode != null); + boolean isInListProbePred = isIn; if (relop == null) { @@ -834,7 +860,21 @@ * qualifier. Beetle 4316. */ if (isIn && ! thisPredMarked) + { + /* If we get here for an IN-list probe pred then we know + * that we are *not* using the probe predicate as a + * start/stop key. We also know that we're in the middle + * of modifying access paths (because pushPreds is true), + * which means we are preparing to generate code. Those + * two facts together mean we have to "revert" the + * probe predicate back to its original state so that + * it can be generated as normal IN-list. + */ + if (isInListProbePred) + thisPred.revertToSourceInList(); + continue; + } /* ** Push the predicate down. They get pushed down in the @@ -850,9 +890,13 @@ * which can lead to an infinite recursion loop when we get to * restorePredicates(), and thus to stack overflow * (beetle 4974). + * + * Note: we don't do this if the predicate is an IN-list + * probe predicate. In that case we want to push the + * predicate down to the base table for special handling. */ Predicate predToPush; - if (isIn) + if (isIn && !isInListProbePred) { AndNode andCopy = (AndNode) getNodeFactory().getNode( C_NodeTypes.AND_NODE, @@ -875,12 +919,15 @@ if (optTable.pushOptPredicate(predToPush)) { - /* although we generated dynamic start and stop key for "in" - * , we still need this predicate for further restriction + /* Although we generated dynamic start and stop keys + * for "in", we still need this predicate for further + * restriction--*unless* we're dealing with a probe + * predicate, in which case the restriction is handled + * via execution-time index probes (for more see + * execute/TableScanResultSet.java). */ - if (! isIn) + if (!isIn || isInListProbePred) removeOptPredicate(thisPred); - // restore origin } else if (SanityManager.DEBUG) {