Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 12133 invoked from network); 23 Nov 2006 00:57:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 23 Nov 2006 00:57:16 -0000 Received: (qmail 61440 invoked by uid 500); 23 Nov 2006 00:57:25 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 61221 invoked by uid 500); 23 Nov 2006 00:57:25 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 61210 invoked by uid 99); 23 Nov 2006 00:57:24 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 22 Nov 2006 16:57:24 -0800 X-ASF-Spam-Status: No, hits=1.4 required=10.0 tests=SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (herse.apache.org: 32.97.110.153 is neither permitted nor denied by domain of qozinx@gmail.com) Received: from [32.97.110.153] (HELO e35.co.us.ibm.com) (32.97.110.153) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 22 Nov 2006 16:57:10 -0800 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e35.co.us.ibm.com (8.13.8/8.12.11) with ESMTP id kAN0uj6m024668 for ; Wed, 22 Nov 2006 19:56:45 -0500 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.13.6/8.13.6/NCO v8.1.1) with ESMTP id kAN0ujh1539418 for ; Wed, 22 Nov 2006 17:56:45 -0700 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id kAN0uieb018405 for ; Wed, 22 Nov 2006 17:56:45 -0700 Received: from [127.0.0.1] (svl-arbrown.svl.ibm.com [9.30.38.165]) by d03av04.boulder.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id kAN0ug9P018356 for ; Wed, 22 Nov 2006 17:56:44 -0700 Message-ID: <4564F1CA.7050809@gmail.com> Date: Wed, 22 Nov 2006 16:56:42 -0800 From: Army User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.1) Gecko/20040707 X-Accept-Language: en-us, en MIME-Version: 1.0 To: derby-dev@db.apache.org Subject: Re: Optimizer: question about predicate pushdown and handling of intermediate costs References: <4564E242.5050403@amberpoint.com> In-Reply-To: <4564E242.5050403@amberpoint.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org Bryan Pendleton wrote: > I have two questions about this code, both related to the edge-cases of > floating point behavior: I think both of your questions fall into the category of DERBY-1260, which was filed for investigation of how infinite cost estimates affect Derby costing calculations. I think that, generally speaking, the code does not expect to see infinite cost estimates, so when such estimates occur there is no logic to account for them. The two areas you point to are very good examples of the kinds of problems that exist in the face of infinite cost estimates. Hopefully when DERBY-1905 is resolved the number of query plans yielding infinite cost estimates will decrease significantly--but even so, it stands to reason that such cost estimates are bound to occur at some point, and thus the Derby code should account for them. > 1) if newCost is Infinite and pullCost is Infinite, then newCost -= > pullCost results in NaN. Yes, this is true. > I'm not quite sure what ought to happen here but one thought I had is > that perhaps the last line should look like: > if (newCost < 0.0 || Double.isNaN(newCost)) > newCost = 0.0; If we take an extreme case where we have four Optimizables with the following cost estimates: O1: 1000 O2: Infinity O3: Infinity O4: Infinity The total cost for the plan will be Infinity. Then we call getNextPermutation(), which pulls O4. newCost becomes Infinity - Infinity = NaN. If we then use the logic you mentioned above, wouldn't the result be that we set newCost to be 0.0? Instead, I think we would "want" (loosely speaking) newCost to be Infinity (because that's the sum of 1000 + Infinity + Infinity). Similarly, let's say we have: O1: 1000 O2: 2000 O3: Infinity O4: 500 Then we call getNextPermutation(), which pulls O4. newCost becomes Infinity - 500 = Infinity. Then we pull 03, so newCost becomes Infinity - Infinity = NaN. We would again end up setting newCost to 0.0 in this case, but I don't think that would be correct. What we want is for newCost to be 1000 + 2000 = 3000. I admit I only looked at this very quickly (it's getting close the end of the day) so I could be misinterpreting what you mean. Feel free to call me out if I'm not grasping your suggested change correctly (and my apologies in advance if that's the case). > 2) if newCost is Infinite, but pullCost is finite, then it would seem that > during the costing of this optimizable, the estimate crossed over > from being finite to being Infinite Is this always going to be true? For instance, if we look at the second example above and assume we are pulling O4, would that be a case where "newCost" is Infinity but pullCost is finite (500)? In that scenario it was not the costing of O4 that caused us to cross over the line, it was the costing of O3. Of course, given the apprent over-emphasis that the optimizer currently gives to "outer costs" (DERBY-1905) maybe the second example above won't ever happen. Perhaps instead the fact that the outer cost for O4 is Infinity means that the estimated cost of O4 will always end up being infinity, as well. Theoretically I don't think that should be the case, but perhaps in actuality that's what happens. > but in that case, subtracting the finite pullCost back out from newCost will > not bring the current cost estimate back to being a finite number; it will > remain Infinite. I wonder whether there should be a line like: > if (Double.isInfinite(newCost) && ! Double.isInfinite(pullCost)) > newCost = Double.MAX_VALUE - pullCost; For the second example I gave above (which is questionable in terms of likelihood) I think the result of pulling O4 using this logic would leave a cost of MAX_VALUE minus 500 for the first three Optimizables, when the "real" (I use the term loosely) estimate should be 1000 + 2000 + Infinity = Infinity. > Please understand that I don't have any examples of actual queries in > which either of these situations arise; these are just based on my reading and > thinking about the code. So these questions may be totally nonsensical. These questions are very sensical and this is, I think, an important problem in the cost estimates for large queries. If you'd like to add these examples to DERBY-1260, I think that'd be great. At some point hopefully someone will have the time and inclination to look further into these kinds of issues... Thanks for your interest in the optimizer! Army