Bryan Pendleton wrote:
> I have two questions about this code, both related to the edgecases of
> floating point behavior:
I think both of your questions fall into the category of DERBY1260, which was
filed for investigation of how infinite cost estimates affect Derby costing
calculations. I think that, generally speaking, the code does not expect to see
infinite cost estimates, so when such estimates occur there is no logic to
account for them. The two areas you point to are very good examples of the
kinds of problems that exist in the face of infinite cost estimates.
Hopefully when DERBY1905 is resolved the number of query plans yielding
infinite cost estimates will decrease significantlybut even so, it stands to
reason that such cost estimates are bound to occur at some point, and thus the
Derby code should account for them.
> 1) if newCost is Infinite and pullCost is Infinite, then newCost =
> pullCost results in NaN.
Yes, this is true.
> I'm not quite sure what ought to happen here but one thought I had is
> that perhaps the last line should look like:
> if (newCost < 0.0  Double.isNaN(newCost))
> newCost = 0.0;
If we take an extreme case where we have four Optimizables with the following
cost estimates:
O1: 1000
O2: Infinity
O3: Infinity
O4: Infinity
The total cost for the plan will be Infinity. Then we call
getNextPermutation(), which pulls O4. newCost becomes Infinity  Infinity =
NaN. If we then use the logic you mentioned above, wouldn't the result be that
we set newCost to be 0.0? Instead, I think we would "want" (loosely speaking)
newCost to be Infinity (because that's the sum of 1000 + Infinity + Infinity).
Similarly, let's say we have:
O1: 1000
O2: 2000
O3: Infinity
O4: 500
Then we call getNextPermutation(), which pulls O4. newCost becomes Infinity 
500 = Infinity. Then we pull 03, so newCost becomes Infinity  Infinity = NaN.
We would again end up setting newCost to 0.0 in this case, but I don't think
that would be correct. What we want is for newCost to be 1000 + 2000 = 3000.
I admit I only looked at this very quickly (it's getting close the end of the
day) so I could be misinterpreting what you mean. Feel free to call me out if
I'm not grasping your suggested change correctly (and my apologies in advance if
that's the case).
> 2) if newCost is Infinite, but pullCost is finite, then it would seem that
> during the costing of this optimizable, the estimate crossed over
> from being finite to being Infinite
Is this always going to be true? For instance, if we look at the second example
above and assume we are pulling O4, would that be a case where "newCost" is
Infinity but pullCost is finite (500)? In that scenario it was not the costing
of O4 that caused us to cross over the line, it was the costing of O3.
Of course, given the apprent overemphasis that the optimizer currently gives to
"outer costs" (DERBY1905) maybe the second example above won't ever happen.
Perhaps instead the fact that the outer cost for O4 is Infinity means that the
estimated cost of O4 will always end up being infinity, as well. Theoretically
I don't think that should be the case, but perhaps in actuality that's what happens.
> but in that case, subtracting the finite pullCost back out from newCost will
> not bring the current cost estimate back to being a finite number; it will
> remain Infinite. I wonder whether there should be a line like:
> if (Double.isInfinite(newCost) && ! Double.isInfinite(pullCost))
> newCost = Double.MAX_VALUE  pullCost;
For the second example I gave above (which is questionable in terms of
likelihood) I think the result of pulling O4 using this logic would leave a cost
of MAX_VALUE minus 500 for the first three Optimizables, when the "real" (I use
the term loosely) estimate should be 1000 + 2000 + Infinity = Infinity.
> Please understand that I don't have any examples of actual queries in
> which either of these situations arise; these are just based on my reading and
> thinking about the code. So these questions may be totally nonsensical.
These questions are very sensical and this is, I think, an important problem in
the cost estimates for large queries. If you'd like to add these examples to
DERBY1260, I think that'd be great. At some point hopefully someone will have
the time and inclination to look further into these kinds of issues...
Thanks for your interest in the optimizer!
Army
