Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Received-SPF: neutral (herse.apache.org: 32.97.110.153 is neither permitted
 nor denied by domain of qozinx@gmail.com)
Message-ID: <4564F1CA.7050809@gmail.com>
Date: Wed, 22 Nov 2006 16:56:42 -0800
From: Army <qozinx@gmail.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
 rv:1.7.1) Gecko/20040707
MIME-Version: 1.0
To: derby-dev@db.apache.org
Subject: Re: Optimizer: question about predicate pushdown and handling of
 intermediate costs
References: <4564E242.5050403@amberpoint.com>
In-Reply-To: <4564E242.5050403@amberpoint.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Bryan Pendleton wrote:
> I have two questions about this code, both related to the edge-cases of
> floating point behavior:

I think both of your questions fall into the category of DERBY-1260, which was 
filed for investigation of how infinite cost estimates affect Derby costing 
calculations.  I think that, generally speaking, the code does not expect to see 
infinite cost estimates, so when such estimates occur there is no logic to 
account for them.  The two areas you point to are very good examples of the 
kinds of problems that exist in the face of infinite cost estimates.

Hopefully when DERBY-1905 is resolved the number of query plans yielding 
infinite cost estimates will decrease significantly--but even so, it stands to 
reason that such cost estimates are bound to occur at some point, and thus the 
Derby code should account for them.

> 1) if newCost is Infinite and pullCost is Infinite, then newCost -= 
> pullCost results in NaN.

Yes, this is true.

> I'm not quite sure what ought to happen here  but one thought I had is
> that perhaps the last line should look like:
>                     if (newCost < 0.0 || Double.isNaN(newCost))
>                         newCost = 0.0;

If we take an extreme case where we have four Optimizables with the following 
cost estimates:

   O1: 1000
   O2: Infinity
   O3: Infinity
   O4: Infinity

The total cost for the plan will be Infinity.  Then we call 
getNextPermutation(), which pulls O4.  newCost becomes Infinity - Infinity = 
NaN.  If we then use the logic you mentioned above, wouldn't the result be that 
we set newCost to be 0.0?  Instead, I think we would "want" (loosely speaking) 
newCost to be Infinity (because that's the sum of 1000 + Infinity + Infinity).

Similarly, let's say we have:

   O1: 1000
   O2: 2000
   O3: Infinity
   O4: 500

Then we call getNextPermutation(), which pulls O4.  newCost becomes Infinity - 
500 = Infinity.  Then we pull 03, so newCost becomes Infinity - Infinity = NaN. 
  We would again end up setting newCost to 0.0 in this case, but I don't think 
that would be correct.  What we want is for newCost to be 1000 + 2000 = 3000.

I admit I only looked at this very quickly (it's getting close the end of the 
day) so I could be misinterpreting what you mean.  Feel free to call me out if 
I'm not grasping your suggested change correctly (and my apologies in advance if 
that's the case).

> 2) if newCost is Infinite, but pullCost is finite, then it would seem that
>    during the costing of this optimizable, the estimate crossed over 
>    from being finite to being Infinite

Is this always going to be true?  For instance, if we look at the second example 
above and assume we are pulling O4, would that be a case where "newCost" is 
Infinity but pullCost is finite (500)?  In that scenario it was not the costing 
of O4 that caused us to cross over the line, it was the costing of O3.

Of course, given the apprent over-emphasis that the optimizer currently gives to 
"outer costs" (DERBY-1905) maybe the second example above won't ever happen. 
Perhaps instead the fact that the outer cost for O4 is Infinity means that the 
estimated cost of O4 will always end up being infinity, as well.  Theoretically 
I don't think that should be the case, but perhaps in actuality that's what happens.

> but in that case, subtracting the finite pullCost back out from newCost will 
> not bring the current cost estimate back to being a finite number; it will 
> remain Infinite. I wonder whether there should be a line like:
>        if (Double.isInfinite(newCost) && ! Double.isInfinite(pullCost))
>             newCost = Double.MAX_VALUE - pullCost;

For the second example I gave above (which is questionable in terms of 
likelihood) I think the result of pulling O4 using this logic would leave a cost 
of MAX_VALUE minus 500 for the first three Optimizables, when the "real" (I use 
the term loosely) estimate should be 1000 + 2000 + Infinity = Infinity.

> Please understand that I don't have any examples of actual queries in 
> which either of these situations arise; these are just based on my reading and 
> thinking about the code. So these questions may be totally nonsensical.

These questions are very sensical and this is, I think, an important problem in 
the cost estimates for large queries.  If you'd like to add these examples to 
DERBY-1260, I think that'd be great.  At some point hopefully someone will have 
the time and inclination to look further into these kinds of issues...

Thanks for your interest in the optimizer!

Army