From derby-dev-return-34110-apmail-db-derby-dev-archive=db.apache.org@db.apache.org Tue Dec 05 23:46:44 2006 Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 5216 invoked from network); 5 Dec 2006 23:46:43 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 5 Dec 2006 23:46:43 -0000 Received: (qmail 90344 invoked by uid 500); 5 Dec 2006 23:46:51 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 90310 invoked by uid 500); 5 Dec 2006 23:46:51 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 90301 invoked by uid 99); 5 Dec 2006 23:46:51 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Dec 2006 15:46:51 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Dec 2006 15:46:41 -0800 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id B86077142BF for ; Tue, 5 Dec 2006 15:46:21 -0800 (PST) Message-ID: <12852991.1165362381734.JavaMail.jira@brutus> Date: Tue, 5 Dec 2006 15:46:21 -0800 (PST) From: "A B (JIRA)" To: derby-dev@db.apache.org Subject: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2 In-Reply-To: <896536.1164840923060.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org [ http://issues.apache.org/jira/browse/DERBY-2130?page=3Dcomments#actio= n_12455802 ]=20 =20 A B commented on DERBY-2130: ---------------------------- Thanks for the query plans, Bryan. I took a quick look and nothing stood o= ut, but I'll have to look more closely before I can say anything for sure. = Thank you for taking the time to pull out the "noise"--I think that'll be = helpful. On a different note, I added some System.outs to various places in the Opti= mizer to see what cost estimates the optimizer is calculating for different= complete join orders in the top-level OptimizerImpl. I then ran the repro= query against the latest 10.1 branch and the latest 10.3 trunk. The abbre= viated output from what I saw is shown below. Some things to note: 0. Lots of "Infinity" and "NaN" cost estimates, which are probably the re= sult of DERBY-1905 and DERBY-1260, respectively. This pretty much renders a= ll but a handful of the cost estimates useless. 1. In 10.1.3 we try to jump to a target join order and we SUCCESSFULLY do= so. In 10.3 we also try to jump to the same target join order, BUT for so= me reason the jump does NOT take effect. So it looks like Bryan's question= about whether or not permutation jumping works correctly was indeed well-f= ounded. This is, I think, the first thing to investigate. 2. Because the 10.3 "jump" does not occur we spend a lot of time iteratin= g through different permutations until we get to the "best join order" foun= d by 10.1. Once we get there the estimated cost for 10.3 (on a 10.1 databas= e) is finite but is much higher than the cost estimate of the same join ord= er in 10.1 (namely, "247" in 10.1 vs "1736685" in 10.3). So this is anothe= r point of interest. This could again be the result of DERBY-1260, but it'= s hard to say at this point. 3. This next observation is weird: when running 10.3 on a 10.1 database t= he query finishes after about 10,000 iterations (roughly 5 minutes on my la= ptop). But when running 10.3 on a *10.3* database the query runs until it = has exhausted all possible permutations (over 40,000). The difference is i= n the calculated cost estimate for the "2 0 1 3 4 5 6 7" join order. When = running against a 10.1 database 10.3 estimates a cost of 1736685, followed = up by a cost of 198589 for the next complete order, which leads leads to ev= entual timeout. But when running against a 10.3 database, the cost estimat= e is *Infinity* and thus the timeout value sits at 5E62, which we never rea= ch. I have absolutely no idea why this would be the case. Any one out the= re have any theories? Bryan, can you perhaps try this out to see if you o= bserve the same behavior? You can connect to a 10.1 database by setting th= e "derby.database.allowPreReleaseUpgrade" property to "true". Note that yo= u should only use that on a _test_ 10.1 database, though; I don't think it'= s intended for production db's. Generally speaking lines in the output are of the form: -=3D> estimatedCost: j o i n O r d e r -------------------------- 10.1.3 on 10.1.3 database: -------------------------- ij version 10.1 -=3D> 5.607246651648704E62: 0 1 2 3 4 5 6 7 -=3D=3D> Just set bestCost to: 5.607246651648704E62 -=3D-> JUMPING to: 1 4 2 6 7 0 3 5 -=3D> Infinity: 1 4 2 6 7 0 3 5 -=3D> NaN: 1 4 2 6 7 0 5 3 -=3D> NaN: 1 4 2 6 7 3 0 5 -=3D> NaN: 1 4 2 6 7 3 5 0 -=3D> NaN: 1 4 2 6 7 5 0 3 -=3D> NaN: 1 4 2 6 7 5 3 0 -=3D> NaN: 1 4 2 7 0 3 5 6 -=3D> NaN: 1 4 2 7 0 3 6 5 -=3D> NaN: 1 4 2 7 0 5 3 6 -=3D> NaN: 1 4 2 7 0 5 6 3 -=3D> NaN: 1 4 2 7 0 6 3 5 -=3D> NaN: 1 4 2 7 0 6 5 3 -=3D> NaN: 1 7 6 5 4 0 2 3 -=3D> NaN: 1 7 6 5 4 0 3 2 -=3D> NaN: 1 7 6 5 4 2 0 3 -=3D> NaN: 1 7 6 5 4 2 3 0 -=3D> NaN: 1 7 6 5 4 3 0 2 -=3D> NaN: 1 7 6 5 4 3 2 0 -=3D> 247.4647753453739: 2 0 1 3 4 5 6 7 -=3D=3D> Just set bestCost to: 247.4647753453739 -=3D-> TIMED OUT: exceeded 247.4647753453739 ms. -=3D> 247.4647753453739: 2 0 1 3 4 5 6 7 ELAPSED TIME =3D 82961 milliseconds ------------------------ 10.3 on *10.1.3* database: ------------------------ (on the *same* database that was created by 10.1) ij version 10.3 -=3D> 5.607246651648704E62: 0 1 2 3 4 5 6 7 -=3D=3D> Just set bestCost to: 5.607246651648704E62 -=3D-> JUMPING to: 1 4 2 6 7 0 3 5 -=3D> 1.8253980349686949E106: 0 1 2 3 4 5 6 7 -=3D> 3.021440681134236E107: 0 1 2 3 4 5 7 6 -=3D> 8.374147544754745E119: 0 1 2 3 4 6 5 7 -=3D> 8.374147544754745E119: 0 1 2 3 4 6 7 5 -=3D> 5.993308284393933E109: 0 1 2 3 4 7 5 6 -=3D> 2.736734332816085E108: 0 1 2 3 4 7 6 5 -=3D> 3.973494182787193E115: 0 1 2 3 5 4 6 7 -=3D> 6.577018684224087E116: 0 1 2 3 5 4 7 6 -=3D> 3.973494182787193E115: 0 1 2 3 5 6 4 7 -=3D> 6.577018684224087E116: 0 1 2 3 5 7 4 6 -=3D> 8.374147544754745E119: 0 1 2 3 6 4 5 7 -=3D> 8.374147544754745E119: 0 1 2 3 6 4 7 5 -=3D> 1.0852895218002162E119: 0 1 2 3 6 5 4 7 -=3D> 8.374147544754745E119: 0 1 2 3 6 7 4 5 -=3D> 5.993308284393933E109: 0 1 2 3 7 4 5 6 -=3D> 2.736734332816085E108: 0 1 2 3 7 4 6 5 -=3D> 1.2945545894290903E112: 0 1 2 3 7 5 4 6 -=3D> 2.736734332816085E108: 0 1 2 3 7 6 4 5 -=3D> NaN: 0 1 2 4 5 3 6 7 -=3D> NaN: 0 1 2 4 5 3 7 6 -=3D> NaN: 0 1 2 4 5 6 3 7 -=3D> NaN: 0 1 2 4 5 6 7 3 -=3D> NaN: 0 1 2 4 5 7 3 6 -=3D> NaN: 0 1 2 4 5 7 6 3 -=3D> NaN: 1 7 6 5 4 0 2 3 -=3D> NaN: 1 7 6 5 4 0 3 2 -=3D> NaN: 1 7 6 5 4 2 0 3 -=3D> NaN: 1 7 6 5 4 2 3 0 -=3D> NaN: 1 7 6 5 4 3 0 2 -=3D> NaN: 1 7 6 5 4 3 2 0 -=3D> 1736685.050617207: 2 0 1 3 4 5 6 7 -=3D=3D> Just set bestCost to: 1736685.050617207 -=3D> 198589.53739726284: 2 0 1 3 4 5 7 6 -=3D=3D> Just set bestCost to: 198589.53739726284 -=3D-> TIMED OUT: exceeded 198589.53739726284 ms. -=3D> 198589.53739726284: 2 0 1 3 4 5 7 6 ELAPSED TIME =3D 336370 milliseconds ------------------------ 10.3 on 10.3 database: ------------------------ ij version 10.3 -=3D> 5.607246651648704E62: 0 1 2 3 4 5 6 7 -=3D=3D> Just set bestCost to: 5.607246651648704E62 -=3D-> JUMPING to: 1 4 2 6 7 0 3 5 -=3D> 1.8253980349686949E106: 0 1 2 3 4 5 6 7 -=3D> 3.021440681134236E107: 0 1 2 3 4 5 7 6 -=3D> 8.374147544754745E119: 0 1 2 3 4 6 5 7 -=3D> 8.374147544754745E119: 0 1 2 3 4 6 7 5 -=3D> 5.993308284393933E109: 0 1 2 3 4 7 5 6 -=3D> 2.736734332816085E108: 0 1 2 3 4 7 6 5 -=3D> 3.973494182787193E115: 0 1 2 3 5 4 6 7 -=3D> 6.577018684224087E116: 0 1 2 3 5 4 7 6 -=3D> 3.973494182787193E115: 0 1 2 3 5 6 4 7 -=3D> 6.577018684224087E116: 0 1 2 3 5 7 4 6 -=3D> 8.374147544754745E119: 0 1 2 3 6 4 5 7 -=3D> 8.374147544754745E119: 0 1 2 3 6 4 7 5 -=3D> 1.0852895218002162E119: 0 1 2 3 6 5 4 7 -=3D> 8.374147544754745E119: 0 1 2 3 6 7 4 5 -=3D> 5.993308284393933E109: 0 1 2 3 7 4 5 6 -=3D> 2.736734332816085E108: 0 1 2 3 7 4 6 5 -=3D> 1.2945545894290903E112: 0 1 2 3 7 5 4 6 -=3D> 2.736734332816085E108: 0 1 2 3 7 6 4 5 -=3D> NaN: 0 1 2 4 5 3 6 7 -=3D> NaN: 0 1 2 4 5 3 7 6 -=3D> NaN: 0 1 2 4 5 6 3 7 -=3D> NaN: 0 1 2 4 5 6 7 3 -=3D> NaN: 0 1 2 4 5 7 3 6 -=3D> NaN: 1 7 6 5 4 2 0 3 -=3D> NaN: 1 7 6 5 4 2 3 0 -=3D> NaN: 1 7 6 5 4 3 0 2 -=3D> NaN: 1 7 6 5 4 3 2 0 -=3D> Infinity: 2 0 1 3 4 5 6 7 -=3D> NaN: 2 0 1 3 4 5 7 6 -=3D> NaN: 2 0 1 3 4 6 5 7 -=3D> NaN: 2 0 1 3 4 6 7 5 -=3D> NaN: 2 0 1 3 4 7 5 6 -=3D> NaN: 2 0 1 3 4 7 6 5 -=3D> NaN: 2 0 1 3 5 4 6 7 -=3D> NaN: 7 6 5 4 3 1 0 2 -=3D> NaN: 7 6 5 4 3 1 2 0 -=3D> NaN: 7 6 5 4 3 2 0 1 -=3D> NaN: 7 6 5 4 3 2 1 0 -=3D-> ENDED because ran out of permutations. -=3D> 5.607246651648704E62: 0 1 2 3 4 5 6 7 ELAPSED TIME =3D 1145530 milliseconds I'm still trying to make sense of this data, but perhaps this info will be = helpful in tracking down the issue(s) here... > Optimizer performance slowdown from 10.1 to 10.2 > ------------------------------------------------ > > Key: DERBY-2130 > URL: http://issues.apache.org/jira/browse/DERBY-2130 > Project: Derby > Issue Type: Bug > Components: Performance, SQL > Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1 > Reporter: Bryan Pendleton > Attachments: plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sq= l > > > Attached is 'repro.sql', an IJ script which demonstrates what I > believe to be a serious performance issue in the Optimizer. > I have run this script in a number of configurations: > - 10.1.2.1: the script runs successfully. The 'prepare' statement > takes about 90 seconds, on a fairly powerful Windows machine > - 10.1.3.1: the script produces a NPE. I believe this is DERBY-1777 > - 10.2.1.8/trunk: the script runs successfully. The 'prepare' statement > often takes about 220 seconds, on the same Windows machine > Intermittently, on 10.2 and on the trunk, the prepare statement takes > 15+ minutes. I cannot reliably reproduce this; I run the same script > several times in a row and I cannot predict whether it will take 220 > seconds or whether it will take 15+ minutes. > I am quite motivated to work on this problem, as this is blocking me from > using Derby for a project that I'm quite keen on, but I need some > suggestions and ideas about how to attack it. From my perspective > there are 3 primary topics: > 1) Why did optimizer performance for this query degrade so significantly > from 10.1.2.1 to 10.2? The optimizer seems to be at least 2.5 times slowe= r, > for this particular query at least, in 10.2. Sometimes it is 10x slower. > 2) What is the source of the non-determinism? Why does the optimizer > often take 4 minutes to optimize this query on the trunk, but sometimes > take 15+ minutes? I don't believe that I'm changing anything from > run to run. > 3) Can we improve the optimizer performance even beyond what it was > for 10.1.2? I realize that this is an ugly query, but I was hoping to > see an optimization time of 5-10 seconds, not 90 seconds (and certainly > not 220 seconds). > I have attempted to start answering some of these questions, with > limited success. Here is some of what I think I've discovered so far: > - the optimizer changes in 10.2 seem to have given the optimizer many > more choices of possible query plans to consider. I think this means > that, if the optimizer does not time out, it will spend substantially > more time optimizing because there are more choices to evaluate. Does > this by itself mean that the optimizer will take 2.5 times longer in > 10.2 than it did in 10.1? > - something about this query seems to make the costing mechanism go > haywire, and produce extreme costs. While stepping through the > optimization of this query in the debugger I have seen it compute > costs like 1e63 and 1e200. This might be very closely related to > DERBY-1905, although I don't think I'm doing any subqueries here. > But maybe I'm misunderstanding the term "subquery" in DERBY-1905. > At any rate, due to the enormous estimated costs, timeout does not > occur. > - the WHERE clause in this query is converted during compilation to=20 > an equivalent IN clause, I believe, which then causes me to run into > a number of the problems described in DERBY-47 and DERBY-713. > Specifically, rather than constructing a plan which involves 4 > index probes for the 4 WHERE clause values, the optimizer decides > that an index scan must be performed and that it will have to process > the entire index (because the query uses parameter markers, not > literal values). So perhaps solving DERBY-47 would help me > - the optimizer in fact comes up with a "decent" query plan quite quickl= y. > I have experimented with placing a hard limit into the optimizer > timeout code, so that I can force optimization to stop after an > arbitrary fixed period of time. Then I have been able to set that > value to as low as 1 second, and the optimizer has produced plans > that then execute in a few milliseconds. Of course, I have only tried > this with a trivial amount of data in my database, so it's possible > that the plan produced by the optimizer after just a second of > optimizing is in fact poor, and I'm just not noticing it because my > data sizes are so small. > At this point, what would be really helpful to me would be some suggestio= ns > about some general approaches or techniques to try to start breaking down > and analyzing this problem. --=20 This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: htt= p://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira