db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (DERBY-3498) Performance sensitive to FROM order list in three way join with GROUP BY
Date Wed, 05 Mar 2008 20:11:40 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575482#action_12575482
] 

army edited comment on DERBY-3498 at 3/5/08 12:10 PM:
-----------------------------------------------------

I notice that run/Populate.java has a parameter "createConstraintsBeforeLoad" which defaults
to "true", meaning that all indexes are created BEFORE data is loaded.  Is that parameter
overridden by the "junit-system-mini-codeline-jars" target?  If not, then it's possible that
the statistics for the tables are completely wrong (known issue in Derby when indexes are
created before data is loaded), which could cause the optimizer to pick a bad plan.

If keeping "createConstraintsBeforeLoad()" is important to the test, then it might be good
to add some calls to compress the tables after the load completes, so that the stats are updated
accordingly.  That might (hopefully) help things a bit...

      was (Author: army):
    I notice that run/Populate.java has a parameter "createConstraintsBeforeLoad" which defaults
to "true", meaning that all indexes are created BEFORE data is loaded.  Is that parameter
overridden by the "junit-system-mini-codeline-jars" target?
  
> Performance sensitive to FROM order list in three way join with GROUP BY
> ------------------------------------------------------------------------
>
>                 Key: DERBY-3498
>                 URL: https://issues.apache.org/jira/browse/DERBY-3498
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>            Reporter: Daniel John Debrunner
>            Priority: Minor
>
> The Order Entry system test has this query in Checks.testCondition2():
> SELECT D.D_ID, D.D_W_ID,D.D_NEXT_O_ID, 
>                         MAX(O.O_ID),MAX(N.NO_O_ID) FROM NEWORDERS N, 
>                         DISTRICT D, ORDERS O WHERE D.D_W_ID=O.O_W_ID 
>                         AND D.D_W_ID = N.NO_W_ID AND D.D_ID = O.O_D_ID 
>                         AND D.D_ID = N.NO_D_ID GROUP BY 
>                         D.D_ID,D.D_W_ID,D.D_NEXT_O_ID
> Changing the FROM list to put DISTRICT first reduced the running time from ~138seconds
to ~105 and a similar reduction on another machine.
> (with a 1 warehouse database)
> Beyond that, the query seems to take too long. 10 rows will be returned and for each
district row (returned row) the MAX() values in the select list should just be a max lookup
in the primary key for the ORDERS and NEWORDERS table. The primary key for these tables is
W_ID, D_ID, O_ID.
> Thus it should be a scan of the 10 row DISTRICT table with 20 index lookups, hard to
see how that should table 100 seconds.
> One can see this by using the java-system-mini-codeline-jars target.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message