db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings
Date Wed, 28 Oct 2009 04:29:59 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Bryan Pendleton updated DERBY-4422:
-----------------------------------

    Attachment: goodqp.txt
                badqp.txt

Tony, thanks for posting all the information; it is very helpful!

I took a quick look at the query plans in the derby.log which you posted.

The query plan for CONCEPT_ID=1 is *dramatically* different than the query plan for NL_LC='dash'.

In both cases, the overall structure of the query plan is to perform an "exists" join in which
the outer table is OC_CONCEPTS and the inner table in OC_CONCEPT_STRINGS.

But in the CONCEPT_ID=1 case, Derby chooses a HASH EXISTS JOIN, with a
HASH SCAN RESULT SET used for the OC_CONCEPT_STRINGS inner table.

While in the NL_LC='dash' case, Derby choose a NESTED LOOP EXISTS JOIN, with
a RESTRICTION node wrapped around an INDEX ROW TO BASE ROW result set.

In either case, Derby processes all of the 150,050 rows in OC_CONCEPTS by scanning that table,
then checks each row against OC_CONCEPT_STRINGS to see if it qualifies.

But in the HASH case, that check against the inner table is extremely efficient, while in
the index-lookup case, the check is dramatically more expensive.

I attached the two query plan sections from derby.log which I studied as 'badqp.txt' and 'goodqp.txt'.

I don't know why Derby picks such a dreadful query plan in the one case, and such a
superior plan in the other case.

It's interesting that *neither* plan matches the plan that you expected: namely, to
perform the entire non-correlated inner DISTINCT query first, then use those results
to probe into the outer OC_CONCEPTS table. I don't know whether Derby considered
that plan and decided not to choose it, or in fact didn't even consider it at all. I think
that the effectiveness of that plan depends considerably on the observation that in
both the CONCEPT_ID =1 case and the NL_LC='dash' case there are only a handful
of rows in OC_CONCEPT_STRINGS which match that restriction; I'm not sure if
Derby knows that or not.

Hopefully some other people will chime in with their own observations on the
query plans that you have posted; I find them quite interesting.

Lastly, I think it may be possible that you could use optimizer overrides (--DERBY-PROPERTIES)
to control the optimizer's query plan choice, but I'm not an expert on these overrides
and hope that somebody else can comment on whether they could be used here or not.
Here's some doc on the --DERBY-PROPERTIES feature:
http://db.apache.org/derby/docs/10.5/tuning/ctunoptimzoverride.html


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz,
SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE
OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key
columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in
the sub query, it takes over 6 seconds to run, even though both queries independently take
< 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE
OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message