db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tony Brusseau (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings
Date Tue, 27 Oct 2009 22:30:59 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643
] 

Tony Brusseau edited comment on DERBY-4422 at 10/27/09 10:29 PM:
-----------------------------------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs
extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))


the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside
out which should make the performance between the two schemes above equivalent, however, I'm
experiencing ~4 orders of magnitude difference in performance.


      was (Author: apb):
    When the problematic query is separated into 2 separate queries, each separate query runs
extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))


the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside
out which should make the performance between the two schemes above equivalent, however, I'm
experiencing ~4 order of magnitude difference in performance.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz,
SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE
OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key
columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in
the sub query, it takes over 6 seconds to run, even though both queries independently take
< 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE
OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message