db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Romi Ou (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4007) Optimization of IN with nested SELECT
Date Thu, 23 Apr 2009 06:47:47 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701825#action_12701825
] 

Romi Ou commented on DERBY-4007:
--------------------------------

I ran into this issue too recently.
Here is my table schema:
create table TRADE_HISTORY (SEQID bigint not null unique, TRADEID varchar(16) not null, VERSION
integer, STREAM varchar(20), TIMESTAMP varchar(31) not null, STATE char(1) default 'V', COMMENT
varchar(512), CONTENT varchar(512), unique(TRADEID,DATASTREAMREF));

I added 50k records into the table. The SEQID of the records are from 0 to 49999. 
Updated the STATE of 10k records:
update TRADE_HISTORY set STATE='E' where seqid >=40000 and seqid < 50000;

Then I query the table with below sql:
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select max(SEQID)
from TRADE_HISTORY where seqid>=
40000 and seqid<40100 group by TRADEID) ;
1
-----------
100

1 row selected
ELAPSED TIME = 2375 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select max(SEQID)
from TRADE_HISTORY where seqid>=
40000 and seqid<40200 group by TRADEID) ;
1
-----------
200

1 row selected
ELAPSED TIME = 9875 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select max(SEQID)
from TRADE_HISTORY where seqid>=
40000 and seqid<40400 group by TRADEID) ;
1
-----------
400

1 row selected
ELAPSED TIME = 38812 milliseconds

After optimizing the query sql, the performance is much better. 
ij> select count(*) from (select * from TRADE_HISTORY where ( STATE= 'E' ) as errorTabl
e, (select max(SEQID) maxSeqID from TRADE_HISTORY where seqid>=40000 and seqid<40400
group by TRADEID) as tempT where er
rorTable.SEQID =tempT.maxSeqID ;
1
-----------
400

1 row selected
ELAPSED TIME = 47 milliseconds

By referring to Tuning Derby Doc (http://db.apache.org/derby/docs/10.4/tuning/tuning-single.html#rtuntransform582),
I noticed that there is a "Simple IN predicate transformations" in Derby. 
I'm not quite sure about the root cause of this issue.
Is it estimated cost or the sql is transferred to another sql?


> Optimization of IN with nested SELECT
> -------------------------------------
>
>                 Key: DERBY-4007
>                 URL: https://issues.apache.org/jira/browse/DERBY-4007
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 10.4.2.0
>         Environment: Linux
>            Reporter: Mikkel Kamstrup Erlandsen
>            Priority: Minor
>         Attachments: dblook.log, dblook_p_index.log, derby.log, derby_p_index.log
>
>
> The problem is with the following query:
> UPDATE summa_records SET base='foobar' WHERE id IN ( SELECT parentId FROM summa_relations
WHERE childId='horizon_2615441');
> It takes in the order of 30s to run when we expect something in the order of 1-2ms.
> We have a setup with two tables
> summa_records:  1,5M rows
> summa_relations: ~350000 rows
> summa_records have and 'id' column that is also indexed and is the primary key. The summa_relations
table holds mappings between different ids.
> In our case the nested SELECT produces 2 hits, say, 'foo' and 'bar'. So the UPDATE on
these two hits should be quite snappy. If we run the SELECT alone it runs in an instant, and
also if we run with hardcoded ids for the IN clause:
> UPDATE summa_records SET base='foobar' WHERE id IN ('foo', 'bar');
> We have instant execution. I'll attach a query plan in a sec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message