db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4007) Optimization of IN with nested SELECT
Date Fri, 03 Jul 2009 13:14:47 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Knut Anders Hatlen updated DERBY-4007:

    Attachment: CreateDatabase4007.java

Attached is a java class (CreateDatabase4007.java) which creates a database with the schema
Mikkel posted and populates it with about the same number of rows as he reported.

When I try the update statement I see that it takes ~19 seconds.

The time is reduced to ~4.5 seconds if I create an extra index:

    create unique index cp on summa_relations(childid, parentid);

This index is more efficient than the existing PC index when there's a restriction on childid.
However, the statement still performs a full scan of SUMMA_RECORDS, which sounds sub-optimal.

> Optimization of IN with nested SELECT
> -------------------------------------
>                 Key: DERBY-4007
>                 URL: https://issues.apache.org/jira/browse/DERBY-4007
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>         Environment: Linux
>            Reporter: Mikkel Kamstrup Erlandsen
>            Priority: Minor
>         Attachments: CreateDatabase4007.java, dblook.log, dblook_p_index.log, derby.log,
> The problem is with the following query:
> UPDATE summa_records SET base='foobar' WHERE id IN ( SELECT parentId FROM summa_relations
WHERE childId='horizon_2615441');
> It takes in the order of 30s to run when we expect something in the order of 1-2ms.
> We have a setup with two tables
> summa_records:  1,5M rows
> summa_relations: ~350000 rows
> summa_records have and 'id' column that is also indexed and is the primary key. The summa_relations
table holds mappings between different ids.
> In our case the nested SELECT produces 2 hits, say, 'foo' and 'bar'. So the UPDATE on
these two hits should be quite snappy. If we run the SELECT alone it runs in an instant, and
also if we run with hardcoded ids for the IN clause:
> UPDATE summa_records SET base='foobar' WHERE id IN ('foo', 'bar');
> We have instant execution. I'll attach a query plan in a sec.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message