db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <de...@segel.com>
Subject RE: Bad performance with UPDATE on a nested SELECT
Date Mon, 12 Jan 2009 12:57:44 GMT
Just a quick question...

What data type are you using for 'child_id' and 'parent_id'?
It seems that the way you've written the query that they are character
strings and not integers.
(Not that it would cause the delay.)

If you run the inner select by itself as written, how long does it take?

There was a bug several releases ago concerning the IN clause performance,
this could be related.

It would not be a good thing if you are doing a sequential scan on the
entire 1.5 mil rows, which is what you could be seeing in the 30 sec run.

Another test that you can use to confirm the problem is to create a subset
of the records table with 10,000 rows. If the same query runs with a low
(sub second) response, then you've verified that you're doing a table scan.

Assuming that this is a bug, and assuming that your id's are character
strings and not integers, try creating them as integers and see if that
helps the IN clause performance. (It's a gut feel, a hunch). 


> -----Original Message-----
> From: Mikkel Kamstrup Erlandsen [mailto:mke@statsbiblioteket.dk]
> Sent: Monday, January 12, 2009 6:00 AM
> To: derby-user@db.apache.org
> Subject: Bad performance with UPDATE on a nested SELECT
> Hi list,
> I am seeing some bad performance on an UPDATE on a nested SELECT. The
> query in case takes about 30s, but I think it should be a lot faster
> because the same query on a PostgresQL runs < 1 ms... Anyway, here's the
> setup:
> I have two tables 'records' and 'relations'.
> The 'records' table stores a bunch of records along with some metadata.
> It has a unique index 'i' on the 'id' column and a normal index 'm' on
> the  'mtime' column. This table holds about 1.5M rows.
> The 'relations' table stores parent/child relations between records and
> has two rows 'parentId' and 'childId'. There is a unique index 'pc' on
> (parentId,childId) and a normal index 'c' on childId. This table holds
> about 35k rows.
> The problematic query looks like:
> UPDATE records
> SET base='my_base'
> WHERE id IN (
>   SELECT parentId
>   FROM relations
>   WHERE childId='id_1');
> The right SELECT returns two rows (and in general 0-3 or something like
> that). If I instead of the SELECT hardcode two ids, making the right
> part look like "IN ('id_2', 'id_3')" the query runs in < 1ms.
> It would appear that Derby iterates over the entire 'records' table or
> something like that. This also appears to be the case as far as I can
> read from the attached query plan...
> Any pointers or ideas on how to tackle this are most appreciated.
> Cheers,
> Mikkel

View raw message