db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikkel Kamstrup Erlandsen <...@statsbiblioteket.dk>
Subject RE: Bad performance with UPDATE on a nested SELECT
Date Mon, 12 Jan 2009 21:54:41 GMT
Hi Mike,

> What data type are you using for 'child_id' and 'parent_id'?
> It seems that the way you've written the query that they are character
> strings and not integers.
> (Not that it would cause the delay.)

They are VARCHAR(255).

> If you run the inner select by itself as written, how long does it take?

About 1ms.

> There was a bug several releases ago concerning the IN clause performance,
> this could be related.

This problem is on 10.4.2

> It would not be a good thing if you are doing a sequential scan on the
> entire 1.5 mil rows, which is what you could be seeing in the 30 sec run.
> 
> Another test that you can use to confirm the problem is to create a subset
> of the records table with 10,000 rows. If the same query runs with a low
> (sub second) response, then you've verified that you're doing a table scan.

I just tried on a base half the size (~700k rows) and it runs in about
5-10s. The really creepy about this case is that my test run was a case
where the nested SELECT was empty!..

> Assuming that this is a bug, and assuming that your id's are character
> strings and not integers, try creating them as integers and see if that
> helps the IN clause performance. (It's a gut feel, a hunch). 

This will be a substantial amount of work and will not be acceptable as
an end solution, so I will prefer not going down that path...


Cheers,
Mikkel

> > -----Original Message-----
> > From: Mikkel Kamstrup Erlandsen [mailto:mke@statsbiblioteket.dk]
> > Sent: Monday, January 12, 2009 6:00 AM
> > To: derby-user@db.apache.org
> > Subject: Bad performance with UPDATE on a nested SELECT
> > 
> > Hi list,
> > 
> > I am seeing some bad performance on an UPDATE on a nested SELECT. The
> > query in case takes about 30s, but I think it should be a lot faster
> > because the same query on a PostgresQL runs < 1 ms... Anyway, here's the
> > setup:
> > 
> > I have two tables 'records' and 'relations'.
> > 
> > The 'records' table stores a bunch of records along with some metadata.
> > It has a unique index 'i' on the 'id' column and a normal index 'm' on
> > the  'mtime' column. This table holds about 1.5M rows.
> > 
> > The 'relations' table stores parent/child relations between records and
> > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on
> > (parentId,childId) and a normal index 'c' on childId. This table holds
> > about 35k rows.
> > 
> > The problematic query looks like:
> > 
> > UPDATE records
> > SET base='my_base'
> > WHERE id IN (
> >   SELECT parentId
> >   FROM relations
> >   WHERE childId='id_1');
> > 
> > The right SELECT returns two rows (and in general 0-3 or something like
> > that). If I instead of the SELECT hardcode two ids, making the right
> > part look like "IN ('id_2', 'id_3')" the query runs in < 1ms.
> > 
> > It would appear that Derby iterates over the entire 'records' table or
> > something like that. This also appears to be the case as far as I can
> > read from the attached query plan...
> > 
> > Any pointers or ideas on how to tackle this are most appreciated.
> > 
> > Cheers,
> > Mikkel
> 
> 


Mime
View raw message