db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <de...@segel.com>
Subject RE: Bad performance with UPDATE on a nested SELECT
Date Mon, 12 Jan 2009 23:12:56 GMT
Sorry to cut the message down...

In a different post, when you've updated the statistics (rebuilt the index),
you saw your time go from 30 seconds down to 20 seconds.

When I asked you to cut the amount of data from 1.5m records to a smaller
number, the time reduction seems to be linear with respect to the number of
records.

This is a good indication that your query (below) is performing a sequential
scan on the outer table (1.5 million records).

The fact that there is a performance problem because you have variables in
the inner select instead of hard coded values, is an indication of an issue
with Derby.

There is another thing I would like you to try.

You're using id, parent_id, child_id as VARCHARS. This could part of the
problem too.

What I suggest that you do is create a table, lets call it foo which will
represent an integer id and an associated id_name (varchar).

I'm going to assume that your parent_id and child_id are both the same data
types where one record's child_id could be another record's parent_id and of
course you would have to change the data type of id in the records table.

And you would have an identity index on id and maybe a uniqueness id on
id_name. (I'm not sure if you need this or not.)

Then your relations table would be the same, only parentId and childId would
be integers. So you'd have a map of 3 integer values and the indexing could
be the same.

Then rerun your query and see what happens to the performance.

My gut feel is that you will see much better performance, and that by doing
this exercise you might be able to pinpoint the area of the bug in derby.


I hope this makes sense.

-Mike

> > > The 'records' table stores a bunch of records along with some
> metadata.
> > > It has a unique index 'i' on the 'id' column and a normal index 'm' on
> > > the  'mtime' column. This table holds about 1.5M rows.
> > >
> > > The 'relations' table stores parent/child relations between records
> and
> > > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on
> > > (parentId,childId) and a normal index 'c' on childId. This table holds
> > > about 35k rows.
> > >
> > > The problematic query looks like:
> > >
> > > UPDATE records
> > > SET base='my_base'
> > > WHERE id IN (
> > >   SELECT parentId
> > >   FROM relations
> > >   WHERE childId='id_1');
> > >
> > > The right SELECT returns two rows (and in general 0-3 or something
> like
> > > that). If I instead of the SELECT hardcode two ids, making the right
> > > part look like "IN ('id_2', 'id_3')" the query runs in < 1ms.

> I just tried on a base half the size (~700k rows) and it runs in about
> 5-10s. The really creepy about this case is that my test run was a case
> where the nested SELECT was empty!..




Mime
View raw message