db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Knut Anders Hatlen <Knut.Hat...@Sun.COM>
Subject Re: Bad performance with UPDATE on a nested SELECT
Date Tue, 13 Jan 2009 10:59:23 GMT
Mikkel Kamstrup Erlandsen <mke@statsbiblioteket.dk> writes:

> On Mon, 2009-01-12 at 13:49 +0100, Knut Anders Hatlen wrote:
>> Mikkel Kamstrup Erlandsen <mke@statsbiblioteket.dk> writes:
>> 
>> > Hi list,
>> >
>> > I am seeing some bad performance on an UPDATE on a nested SELECT. The
>> > query in case takes about 30s, but I think it should be a lot faster
>> > because the same query on a PostgresQL runs < 1 ms... Anyway, here's the
>> > setup:
>> >
>> > I have two tables 'records' and 'relations'.
>> >
>> > The 'records' table stores a bunch of records along with some metadata.
>> > It has a unique index 'i' on the 'id' column and a normal index 'm' on
>> > the  'mtime' column. This table holds about 1.5M rows.
>> >
>> > The 'relations' table stores parent/child relations between records and
>> > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on
>> > (parentId,childId) and a normal index 'c' on childId. This table holds
>> > about 35k rows.
>> 
>> Hi Mikkel,
>> 
>> Derby sometimes has problems with the cardinality statistics getting
>> stale in non-unique indexes, which may lead to bad decisions being made
>> by the optimizer. Do you still see the problem if you compress the
>> relations table (alternatively, you could drop the index on childId and
>> recreate it)? If this speeds up the execution, you're probably
>> experiencing these issues:
>> 
>> https://issues.apache.org/jira/browse/DERBY-269
>> https://issues.apache.org/jira/browse/DERBY-3788
>> 
>
> Hi Knut,
>
> I just tried compressing both the 'relations' and the 'records' table.
> This makes the query run in about 20s. So a small win, but nothing
> significant.

Right, that suggests that you're seeing a different problem. Just to be
sure that the old query plan isn't still lying around in the statement
cache, it might be worth checking if clearing the statement cache
changes anything:

  CALL SYSCS_UTIL.SYSCS_EMPTY_STATEMENT_CACHE()

> I have attached the query plan that I got when running with compressed
> indexes. To me it looks just like the first one i attached, but I don't
> speak fluent "query plan".

Yes, it looks like the same query plan. I see the following problems
with this query plan:

1) The IN clause is rewritten to a join with SUMMA_RECORDS as the outer
table, and we have no information to limit the scan of the outer table,
hence a full table scan is performed. It would probably have been better
if SUMMA_RECORDS were the inner table. Then the C index could be used to
enforce the restriction (childId='...') on the outer table,
SUMMA_RELATIONS, and index I could be used to perform lookups in the
inner table.

2) The index scan on SUMMA_RELATIONS uses the index PC. Since that index
is on the columns (parentId, childId), it is primarily sorted on
parentId, so the scan needs to go through the entire index, for each row
in the outer table, in order to match childId with the criterion in the
WHERE clause of the nested select. It would probably be better to use
the index on childId (index C) instead of PC (or reverse the order of
the columns in the PC index).

Derby allows you to override some of the optimizer's decisions, but it's
not as easy for UPDATE statements as it is for SELECT statements, so I
don't know how to tell it not to pick the table scan. Using the C index
instead of the PC index should be easy enough, though:

UPDATE summa_records SET base='my_base' WHERE id IN
  (SELECT parentId FROM summa_relations --DERBY-PROPERTIES index=C
      WHERE childId='horizon_2332668')

-- 
Knut Anders

Mime
View raw message