db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Segel" <mse...@segel.com>
Subject Bad table design leads to bad performance...RE: Bad performance with UPDATE on a nested SELECT
Date Tue, 13 Jan 2009 14:19:10 GMT


> -----Original Message-----
> From: Knut.Hatlen@Sun.COM [mailto:Knut.Hatlen@Sun.COM]
> Sent: Tuesday, January 13, 2009 4:59 AM
> To: Derby Discussion
> Subject: Re: Bad performance with UPDATE on a nested SELECT
[SNIP]
> Yes, it looks like the same query plan. I see the following problems
> with this query plan:
> 
> 1) The IN clause is rewritten to a join with SUMMA_RECORDS as the outer
> table, and we have no information to limit the scan of the outer table,
> hence a full table scan is performed. It would probably have been better
> if SUMMA_RECORDS were the inner table. Then the C index could be used to
> enforce the restriction (childId='...') on the outer table,
> SUMMA_RELATIONS, and index I could be used to perform lookups in the
> inner table.
> 
> 2) The index scan on SUMMA_RELATIONS uses the index PC. Since that index
> is on the columns (parentId, childId), it is primarily sorted on
> parentId, so the scan needs to go through the entire index, for each row
> in the outer table, in order to match childId with the criterion in the
> WHERE clause of the nested select. It would probably be better to use
> the index on childId (index C) instead of PC (or reverse the order of
> the columns in the PC index).
> 
> Derby allows you to override some of the optimizer's decisions, but it's
> not as easy for UPDATE statements as it is for SELECT statements, so I
> don't know how to tell it not to pick the table scan. Using the C index
> instead of the PC index should be easy enough, though:
> 
> UPDATE summa_records SET base='my_base' WHERE id IN
>   (SELECT parentId FROM summa_relations --DERBY-PROPERTIES index=C
>       WHERE childId='horizon_2332668')
> 
> --
> Knut Anders

I think that you're missing something.

First without knowing the whole application, why is there an index PC
instead of an Index P and then an index C? The relationship match table has
just two columns. So why do you need a compound index? It doesn't make
sense.

Will a parent have over 10,000 children? Then maybe it might make sense.
(Well not really. If that was the case, you'd probably want to use a temp
table. Assuming that Derby support the dynamic creation of temp tables like
Informix, but not like Oracle or DB2... )

I suggested that you redesign your tables so that instead of working with
varchars and indexing varchars all over the place, you instead create a
table so you can do a lookup of the varchar, matching to an integer ID. 

Indexing varchars in a main table is a *bad* idea. You end up with a lot of
fat indexes. Fat indexs don't perform as well as skinny indexes, especially
when you're talking about tables with millions plus rows.

Also do not use query optimization statements except as a last resort. 

Again I *highly* recommend creating a table where you assign a unique id to
a given 'parentId'/'id'/'childId' and then use that id in these tables
instead of the varchar. 1.5 million rows of an index on a varchar column
means a lot of wasted space and overhead. (Just to re-iterate the point made
above.)

Look, no disrespect... But a lot of times, improving your table design will
improve your performance and at the same time keeping your solution database
agnostic.

HTH

-Mike

But hey! What do I know? 
I'm not the guy who spent a billion dollars on MySQL when they could have
bought Informix for the same price almost a decade ago. ;-)



Mime
View raw message