db-derby-user mailing list archives

From Kevin Hore <...@araxis.com>
Subject Re: Poor query optimizer choices are making Derby unusable for large tables
Date Mon, 14 Nov 2005 16:12:31 GMT
Hi Michael,

Thank you for your suggestions. You've obviously very kindly given it
quite a bit of thought.

I'd already tried several of the things that you suggested.
Defining a composite key and removing the existing indexes did nothing
to improve performance. Re-writing the query with an IN clause produced
no improvement (indeed, I'm pretty sure that this is what Derby does
internally anyway).
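
For reference, the IN-clause rewrite I tried was along these lines
(the same query as the OR form from my original post):

	SELECT ObjectId, SUM(WordLocation) AS Score
	FROM tblSearchDictionary
	WHERE Word IN ('CONTACT', 'ADD')
	GROUP BY ObjectId;

As noted, it performed no better than the OR form.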

I ought to point out here that my comments on how the optimizer would
re-write my query are based on the Derby documentation found at
http://db.apache.org/derby/docs/10.1/tuning/rtuntransform582.html

Unfortunately (for me), this is merely one simple example of a whole
class of problems with different queries involving different tables in
the system I'm working on. The problem is fundamentally that
multiple IN/WHERE terms aren't being handled sensibly, as summarized by
Satheesh Bandaram in his post. Derby is able to choose a sensible
strategy with just one term in the WHERE/IN clause -- it is only with
multiple terms that it chooses inappropriate strategies.

I will add my thoughts to DERBY-47 regarding this.

Thanks again,

Kevin Hore


Michael Segel wrote:
> On Friday 11 November 2005 08:59, Kevin Hore wrote:
> Hi,
> 
> Before complaining about the query optimizer, let's look at your design first.
> Yeah, I know that Derby, like every other database, is not perfect and could 
> always use improvement. But the reality is that the first place to look for 
> improvements is in your own code.
> 
> Since you've got a small table of only 260K rows, let's try these simple 
> experiments to see if they help.
> 
> 
>>----PROBLEM DESCRIPTION----
>>
>>Consider the table:
>>
>>CREATE TABLE tblSearchDictionary
>>(
>>ObjectId int NOT NULL,
>>ObjectType int NOT NULL,
>>Word VARCHAR(64) NOT NULL,
>>WordLocation int NOT NULL,
>>CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>);
>>
> 
> Just a first suggestion, and it's really a nit: could you alter the table to 
> have Word as the last column? 
> 
> 	CREATE TABLE tblSearchDictionary
> 	(
> 		 objectId		INT NOT NULL,
> 		 objectType 	INT NOT NULL,
> 		 wordLocation	INT NOT NULL,
> 		 word 		VARCHAR(64) NOT NULL,
> 		CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> 	);
> 
>>This table has an index on each of the four columns, it also has the
>>unique index across all four columns as defined above:
>>
> 
> Ok first suggestion. Drop all of your Indexes.
> Based on your examples, these indexes are part of the problem.
> 
> For example, an index on objectType doesn't make any sense.
> First, in your example, you don't even use objectType in any of your queries.
> Second, how many objectTypes are there? 10, 100, 1000? The point is that 
> objectType is not a unique enough identifier to justify being its own index.
> The same could be said for wordLocation. So why have an index on that column?
> 
> 
> 
>>CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
>>CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>(ObjectType);
>>CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>(WordLocation);
>>
> 
> 
> Ok, since you believe that the optimizer is having problems selecting the 
> correct index to use, let's create your new indexes one by one to see how they 
> improve performance.
> 
> Let's review your query statements:
> 
> 
>>The following query selects all rows that match instances of string in
>>the Word column. It sums the WordLocation column having grouped by the
>>ObjectId column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT'
>>GROUP BY ObjectId;
>>
>>On my machine this will usually complete in an acceptable time of around
>>200ms.
>>
> 
> Ok, this is pretty straightforward.
> 
> A good index would be a composite index on objectID and word.
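> 
> Something along these lines (the index name here is just illustrative):
> 
> 	CREATE INDEX tblSearchDictObjWord
> 		ON tblSearchDictionary (objectID, word);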
> 
> 
>>Now consider the following query which adds a second search term on the
>>same column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY ObjectId;
>>
> 
> 
> Ok, again if you create a composite index on objectID and Word, you will get 
> better performance.
> 
> As you point out below, the query may get rewritten. If you wanted to, you 
> could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD')", but 
> I'd save that as an option to test as a further tweak.
> 
> 
> So, as a test, do the following:
> 
> 1) DROP ALL CURRENT INDEXes.
> 2) CREATE a composite index on objectID and Word.
> 
> Run your query tests to see what happens to performance.
> 
> 3) To test your theory about the optimizer, rewrite your second query to use 
> the IN (xxx,xxx) option rather than OR.
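> 
> Putting those steps together, the test would look something like this 
> (the DROP INDEX names are the ones from your original post; the new 
> index name is just illustrative):
> 
> 	DROP INDEX tblSearchDictionaryObjectId;
> 	DROP INDEX tblSearchDictionaryObjectType;
> 	DROP INDEX tblSearchDictionaryWord;
> 	DROP INDEX tblSearchDictionaryWordLocation;
> 
> 	CREATE INDEX tblSearchDictObjWord
> 		ON tblSearchDictionary (objectID, word);
> 
> 	-- the rewritten second query, IN instead of OR
> 	SELECT objectId, SUM(wordLocation) AS Score
> 	FROM tblSearchDictionary
> 	WHERE word IN ('CONTACT', 'ADD')
> 	GROUP BY objectId;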
> 
> In addition:
> 
> 1) Recreate your table with your VARCHAR column as the last column.
> 2) Alter your table and replace your VARCHAR column with a fixed-length 
> CHAR(64) column.
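> 
> For the second of these, note that (as far as I know) Derby's ALTER 
> TABLE can't convert a VARCHAR column to CHAR in place, so this variant 
> would mean recreating the table, e.g.:
> 
> 	CREATE TABLE tblSearchDictionary
> 	(
> 		 objectId		INT NOT NULL,
> 		 objectType 	INT NOT NULL,
> 		 wordLocation	INT NOT NULL,
> 		 word 		CHAR(64) NOT NULL,
> 		CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> 	);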
> 
> If you can, please post your results.
> Do each change and see what happens to your query times.
> 
> HTH
> 
> -Gumby



