db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <mse...@segel.com>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Sat, 12 Nov 2005 16:28:49 GMT
On Friday 11 November 2005 08:59, Kevin Hore wrote:

Before complaining about the query optimizer, lets look at your design first.
Yeah, I know that Derby, like every other database is not perfect and could 
always use improvements. But the reality is that the first place to look for 
improvements is in your own code.

Since you've got a small table and only 260K rows, lets try these simple 
experiments to see if it will help.

> Consider the table:
> CREATE TABLE tblSearchDictionary
> (
> ObjectId int NOT NULL,
> ObjectType int NOT NULL,
> WordLocation int NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
> );
Just a first suggestion, and its really a nit, could you alter the table to 
have Word as the last column? 

	CREATE TABLE tblSearchDictionary
		 objectId		INT NOT NULL,
		 objectType 	INT NOT NULL,
		 wordLocation	INT NOT NULL,
		 word 		VARCHAR(64) NOT NULL,
		CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> This table has an index on each of the four columns, it also has the
> unique index across all four columns as defined above:
Ok first suggestion. Drop all of your Indexes.
Based on your examples, these indexes are part of the problem.

For example an index on objectType doesn't make any sense.
First, in your example, you don't even use objectType in any of your queries.
Second. How many objectTypes are there? 10,100, 1000? The point is that 
objectType is not an unique enough identifier to justify being its own index.
The same could be said for wordLocation. So why have an index on this column?

> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
> (ObjectType);
> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
> (WordLocation);

Ok, since you believe that the Optimizer is having problems selecting the 
correct Index to use, lets create your new indexes one by one to see how they 
can improve performance.

Lets review your query statements:

> The following query selects all rows that match instances of string in
> the Word column. It sums the WordLocation column having grouped by the
> ObjectId column.
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> GROUP BY ObjectId;
> On my machine this will usually complete in an acceptable time of around
> 200ms.
Ok, this is pretty straight forward enough.

A good index would be a composite index on objectID and word.

> Now consider the following query which adds a second search term on the
> same column.
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;

Ok, again if you create a composite index on objectID and Word, you will get 
better performance.

As you point out below, the query may get rewritten.  If you wanted to, you 
could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD') but 
I'd save that for an option to test as a further tweak.

So, as a test, do the following:

2) CREATE a composite index on objectID and Word.

Run your query tests to see what happens to performance.

3) To test your theory about the optimizer, rewrite your second query to use 
the IN (xxx,xxx) option rather than OR.

In addition:

1) Recreate your table with your VARCHAR column as the last column.
2) Alter your table and replace your VARCHAR column with a fixed length 
CHAR(64) column.

If you can, please post your results.
Do each change and see what happens to your query times.


Michael Segel

View raw message