db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <mse...@segel.com>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Mon, 14 Nov 2005 14:19:29 GMT
On Saturday 12 November 2005 16:04, Michael Segel wrote:
> On Saturday 12 November 2005 10:28, Michael Segel wrote:
> > 	CREATE TABLE tblSearchDictionary
> > 	(
> > 		 objectId		INT NOT NULL,
> > 		 objectType 	INT NOT NULL,
> > 		 wordLocation	INT NOT NULL,
> > 		 word 		VARCHAR(64) NOT NULL,
> > 		CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> > 	);
> >
> > > This table has an index on each of the four columns, it also has the
> > > unique index across all four columns as defined above:
>
> As a follow up...
> Change the constraint order to UNIQUE(objectId, word, wordLocation,
> objectType);
>
> You probably can drop the index I asked you to create if you make this mod.
> Then your query should use the unique constraint's "backing index"...
>
> HTH.
>
> -G

If you want to make this a little bit more efficient, you could try the 
following:

1) Create two tables. 

CREATE TABLE wordKey
(
	wordID	INT  NOT NULL GENERATED ALWAYS AS IDENTITY,
	word		VARCHAR(128) NOT NULL,
	CONSTRAINT	wordKeyCONS01 UNIQUE (wordID, word)
);
// Note this should create the backing INDEX so you don't need to create an 
INDEX

	CREATE TABLE tblSearchDictionary
  	(
  		 objectId		INT NOT NULL,
 		 objectType 	INT NOT NULL,
 		 wordLocation	INT NOT NULL,
  		 wordID 		INT NOT NULL,
		CONSTRAINT CONSd0e222 UNIQUE (objectId,wordID, wordLocation, objectType)
 	);
// Note: I didn't see the SQL Syntax for creating a foreign key on wordID but 
you'll want to do that and to also for cascading deletes, even though you may 
not ever delete a word from wordKey...

It looks like you're trying to build your own word search index of documents.
If you you'll want to automate it rather than use static queries.

So lets say you're getting a text input of "CONSTRAINT  OR BAR AND FOO".
I would interpret this to mean select documents that either contain the word 
"CONSTRAINT" or the words "FOO" AND "BAR".
Note that you're going to create a parse tree that says node one contains 1 
item, "CONSTRAINT", node two contains 2 items, "FOO", "BAR"...

So you would want to create a SELECT statement that is equivlent to:

SELECT objectID, SUM(wordLocation) SCORE 
FROM	tblSearchDictionary
WHERE  wordID = xxx
UNION (
	SELECT objectID, SUM(wordLocation) SCORE
	FROM tblSearchDictionary
	WHERE wordID IN (yyy,zzz)
)...
GROUP BY objectID

Where xxx, yyy, zzz are integer values that you got from your look up.

While this isn't the complete solution, it should point you in the correct 
direction and should work independent of some kludges in the optimizer.

Your wordKey look up statement would be a prepared statement, while your 
actual scoring routine would be a dynamic statement.  

This should give you what you want.

I believe that there may be more to this for example what happens if you have  
a document that contains all three words ....

I'd also recommend the lookup table as a way to decrease the size of your 
overall solution and it may change how your optimizer looks at things. 
(You're doing your main query all on ints...)

But hey, what do I know? I am just having my first cup of joe and you get what 
you paid for... ;-)

-G

-- 
Michael Segel
Principal 
MSCC

Mime
View raw message