lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Best way to do Query inflation?
Date Tue, 11 Mar 2008 19:09:51 GMT
I don't have the source code at hand, but have a look at Solr, which supports synonyms
(it seems you could treat those extra terms as synonyms).  You don't have to switch from
Lucene to Solr if you don't want to, of course; you could simply look at how Solr does it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Itamar Syn-Hershko <itamar@divrei-tora.com>
To: java-user@lucene.apache.org
Sent: Monday, March 10, 2008 12:09:50 PM
Subject: Best way to do Query inflation?

Hi all,

I'm looking for the best way to inflate a query, so that a query like "synchronous AND colour"
becomes something like this:

"(synchronous OR asynchronous OR bsynchronous OR synchronos OR asynchronos OR bsynchronos)
AND (colour OR acolour OR bcolour OR color OR acolor OR bcolor)".

I'm doing a two-step expansion: first creating another instance of the word without specific
letter(s), then adding initial letters to every word in the resulting set. The resulting list of
words should be OR'ed together and replace the original word in the original query (so "colour"
becomes "(colour OR acolour OR bcolour OR color OR acolor OR bcolor)"). This is for making Hebrew
queries return more precise results; don't try to find the logic in English :)
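
A minimal sketch of the two-step expansion described above, in plain Java with no Lucene
dependency. The class and method names (QueryInflation, variants, orGroup) are hypothetical,
and it assumes that "without a specific letter(s)" means removing every occurrence of a given
substring; the real Hebrew rules are not stated here and would replace that step.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class QueryInflation {

    // Step 1: keep the word and add a copy with `dropped` removed.
    // Step 2: add every prefix to each of those base forms.
    // LinkedHashSet keeps insertion order and drops duplicates
    // (e.g. when `dropped` does not occur in the word).
    static List<String> variants(String word, String dropped, List<String> prefixes) {
        Set<String> bases = new LinkedHashSet<>();
        bases.add(word);
        bases.add(word.replace(dropped, "")); // assumption: remove all occurrences

        Set<String> out = new LinkedHashSet<>();
        for (String base : bases) {
            out.add(base);
            for (String prefix : prefixes) {
                out.add(prefix + base);
            }
        }
        return new ArrayList<>(out);
    }

    // Renders the variants as a parenthesised OR group in query syntax.
    static String orGroup(List<String> terms) {
        return "(" + String.join(" OR ", terms) + ")";
    }
}
```

With the prefixes "a" and "b" and the dropped letter "u", orGroup(variants("colour", "u", ...))
gives the OR group for "colour" shown above.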

What I'm looking for is the proper way and place to perform the actual query expansion.
Keep in mind that I will want to access the complete list of words later, so that I can
use them to highlight results in the opened document. If I could somehow access the term objects
directly and replace each one with an OR'd object, that would be ideal, I think.
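
One way to picture "replace each term with an OR'd group and keep the full list" is the sketch
below. It is plain Java working on the query string, not Lucene's parser: InflatedQuery and its
members are hypothetical names, the expander is supplied by the caller, and it assumes a flat
query of whitespace-separated words and AND/OR/NOT operators (no phrases, fields or nested
parentheses). In a custom query parser the same idea would apply at the point where each term
is turned into a query object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical holder for the rewritten query plus the terms kept for highlighting.
class InflatedQuery {
    final String text;                                 // rewritten query string
    final Map<String, List<String>> termsByOriginal;   // original word -> its variants

    private InflatedQuery(String text, Map<String, List<String>> termsByOriginal) {
        this.text = text;
        this.termsByOriginal = termsByOriginal;
    }

    // Replaces every non-operator token with an OR group of its variants
    // and records the variants so they stay available after parsing.
    static InflatedQuery inflate(String query, Function<String, List<String>> expander) {
        Map<String, List<String>> terms = new LinkedHashMap<>();
        List<String> parts = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                parts.add(token); // operators pass through unchanged
            } else {
                List<String> variants = expander.apply(token);
                terms.put(token, variants);
                parts.add("(" + String.join(" OR ", variants) + ")");
            }
        }
        return new InflatedQuery(String.join(" ", parts), terms);
    }

    // Flattened list of every expanded term, e.g. to hand to a highlighter.
    List<String> allTerms() {
        List<String> all = new ArrayList<>();
        for (List<String> variants : termsByOriginal.values()) {
            all.addAll(variants);
        }
        return all;
    }
}
```

Keeping the map next to the rewritten query means the highlighting step does not need to
re-derive the variants or walk the parsed query again.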

BTW, I have already established that I will have to write my own query parser derived from
QueryParserBase directly, and my own Lexer, since Hebrew has some unique features that sometimes
confuse Lucene's defaults. The tweaks I'm about to make should work for both Hebrew AND English,
so perhaps they will become the standard way of doing things...

Thanks in advance for any help,

Itamar.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


