lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <>
Subject Re: Handling Synonyms
Date Mon, 21 Feb 2005 20:19:17 GMT
Luke Shannon wrote:

> Hello;
> Does anyone see a problem with the following approach?

No, no problem with it and it's in fact what my "Wordnet Query 
Expansion" sandbox module does.

The nice thing about Lucene is you at least have the option of doing 
things the other way - you can write a custom Analyzer that puts all 
synonyms at the same token offset so they appear to be in the same place 
in the token stream. Thinking about it...this approach, with the 
Analyzer, lets user search for phrases which would match a synonym, so, 
using your example below, the text "bright red engine" would be matched 
by either phrase "bright red" or "bright colour". Doing the query 
expansion is trickier if you allow phrases.

> For synonyms, rather than putting them in the index, I put the original term
> and all the synonyms in the query.
> Every time I create a query, I check if the term has any synonyms. If it
> does, I create Boolean Query OR'ing one Query object for each synonym.
> So if I have a synoym list:
> red = colour, primary, stop
> And someone wants to search the desc field for the red, I would end up with
> something like:
> ( (desc:*red*) (desc:*colout*) (desc:*stop*) ).

I don't like that bit about substring terms, but if it's right for you 
ok - if you insist on loosening things I'd consider fuzzy terms 
(desc:red~ ...etc).

> Now the synonyms would'nt be in the index, the Query would account for all
> the possible synonym terms.
> Luke
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message