lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: Synonym filter with support for phrases?
Date Wed, 22 Apr 2009 13:02:20 GMT

> Your synonyms will break if you try searching for phrases.

Good point, I did write that filter, but I never actually got to searching for 
exact phrases in it (there was a very specific scenario and we used prefix 
queries which worked quite well).

> Building on your example, "food place in new york" will find nothing,
> because 'place' and 'in' share the same position.

You're right, but is it such a big problem in real life? What you're describing 
is searching for a phrase that spans both the synonym and the actual token 
sequence. What I had in mind was searching for phrases that consist either of 
synonyms alone, or of synonyms and text with an identical position layout (which 
is the case with single-word synonyms). I dare say this covers the majority of 
cases, although I have nothing to support this claim.

> While building the index, I inject synonym group ids instead of actual
> words, then I detect synonyms in queries and replace them with group
> ids too. Hard part comes after that, you have to adjust
> positionIncrements on syngroup id tokens, with respect to the longest
> [snip]

Yep, hairy ;)
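The core of the group-id scheme can be sketched in a simplified form. This is a hypothetical illustration, not Кирилл's code: the class name, the `SYN_0` id format and the greedy longest-match rule are all assumptions. It collapses every synonym phrase into a single group-id token, applied identically to documents and queries, so that every alternative occupies exactly one position. It deliberately skips the positionIncrement adjustment described above, which means positions of later tokens no longer line up with the original text (relevant for slop and highlighting).

```java
import java.util.*;

// Hypothetical sketch of the group-id idea: every phrase in a synonym group is
// replaced by one token naming the group, in documents and queries alike.
class SynonymGroupRewriter {
    private final Map<List<String>, String> groupOf = new HashMap<>();
    private int longest = 0;

    void addGroup(String groupId, String... phrases) {
        for (String phrase : phrases) {
            List<String> words = List.of(phrase.split(" "));
            groupOf.put(words, groupId);
            longest = Math.max(longest, words.size());
        }
    }

    // Greedy longest-match: at each position try the longest candidate phrase first.
    List<String> rewrite(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int len = Math.min(longest, tokens.size() - i); len >= 1; len--) {
                String group = groupOf.get(tokens.subList(i, i + len));
                if (group != null) {
                    out.add(group);
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                out.add(tokens.get(i)); // not part of any synonym phrase: keep as-is
                matched = 1;
            }
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymGroupRewriter r = new SynonymGroupRewriter();
        r.addGroup("SYN_0", "restaurant", "food place");
        System.out.println(r.rewrite(List.of("food", "place", "in", "new", "york"))); // [SYN_0, in, new, york]
        System.out.println(r.rewrite(List.of("restaurant", "in", "new", "york")));    // [SYN_0, in, new, york]
    }
}
```

Because both spellings rewrite to the same token sequence, a phrase query over the rewritten query matches the rewritten document regardless of which synonym each one used.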

> More correct approach is to index as-is and expand queries with actual
> synonym phrases instead of ids, but then queries become really
> humongous if you have any decent synonym dictionary (I have 20+ phrase
> groups).

Query expansion is not an option for me, unfortunately -- too many synonyms. It 
would be much better to do it once at indexing time and rely on this information 
from then on.
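Why expansion gets "humongous" can be seen by counting variants. In this hypothetical sketch (the names and the numbers are illustrative, not taken from either poster's setup), each part of the query lists its alternatives and the expansion enumerates every combination as a separate phrase. The count is the product of the alternatives per part. Lucene's MultiPhraseQuery lets several single terms share one position, but it cannot express alternatives of different lengths such as "restaurant" vs. "food place", so multi-word synonyms push you toward this kind of enumeration.

```java
import java.util.*;

// Hypothetical sketch: expand a query into every combination of synonym alternatives.
class QueryExpansion {
    // Each slot lists the alternatives for one part of the query.
    static List<String> expand(List<List<String>> slots) {
        List<String> variants = new ArrayList<>(List.of(""));
        for (List<String> alternatives : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : variants) {
                for (String alt : alternatives) {
                    next.add(prefix.isEmpty() ? alt : prefix + " " + alt);
                }
            }
            variants = next;
        }
        return variants;
    }

    // Number of phrase variants = product of the alternatives per slot.
    static long variantCount(int... alternativesPerSlot) {
        long count = 1;
        for (int n : alternativesPerSlot) count *= n;
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> query = List.of(
            List.of("restaurant", "food place"), List.of("in"), List.of("new york", "nyc"));
        System.out.println(expand(query));            // 4 phrase variants
        System.out.println(variantCount(20, 20, 20)); // 8000
    }
}
```

Three synonym-bearing parts with 20 alternatives each already yield 8,000 phrases, which is the cost that index-time handling avoids paying on every query.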

Thanks for sharing your thoughts, Кирилл.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

