lucene-java-user mailing list archives

From "Peter Hotm. Nørregaard" <>
Subject Re: How to include a multi-word synonym to a word when indexing?
Date Tue, 12 Apr 2005 12:19:22 GMT
Good point on phrase/span queries, Hostetter.

: Assuming you have the following "phrase synonym" (and code that
: can find them during analysis)...
:   [CyberCafe] => [Cyber] [Cafe]

: the only thing that's ever occurred to me is to set the position increment
: of all the words to "0" (but that will still result in false positives in
: the "cyber cafe" example) or to pick some high default position increment
: (bigger than the longest multi-word synonym) and use that normally, and
: reserve increments of "1" for words in a multi-word synonym.

A good suggestion, however it does have a small side-effect: If I understand 
you correctly, that strategy will create the following token stream for 
"CyberCafe Inc.", assuming that we increment by, say, 10 per default:
[cybercafe, 1] [cyber, 1] [cafe, 2] [inc, 10]

In that case, a search for the phrase "cybercafe cafe inc" would return a 
match. In this case it is acceptable albeit a bit strange to the user, but 
then again, searching for "cybercafe cafe" IS a bit strange. However, 
situations can be constructed where the result would be a false positive. 
Also, we could end up with no match for phrase queries if the slop factor is 
too low (e.g. 0): "Cybercafe inc" would not be found unless the same 
analysis is applied to both the document and the query.
Ranking could also be adversely affected.
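
To make the position arithmetic concrete, here is a rough sketch in plain
Python (not Lucene code) of one possible reading of the large-gap scheme.
The names DEFAULT_GAP, SYNONYMS, analyze and min_slop are made up for the
illustration, and the slop calculation is only a simplified approximation
of how a sloppy phrase match behaves. The absolute positions it produces
differ from the ones I wrote above; only the gaps between tokens matter.

```python
from itertools import product

DEFAULT_GAP = 10  # assumed default position increment for ordinary words

# Hypothetical synonym table: one word expands to several words.
SYNONYMS = {"cybercafe": ["cyber", "cafe"]}


def analyze(text):
    """Return (term, position) pairs using the large-gap scheme."""
    tokens = []
    position = 0
    for word in text.lower().replace(".", "").split():
        position += DEFAULT_GAP
        tokens.append((word, position))
        parts = SYNONYMS.get(word, [])
        # Expansion words start at the same position and advance by 1.
        for offset, part in enumerate(parts):
            tokens.append((part, position + offset))
        # Continue counting from the last word of the expansion.
        position += max(len(parts) - 1, 0)
    return tokens


def min_slop(tokens, phrase):
    """Smallest slop (simplified model) at which the phrase matches.

    Returns None if some phrase term is not in the token stream.
    """
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, []).append(pos)
    terms = phrase.lower().split()
    if any(term not in positions for term in terms):
        return None
    best = None
    for combo in product(*(positions[term] for term in terms)):
        # Shift each position by its index in the phrase; an exact
        # phrase match makes all shifted positions equal (width 0).
        shifted = [pos - i for i, pos in enumerate(combo)]
        width = max(shifted) - min(shifted)
        if best is None or width < best:
            best = width
    return best


tokens = analyze("CyberCafe Inc.")
print(tokens)
print(min_slop(tokens, "cybercafe cafe"))  # exact match, a bit strange
print(min_slop(tokens, "cybercafe inc"))   # needs a slop of about the gap
```

Under these assumptions "cybercafe cafe" matches as an exact phrase, while
"cybercafe inc" only matches once the slop is roughly as large as the
default gap, unless the query is analysed the same way as the document.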

Is there no such concept as a two-dimensional term vector?
[CyberCafe Inc] => [[cybercafe], [[cyber] [cafe]]] [inc]
(in theory it would have to be a directed, acyclic graph (DAG), I guess)
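
What I have in mind could be sketched roughly like this, again in plain
Python and not as anything Lucene offers today. The edge list and the
function phrase_matches are hypothetical: nodes are points between words,
each token is an edge, and a phrase matches only if its terms label
consecutive edges along one path through the graph.

```python
# Edges of a token graph: (from_node, to_node, term).
# Hypothetical layout for "CyberCafe Inc": the single word spans
# nodes 0 -> 2, while its two-word synonym takes the detour 0 -> 1 -> 2.
EDGES = [
    (0, 2, "cybercafe"),
    (0, 1, "cyber"),
    (1, 2, "cafe"),
    (2, 3, "inc"),
]


def phrase_matches(edges, phrase):
    """True if the phrase terms label consecutive edges along some path."""
    terms = phrase.lower().split()
    # Nodes from which the next phrase term may start.
    frontier = {src for src, _, _ in edges}
    for term in terms:
        frontier = {
            dst for src, dst, label in edges
            if label == term and src in frontier
        }
        if not frontier:
            return False
    return True


print(phrase_matches(EDGES, "cybercafe inc"))
print(phrase_matches(EDGES, "cyber cafe inc"))
print(phrase_matches(EDGES, "cybercafe cafe"))
```

In this model both "cybercafe inc" and "cyber cafe inc" match without any
slop, and the strange "cybercafe cafe" does not, because no path runs
through both of those edges.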

