lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: How to include a multi-word synonym to a word when indexing?
Date Tue, 12 Apr 2005 05:42:51 GMT

: You'll need some kind of lookup to know how to split a token like
: "cybercafe" into two words - once you've done that it will be easy to
: set the position increment of them to zero so that they overlay the
: original term.

but how would you set the position increment of a multi-word synonym so
that phrase/span queries will work?

Assuming you have the following "phrase synonyms" (and code that
can find them during Analysis)...

   [CyberCafe] => [Cyber] [Cafe]
   [IBM] => [International] [Business] [Machines]
   [Cyber] [Cafe] => [CyberCafe]
   [International] [Business] [Machines] => [IBM]

and the source documents:

1) bob bought stock in IBM for five bucks
2) sue went to the cybercafe yesterday
3) joe was at the cafe, cyber chatting yesterday

...how would you set the position increment so that a span/phrase query
for "stock in International Business Machines" would match document #1,
and "cyber cafe" would match document #2 but not #3 ?
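
For reference, the "code that can find them" assumed above could be
sketched roughly like this. It is a toy model in plain Java, not Lucene
code: the class name, the rule table layout, the crude tokenizer and the
"start:length=>synonym" result format are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: no Lucene classes, just enough to locate rule hits.
class PhraseSynonymFinder {
    // Rules keyed by the lowercased source phrase, words joined by one space.
    static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("cybercafe", Arrays.asList("cyber", "cafe"));
        RULES.put("ibm", Arrays.asList("international", "business", "machines"));
        RULES.put("cyber cafe", Arrays.asList("cybercafe"));
        RULES.put("international business machines", Arrays.asList("ibm"));
    }

    // Longest source phrase in RULES, counted in words.
    static final int MAX_RULE_WORDS = 3;

    // Crude tokenizer (an assumption): lowercase, split on anything but a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one "start:length=>synonym words" entry per rule hit,
    // trying the longest candidate phrase first at each start index.
    static List<String> findMatches(List<String> tokens) {
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            int maxLen = Math.min(MAX_RULE_WORDS, tokens.size() - start);
            for (int len = maxLen; len >= 1; len--) {
                String key = String.join(" ", tokens.subList(start, start + len));
                List<String> synonym = RULES.get(key);
                if (synonym != null) {
                    hits.add(start + ":" + len + "=>" + String.join(" ", synonym));
                    break; // longest match at this start wins
                }
            }
        }
        return hits;
    }
}
```

With this sketch, findMatches(tokenize("sue went to the cybercafe yesterday"))
returns [4:1=>cyber cafe], and document #3 produces no hits because
"cafe cyber" is not a rule. It only locates the synonyms; it says
nothing yet about which positions they should get.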

the only thing that's ever occurred to me is to set the position increment
of all the words to "0" (but that will still result in false positives in
the "cyber cafe" example) or to pick some high default position increment
(bigger than the longest multi-word synonym) and use that normally, and
reserve increments of "1" for words in a multi-word synonym.
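
To make the second idea concrete, here is a toy simulation of it. It
is plain Java rather than Lucene code, and it rests on several
assumptions of mine: a gap of 10, absolute positions computed directly
instead of per-token increments, a matcher that only checks exact
relative positions, and a query side that already knows which words
form a multi-word unit (passed in as one string). All names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "big default increment, increments of 1 inside a synonym".
class GappedPositionDemo {
    // Assumed default increment; must exceed the longest synonym (3 words).
    static final int GAP = 10;

    // Only the one-word => many-word rules are applied in this sketch.
    static final Map<String, List<String>> EXPANSIONS = new HashMap<>();
    static {
        EXPANSIONS.put("cybercafe", Arrays.asList("cyber", "cafe"));
        EXPANSIONS.put("ibm", Arrays.asList("international", "business", "machines"));
    }

    static void add(Map<String, Set<Integer>> postings, String term, int pos) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(pos);
    }

    static Set<Integer> positionsOf(Map<String, Set<Integer>> postings, String term) {
        Set<Integer> found = postings.get(term);
        return found != null ? found : Collections.<Integer>emptySet();
    }

    // Ordinary words land GAP apart; synonym words start at the original
    // word's position and then advance by 1.
    static Map<String, Set<Integer>> index(String text) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty()) {
                continue;
            }
            position += GAP;
            add(postings, word, position);
            List<String> expansion = EXPANSIONS.get(word);
            if (expansion != null) {
                for (int i = 0; i < expansion.size(); i++) {
                    add(postings, expansion.get(i), position + i);
                }
            }
        }
        return postings;
    }

    // Each unit is one query word or one multi-word synonym. Units are GAP
    // apart, words inside a unit are 1 apart, mirroring index().
    static boolean matches(Map<String, Set<Integer>> postings, String... units) {
        List<String> terms = new ArrayList<>();
        List<Integer> offsets = new ArrayList<>();
        for (int u = 0; u < units.length; u++) {
            String[] words = units[u].toLowerCase(Locale.ROOT).split(" ");
            for (int w = 0; w < words.length; w++) {
                terms.add(words[w]);
                offsets.add(u * GAP + w);
            }
        }
        if (terms.isEmpty()) {
            return false;
        }
        for (int base : positionsOf(postings, terms.get(0))) {
            boolean all = true;
            for (int i = 1; i < terms.size() && all; i++) {
                all = positionsOf(postings, terms.get(i)).contains(base + offsets.get(i));
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, matches(index(doc1), "stock", "in", "international business machines")
and matches(index(doc2), "cyber cafe") are both true, while
matches(index(doc3), "cyber cafe") is false. Note what the toy leaves
open: a document that literally contains the two words "cyber cafe"
gets them indexed GAP apart, so it would need its own handling.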


-Hoss

