lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: How to include a multi-word synonym to a word when indexing?
Date Tue, 12 Apr 2005 13:36:55 GMT
On Apr 12, 2005, at 1:42 AM, Chris Hostetter wrote:
> : You'll need some kind of lookup to know how to split a token like
> : "cybercafe" into two words - once you've done that it will be easy to
> : set the position increment of them to zero so that they overlay the
> : original term.
> but how would you set the position increment of a multi-word synonym so
> that phrase/span queries will work?
> Assuming you have the following "phrase synonyms" (and code
> that can find them during Analysis)...
>    [CyberCafe] => [Cyber] [Cafe]
>    [IBM] => [International] [Business] [Machines]
>    [Cyber] [Cafe] => [CyberCafe]
>    [International] [Business] [Machines] => [IBM]
> and the source documents:
> 1) bob bought stock in IBM for five bucks
> 2) sue went to the cybercafe yesterday
> 3) joe was at the cafe, cyber chatting yesterday
> would you set the position incriment so that a span/phrase query
> for "stock in International Business Machines" would match document #1,
> and "cyber cafe" would match document #2 but not #3 ?

On further thought, my approach would be to handle this on the analysis 
side and not deal with position increments.  The lookup would take 
"cyber cafe" and emit the token "cybercafe".  In your #3 example, the 
tokens would be [cafe] [cyber] and would not match.  If someone issued 
a phrase query for "cyber cafe" the same analysis would turn that into 
a query for "cybercafe".
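
A minimal sketch of that idea follows, in plain Java with no Lucene 
dependency. The class PhraseCollapser and its methods add() and 
collapse() are hypothetical names made up for illustration, not part of 
any Lucene API, and the sketch assumes tokens are already lowercased and 
split on whitespace and punctuation. In a real analyzer this logic 
would presumably live in a token filter applied at both index time and 
query time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): replaces known multi-word
// phrases with a single alias token, scanning left to right.
class PhraseCollapser {
    // Maps a phrase (as a list of words) to the single token to emit.
    private final Map<List<String>, String> aliases = new HashMap<>();
    // Length in words of the longest registered phrase.
    private int maxWords = 1;

    // Register a phrase (already lowercased) and its alias token.
    void add(String alias, String... words) {
        aliases.put(Arrays.asList(words), alias);
        maxWords = Math.max(maxWords, words.length);
    }

    // Returns a new token list with each registered phrase replaced by
    // its alias. At every position the longest candidate is tried first.
    List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int consumed = 0;
            int longest = Math.min(maxWords, tokens.size() - i);
            for (int n = longest; n >= 1 && consumed == 0; n--) {
                String alias = aliases.get(tokens.subList(i, i + n));
                if (alias != null) {
                    out.add(alias);
                    consumed = n;
                }
            }
            if (consumed == 0) {
                // No phrase starts here; pass the token through unchanged.
                out.add(tokens.get(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out;
    }
}
```

With "cyber cafe" registered as "cybercafe", the query tokens [cyber] 
[cafe] collapse to [cybercafe], while the [cafe] [cyber] of document #3 
pass through unchanged because the words are in the wrong order.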

What drawbacks are there to replacing multiple words with their 
corresponding acronym/alias during analysis?

> the only thing that's ever occured to me is to set the position 
> incriment

I can't help myself, I'm working with the spell checker as we speak.... 
incrEment :)

