lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Donna L Gresh <gr...@us.ibm.com>
Subject Is there a way for me to handle a multiword synonym correctly?
Date Fri, 07 Aug 2009 14:35:10 GMT
I saw some discussion on the board but I'm not sure I've got quite the 
same problem. As an example, I have a query that might be a technical 
skill:

SAP EM FIN AM

I would like that to match a document that has *either* SAP.EM.FIN.AM or 
"SAP EM FIN AM" (in that order and all together, not spread out through 
the document).

The approach I had tried was at index time if I saw SAP.EM.FIN.AM I would 
consider "SAP EM FIN AM" a synonym for it, using the Lucene in Action 
example. Luke shows me that I have two terms in the index for this 
document: SAP.EM.FIN.AM and "SAP EM FIN AM" (one term). Thus it appears 
differently in the index than if it had been organically found as just the 
string of tokens, in which case there would be separate terms for SAP, EM, 
and so on. 

At query time if I look for "SAP EM FIN AM" it is formed as a phrase query 
with a slop of 0 which does *not* match the one term version "SAP EM FIN 
AM". (For that matter a simple boolean query doesn't find it either) Luke 
confirms the fact that the phrase query does not find my synonym term. The 
query "SAP EM FIN AM" finds *only* documents that originally had those 
separated tokens in them.

Is there a way to handle this situation such that at index time I can turn 
SAP.EM.FIN.AM into something that will be found with a query for "SAP EM 
FIN AM"?

Thanks for any pointers

Donna 

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message