lucene-solr-user mailing list archives

From "Tao, Jing" <j...@webmd.net>
Subject protected phrases - possible?
Date Mon, 30 Mar 2015 17:48:16 GMT
Hi,

The way our collection is set up, searches for "breast cancer" are returning results for ovarian
cancer, or for anything that contains either "breast" or "cancer". The reason is that we are searching
across multiple fields. Even though I have set an "mm" value so that ALL terms must match when there
are fewer than 3 terms, Solr considers the query fully matched even when "breast" is in the title
and "cancer" is in the description.
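To make the cross-field behaviour concrete, here is a deliberately simplified model. It is an assumption for illustration, not Solr's code: it treats each query term as a disjunction over the queried fields (roughly what edismax builds from qf) and lets mm count how many of those per-term clauses matched. The function names and the sample document are hypothetical.

```python
# Simplified model (an assumption, not Solr's implementation) of edismax-style
# matching across several fields: each query term becomes "term in field A OR
# field B ...", and mm counts how many of those per-term clauses matched.

def term_matches_any_field(term, doc, fields):
    """A term counts as matched if it appears in at least one queried field."""
    return any(term in doc.get(field, "").lower().split() for field in fields)

def edismax_like_match(query, doc, fields, mm):
    """True if at least `mm` query terms each match in *some* field."""
    terms = query.lower().split()
    matched = sum(term_matches_any_field(t, doc, fields) for t in terms)
    return matched >= mm

def all_terms_in_one_field(query, doc, fields):
    """Stricter check: every query term must occur within a single field."""
    terms = query.lower().split()
    return any(
        all(t in doc.get(field, "").lower().split() for t in terms)
        for field in fields
    )

# Hypothetical document: "breast" only in the title, "cancer" only in the description.
doc = {
    "title": "Breast health tips",
    "description": "Screening for ovarian cancer",
}
fields = ["title", "description"]

print(edismax_like_match("breast cancer", doc, fields, mm=2))  # True
print(all_terms_in_one_field("breast cancer", doc, fields))    # False
```

With mm=2 both terms are "matched", just in different fields, which is exactly the situation described above.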

Is there a way to protect certain phrases so that they will not be tokenized? I tried using
CommonGramsFilterFactory, but having "breast cancer" in the word list did not seem to do anything.
I'm guessing that's because the field is tokenized first, so no single token would ever match that
phrase. If I put "breast" and "cancer" as separate entries in the word list, I end up with too many
unnecessary shingles, and "breast" and "cancer" are still two of the final terms.
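The two observations above can be reproduced with a rough model of CommonGramsFilter. This sketch approximates its behaviour (tokenize first, then emit an underscore-joined bigram for every adjacent pair where either token is in the word list, while keeping every single-word token); it is not Lucene's source, and the whitespace/lowercase tokenizer and sample text are assumptions.

```python
# Rough approximation (not Lucene's source) of CommonGramsFilter: the text is
# tokenized first, then for each adjacent pair where either token is in the
# word list a "left_right" bigram is emitted next to the ordinary unigrams.

def tokenize(text):
    # Stand-in for a whitespace tokenizer followed by a lowercase filter.
    return text.lower().split()

def common_grams(tokens, common_words):
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # unigrams are always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out

text = "early breast cancer screening"

# 1) A multi-word entry never equals any single token, so nothing changes.
print(common_grams(tokenize(text), {"breast cancer"}))
# ['early', 'breast', 'cancer', 'screening']

# 2) Separate entries add a bigram for every pair touching either word,
#    and "breast" and "cancer" are still emitted on their own.
print(common_grams(tokenize(text), {"breast", "cancer"}))
# ['early', 'early_breast', 'breast', 'breast_cancer', 'cancer', 'cancer_screening', 'screening']
```

Case 2 shows both complaints at once: extra shingles such as "early_breast" and "cancer_screening", and the bare unigrams still present.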

I have a feeling CommonGramsFilterFactory is not the right way to handle this. What are the other
options? Is it better to put all fields into one field, apply mm, and add a proximity boost?
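For the one-field option, a request might look roughly like the sketch below. It assumes a hypothetical catch-all field named "all_text" that the other fields are copied into at index time (for example with copyField); defType, q, qf, mm, pf and ps are standard edismax parameters, but the field name, boost and slop values are illustrative guesses, not tested settings.

```python
from urllib.parse import urlencode

# Assumes a hypothetical catch-all field "all_text" populated from title,
# description, etc. at index time. Values below are illustrative only.
params = {
    "defType": "edismax",
    "q": "breast cancer",
    "qf": "all_text",     # search only the combined field
    "mm": "100%",         # every query term must occur in that field
    "pf": "all_text^10",  # boost documents where the terms form a phrase...
    "ps": "2",            # ...allowing up to 2 positions of slop
}

query_string = urlencode(params)
print(query_string)
```

Note that mm alone does not solve the original problem here: a document with "breast" in its title and "cancer" in its description still contains both terms in the combined field. The pf/ps boost only ranks adjacent occurrences higher, and whether copied values can form a phrase across field boundaries depends on the field's positionIncrementGap relative to ps.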

Thanks!
Jing
