lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: protected phrases - possible?
Date Mon, 30 Mar 2015 18:26:22 GMT
Hi Jing,

You can boost phrase matches with the pf (phrase fields) parameter. If you don't like this solution,
you can modify the search query on the client side, e.g. surround certain phrases with quotes. This
forces a phrase query without interfering with tokenisation.
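A minimal sketch of both suggestions, assuming the edismax query parser and a client written in Python. The phrase list, the helper names, and the field names `title` and `description` are hypothetical. The `q`, `qf`, `pf`, `mm`, and `defType` parameters are standard edismax parameters, but the boost values are arbitrary.

```python
import re
from urllib.parse import urlencode

# Hypothetical list of phrases the application wants to keep together.
PROTECTED_PHRASES = ["breast cancer", "ovarian cancer"]

def quote_protected_phrases(user_query: str, phrases=PROTECTED_PHRASES) -> str:
    """Wrap each known phrase in double quotes so Solr parses it as a phrase query."""
    result = user_query
    # Longest phrases first so a shorter phrase cannot split a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        # Skip occurrences that are already directly wrapped in quotes.
        pattern = re.compile(r'(?<!")\b' + re.escape(phrase) + r'\b(?!")', re.IGNORECASE)
        result = pattern.sub(lambda m: '"' + m.group(0) + '"', result)
    return result

def build_params(user_query: str) -> dict:
    """Hypothetical helper: edismax params with a phrase boost via pf."""
    return {
        "defType": "edismax",
        "q": quote_protected_phrases(user_query),
        "qf": "title^2 description",      # hypothetical field names
        "pf": "title^10 description^5",   # boost docs where the query terms appear as a phrase
        "mm": "100%",
    }

params = build_params("breast cancer treatment")
print(params["q"])          # "breast cancer" treatment
print(urlencode(params))    # query string to append to a /select request
```

The quoting step is what actually restricts matches, because a quoted phrase must occur within a single field. pf only reorders results and does not filter them.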

Ahmet


On Monday, March 30, 2015 8:49 PM, "Tao, Jing" <jtao@webmd.net> wrote:
Hi,

The way our collection is set up, searches for "breast cancer" are returning results for ovarian
cancer, or anything that contains either "breast" or "cancer". The reason is that we are searching
across multiple fields. Even though I have set an "mm" value so that if there are fewer than 3 terms,
ALL terms must match, Solr considers the query fully matched even though "breast" was in the title
and "cancer" was in the description.
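The behaviour described above can be illustrated with a simplified toy model (not Solr's code). Here each query term becomes one clause that is satisfied if the term occurs in any of the query fields, and mm counts satisfied clauses. Whitespace splitting stands in for real field analyzers, mm is reduced to a plain fraction, and the document and function names are hypothetical.

```python
# Toy model only: real Solr analyzes each field with its own analyzer and
# supports richer mm syntax; whitespace splitting stands in for both here.

def tokenize(text: str) -> list:
    """Lowercase and split on whitespace (stand-in for a field analyzer)."""
    return text.lower().split()

def dismax_matches(query: str, doc: dict, qf: list, mm: float = 1.0) -> bool:
    """Each query term is one clause that matches if the term occurs in ANY
    qf field; mm is the fraction of clauses that must match (1.0 == 100%)."""
    terms = tokenize(query)
    satisfied = sum(
        any(term in tokenize(doc.get(field, "")) for field in qf)
        for term in terms
    )
    return satisfied >= mm * len(terms)

def phrase_in_single_field(query: str, doc: dict, qf: list) -> bool:
    """True only if all query terms appear contiguously within one field."""
    terms = tokenize(query)
    n = len(terms)
    for field in qf:
        tokens = tokenize(doc.get(field, ""))
        if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
            return True
    return False

# Hypothetical document: "breast" only in the title, "cancer" only in the description.
doc = {"title": "Breast imaging guide", "description": "Screening for ovarian cancer"}
qf = ["title", "description"]

print(dismax_matches("breast cancer", doc, qf))          # True: every term matched somewhere
print(phrase_in_single_field("breast cancer", doc, qf))  # False: no field holds the phrase
```

Even with mm at 100%, the per-term clauses are all satisfied, so the document matches. Only a phrase-level constraint ties both terms to the same field.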

Is there a way to protect certain phrases so that they will not be tokenized? I tried using
CommonGramsFilterFactory, but having "breast cancer" in the word list did not seem to do anything.
I'm guessing that's because the field is tokenized first, so no token would ever match that phrase.
If I put "breast" and "cancer" as separate entries in the word list, I end up with too many
unnecessary shingles, and "breast" and "cancer" are still two of the final terms.
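Jing's guess can be checked with a simplified Python model of what Lucene's CommonGramsFilter (behind CommonGramsFilterFactory) produces: unigrams are kept, and a bigram joined with "_" is added whenever either of two adjacent tokens is in the word list. The model ignores token positions, types, and case handling, and the function name is hypothetical.

```python
def common_grams(tokens: list, common_words: set, sep: str = "_") -> list:
    """Approximate CommonGramsFilter output: every unigram is kept, plus a
    bigram for each adjacent pair where either token is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out

tokens = ["breast", "cancer", "screening"]  # already split by the tokenizer

# A two-word entry never equals any single token, so no bigrams appear.
print(common_grams(tokens, {"breast cancer"}))
# ['breast', 'cancer', 'screening']

# Separate entries produce extra shingles and still keep the unigrams.
print(common_grams(tokens, {"breast", "cancer"}))
# ['breast', 'breast_cancer', 'cancer', 'cancer_screening', 'screening']
```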

I have a feeling CommonGramsFilterFactory is not the right way to handle this. What are the other
options? Is it better to combine all fields into one field, apply mm, and add a proximity boost?

Thanks!
Jing 
