lucene-solr-user mailing list archives

From "Mukhopadhyay, Aratrika"
Subject RE: Relevancy Tuning For Solr With Apache Nutch 2.3
Date Thu, 08 Feb 2018 13:40:16 GMT
Thank you, Charlie. This has been very helpful. The reason one boost value is 2.0 while the
other is 0.03 is simply that I wasn't sure whether the boost I was applying in the first place
may have been too "gentle". I will start by disabling the boost on Nutch's side and installing
Quepid, as per your suggestion.


-----Original Message-----
From: Charlie Hull
Sent: Thursday, February 08, 2018 4:09 AM
Subject: Re: Relevancy Tuning For Solr With Apache Nutch 2.3

On 07/02/2018 21:59, Mukhopadhyay, Aratrika wrote:
> Hello,
> I am attempting to tune the results that I retrieve from Solr to boost the importance
> of certain fields. The syntax of the query I am using is as follows:
> http://localhost:8983/solr/housegov_data/select?indent=on&q=QUERY&defType=edismax&qf=FIELD1^20.0_FIELD2^0.03&wt=json
> The issue is that in most cases this is not boosting anything, or it is not able to find
> any documents that match these criteria. I have used Nutch to crawl websites and indexed the
> data to Solr. I see that Nutch applies an index-time boost as well. Could that have something
> to do with this? Can anyone look at the format of this query and point out any mistakes
> that I am making?


- You seem to have two fields incorrectly concatenated with an
underscore: qf=FIELD1^20.0_FIELD2^0.03 - the separator should be a space or an encoded space
- A large boost of 20 combined with a fractional boost of 0.03 worries me, as it implies that
one field is roughly 667 times as important as the other. Are you sure this is the case?
- You should turn off *all* the boosts, including the Nutch one, and start again, *gently*
applying boosts where you can *prove* they improve relevancy
- You should consider using a tool such as Quepid (disclaimer: we resell this, but there's
a free trial period you can use) for relevancy tuning based on a set of test cases
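
To illustrate the first two points, here is a minimal Python sketch that builds the same request with the qf entries separated by a space and lets the standard library handle the URL encoding. The helper name build_query_url is hypothetical, and the host, core, placeholder field names and boost values are simply copied from the query above; it only constructs the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Endpoint copied from the query in the original message.
SOLR_SELECT = "http://localhost:8983/solr/housegov_data/select"


def build_query_url(query, field_boosts):
    """Hypothetical helper: build an edismax select URL from a field -> boost mapping."""
    # qf entries are separated by whitespace, not underscores,
    # e.g. "FIELD1^20.0 FIELD2^0.03".
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {
        "indent": "on",
        "q": query,
        "defType": "edismax",
        "qf": qf,
        "wt": "json",
    }
    # urlencode percent-encodes "^" and encodes the space as "+".
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Placeholder field names and boosts from the original query.
url = build_query_url("QUERY", {"FIELD1": 20.0, "FIELD2": 0.03})
print(url)

# How much more weight the first field gets relative to the second.
ratio = 20.0 / 0.03
print(ratio)
```

Building the parameter string this way avoids hand-encoding mistakes in the separator, and printing the ratio makes it easy to see how lopsided a pair of boosts is before trying it.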



> FYI : I am using a data driven schema.
> Regards,
> Aratrika Mukhopadhyay

Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
