lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Boosting query results
Date Thu, 07 Jul 2016 18:34:20 GMT
If it is running in an environment protected from spammers, you might want to start with the
work that LucidWorks did on click scoring.

https://lucidworks.com/blog/2015/03/23/mixed-signals-using-lucidworks-fusions-signals-api/

Of course, there are no environments free of spammers. I’ve seen them in enterprise search,
too. But they are easier to deal with there. Call them up and tell them they need to stop
immediately or their pages disappear from the search engine.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jul 7, 2016, at 11:29 AM, Walter Underwood <wunder@wunderwood.org> wrote:
> 
> You understand that you are making your site extremely easy to spam, right? This is how
> Microsoft became the top hit for “evil empire” on Google.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jul 7, 2016, at 11:25 AM, Mark T. Trembley <mark.trembley@etrailer.com> wrote:
>> 
>> I've found that it is definitely complicated!
>> 
>> Essentially what I am attempting to do is boost products based on the number of times
>> that particular product has been selected via historical searches using the same search
>> term or phrase.
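A minimal sketch of that aggregation step, assuming the click history is available as (search term, selected product id) pairs. The log format and the `build_score_docs` name are hypothetical; the output mirrors the "scores" documents (`id`, `B1_s`, `B1_f`) shown later in the thread. Raw counts like these are easy to game, which is the spam concern raised in the replies.

```python
from collections import Counter

def build_score_docs(click_log):
    """Count selections per (term, product) and emit one score document each."""
    counts = Counter(click_log)
    return [
        # id follows the thread's "<document id>_<term>" convention
        {"id": f"{doc_id}_{term}", "B1_s": term, "B1_f": float(n)}
        for (term, doc_id), n in sorted(counts.items())
    ]

# Hypothetical click log: (search term, id of the product the user selected).
clicks = [
    ("Boost2", "Document2"),
    ("Boost2", "Document2"),
    ("Boost2", "Document6"),
    ("Boost3", "Document1"),
]
score_docs = build_score_docs(clicks)
for doc in score_docs:
    print(doc)
```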
>> 
>> 
>> On 7/7/2016 11:55 AM, Walter Underwood wrote:
>>> That is a very complicated design. What are you trying to achieve? Maybe there
>>> is a different approach that is simpler.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Jul 7, 2016, at 9:26 AM, Mark T. Trembley <mark.trembley@etrailer.com> wrote:
>>>> 
>>>> That works with static boosts based on documents matching the query "Boost2".
>>>> I want to apply a different boost to documents based on the value assigned to Boost2
>>>> within the document.
>>>> 
>>>> From my sample documents, when running a query with "Boost2," I want Document2
>>>> boosted by 20.0 and Document6 boosted by 15.0:
>>>> 
>>>> {
>>>>  "id" : "Document2_Boost2",
>>>>  "B1_s" : "Boost2",
>>>>  "B1_f" : 20
>>>> }
>>>> {
>>>>  "id" : "Document6_Boost2",
>>>>  "B1_s" : "Boost2",
>>>>  "B1_f" : 15
>>>> }
>>>> 
>>>> 
>>>> On 7/7/2016 10:21 AM, Walter Underwood wrote:
>>>>> This looks like a job for “bq”, the boost query parameter. I used this to boost
>>>>> textbooks which were used at the student’s school. bq does not force documents to be
>>>>> included in the result set. It does affect the ranking of the included documents.
>>>>> 
>>>>> bq=B1_ss:Boost2 will boost documents that match that. You can use weights,
>>>>> like bq=B1_ss:Boost2^10
>>>>> 
>>>>> Here is the relationship between fq, q, and bq:
>>>>> 
>>>>> fq: selection, does not affect ranking
>>>>> q: selection and ranking
>>>>> bq: does not affect selection, affects ranking
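The three parameters above can be combined in a single request. The following is a rough sketch that only builds the URL, assuming the edismax parser (bq is a dismax/edismax parameter), the `generic` collection from this thread, and illustrative `q` and `fq` values that are not from the original messages. `build_select_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, q, fq=None, bq=None):
    """Build a Solr /select URL with q (select + rank), fq (select), bq (rank)."""
    params = [("defType", "edismax"), ("q", q)]
    if fq:
        params.append(("fq", fq))   # restricts the result set, no effect on score
    if bq:
        params.append(("bq", bq))   # adds to the score, does not restrict results
    params.append(("fl", "*,score"))
    return f"{base_url}/select?{urlencode(params)}"

url = build_select_url(
    "http://localhost:8983/solr/generic",
    q="suggestion",                  # illustrative main query
    fq="otherstuff_ss:suggestion",   # illustrative filter
    bq="B1_ss:Boost2^10",            # weighted boost query from the message above
)
parsed = parse_qs(urlsplit(url).query)
print(url)
```

`urlencode` handles the escaping of `:`, `^`, and `*`, so the boost weight survives the round trip intact.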
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>> 
>>>>>> On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <mark.trembley@etrailer.com> wrote:
>>>>>> 
>>>>>> I have a question about the best way to rank my results based on a score field
>>>>>> that can have different values per document and where each document can have
>>>>>> different scores based on which term is queried.
>>>>>> 
>>>>>> Essentially what I'm wanting to have happen is provide a list of terms that when
>>>>>> matched via a query it returns a corresponding score to help boost the original
>>>>>> document. So if I had a document with a multi-valued field named B1_ss with terms
>>>>>> [Boost1|10], [Boost2|20], [Boost3|100] and my search query is "Boost2", I want that
>>>>>> document's result to be boosted by 20. Also note that "Boost2" can boost different
>>>>>> documents at different levels. The query to select the actual documents will select
>>>>>> against other fields in the document and could possibly return documents with any
>>>>>> combination of B1 terms.
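To make the intended semantics concrete, here is a small client-side sketch of the lookup described above: split each `term|weight` value of `B1_ss` and return the weight for the queried term. This only illustrates the desired behaviour; it is not how Solr would compute the boost. The function names are hypothetical, and treating a value without a `|` (such as `NoBoost`) as a boost of 0.0 is an assumption.

```python
def parse_boost_terms(values):
    """Map each 'term|weight' string to {term: weight}."""
    boosts = {}
    for value in values:
        term, sep, weight = value.partition("|")
        # Assumption: a value with no '|' separator carries no boost.
        boosts[term] = float(weight) if sep else 0.0
    return boosts

def boost_for(doc, query_term, field="B1_ss"):
    """Return the boost a document assigns to query_term, or 0.0 if absent."""
    return parse_boost_terms(doc.get(field, [])).get(query_term, 0.0)

# Sample documents taken from the GENERIC collection in this thread.
doc1 = {"id": "Document1", "B1_ss": ["Boost1|10", "Boost3|100"]}
doc6 = {"id": "Document6", "B1_ss": ["Boost2|15", "Boost3|30"]}
doc7 = {"id": "Document7", "B1_ss": ["NoBoost", "Boost333|1.1"]}

for doc in (doc1, doc6, doc7):
    print(doc["id"], boost_for(doc, "Boost2"), boost_for(doc, "Boost3"))
```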
>>>>>> 
>>>>>> I'm still trying to figure out how best to model this in my index, either as
>>>>>> child documents, or in another collection, or if it would make more sense to figure
>>>>>> out how to make it work via payloads or by boosting the terms at index time.
>>>>>> 
>>>>>> I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all
>>>>>> collections.
>>>>>> 
>>>>>> The document structure I've been toying with the most is to put the boosts into a
>>>>>> separate index and join them using !join syntax and returning the scores, but I've
>>>>>> not had any luck getting quality results from those tests. The extra "scores" index
>>>>>> is structured like this (I'll add the json for my test collections at the end of
>>>>>> the email):
>>>>>> id:Document1_Boost1
>>>>>> B1_s:Boost1
>>>>>> B1_f:10
>>>>>> id:Document1_Boost3
>>>>>> B1_s:Boost3
>>>>>> B1_f:100
>>>>>> Using this structure, I get close, but the scores are not what I'm expecting. If
>>>>>> I use the following query, the explain says it's using the score from
>>>>>> Document6_Boost2 even though my query is specifying B1_s:Boost3:
>>>>>> http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
>>>>>> 
>>>>>> <lst name="explain">
>>>>>> <str name="Document6">
>>>>>> *3.379996* = Score based on join value Document6_Boost2
>>>>>> </str>
>>>>>> <str name="Document1">
>>>>>> *2.2533307* = Score based on join value Document1_Boost1
>>>>>> </str>
>>>>>> <str name="Document7">
>>>>>> *0.24786638* = Score based on join value Document7_Boost333
>>>>>> </str>
>>>>>> <str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
>>>>>> </lst>
>>>>>> 
>>>>>> My guess is that it's now doing an all document query on the "scores" collection
>>>>>> to return the scores in addition to the B1_s query I've passed in. I can't figure
>>>>>> out where it's getting those scores from as a simple query against the "scores"
>>>>>> collection returns scores like I'd expect to see them based on a similar query:
>>>>>> http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND _val_:B1_f&fl=score,*&debugQuery=true
>>>>>> 
>>>>>> <lst name="explain">
>>>>>> <str name="Document1_Boost3">
>>>>>> *46.834885* = sum of:
>>>>>>   1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of:
>>>>>>     1.7682717 = score(doc=1,freq=1.0), product of:
>>>>>>       0.8926926 = queryWeight, product of:
>>>>>>         1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>         0.45066613 = queryNorm
>>>>>>       1.9808292 = fieldWeight in 1, product of:
>>>>>>         1.0 = tf(freq=1.0), with freq of:
>>>>>>           1.0 = termFreq=1.0
>>>>>>         1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>         1.0 = fieldNorm(doc=1)
>>>>>>   45.066612 = FunctionQuery(float(B1_f)), product of:
>>>>>>     100.0 = float(B1_f)=100.0
>>>>>>     1.0 = boost
>>>>>>     0.45066613 = queryNorm
>>>>>> </str>
>>>>>> <str name="Document6_Boost3">
>>>>>> *15.288256* = sum of:
>>>>>>   1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of:
>>>>>>     1.7682717 = score(doc=5,freq=1.0), product of:
>>>>>>       0.8926926 = queryWeight, product of:
>>>>>>         1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>         0.45066613 = queryNorm
>>>>>>       1.9808292 = fieldWeight in 5, product of:
>>>>>>         1.0 = tf(freq=1.0), with freq of:
>>>>>>           1.0 = termFreq=1.0
>>>>>>         1.9808292 = idf(docFreq=2, maxDocs=8)
>>>>>>         1.0 = fieldNorm(doc=5)
>>>>>>   13.519984 = FunctionQuery(float(B1_f)), product of:
>>>>>>     30.0 = float(B1_f)=30.0
>>>>>>     1.0 = boost
>>>>>>     0.45066613 = queryNorm
>>>>>> </str>
>>>>>> </lst>
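The arithmetic in that explain output can be checked by hand: each score is the term weight for B1_s:Boost3 plus the B1_f value scaled by the boost and by queryNorm. The sketch below only re-derives the numbers printed above, using constants copied from the explain output; `expected_score` is a hypothetical helper name.

```python
import math

QUERY_NORM = 0.45066613   # queryNorm from the explain output
TERM_WEIGHT = 1.7682717   # weight(B1_s:Boost3), identical for both documents

def expected_score(b1_f, boost=1.0):
    """Term weight plus the function-query part (B1_f * boost * queryNorm)."""
    return TERM_WEIGHT + b1_f * boost * QUERY_NORM

doc1_score = expected_score(100.0)   # Document1_Boost3, B1_f = 100
doc6_score = expected_score(30.0)    # Document6_Boost3, B1_f = 30

print(doc1_score, math.isclose(doc1_score, 46.834885, abs_tol=1e-4))
print(doc6_score, math.isclose(doc6_score, 15.288256, abs_tol=1e-4))
```

Because the function value is multiplied by queryNorm, a B1_f of 100 contributes about 45 to the score rather than 100.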
>>>>>> 
>>>>>> I feel like I'm getting close to what I need, but it's just not clear to me what
>>>>>> I'm missing at this point.
>>>>>> 
>>>>>> The other option I've been toying with is using payloads, but actually utilizing
>>>>>> the payloads as part of the scoring process is beyond me at this time.
>>>>>> 
>>>>>> Any thoughts or hints on the best way to boost the relevancy of these scores would
>>>>>> be appreciated.
>>>>>> Thanks
>>>>>> Mark
>>>>>> 
>>>>>> GENERIC:
>>>>>> {
>>>>>>   "id" : "Document1",
>>>>>>   "B1_ss" : ["Boost1|10","Boost3|100"],
>>>>>>   "title_s" : "Title1",
>>>>>>   "otherstuff_ss" : ["stuff1","suggestion"],
>>>>>>   "B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document2",
>>>>>>   "B1_ss" : ["Boost2|20"],
>>>>>>   "name_s" : "Product2",
>>>>>>   "title_s" : "Title2",
>>>>>>   "otherstuff_ss" : ["stuff2","recommendation"],
>>>>>>   "B1_name_ss" : ["Document2_Boost1"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document3",
>>>>>>   "name_s" : "Product3",
>>>>>>   "B1_ss" : ["NoBoost"],
>>>>>>   "title_s" : "Title3",
>>>>>>   "otherstuff_ss" : ["stuff3","new","suggestion"],
>>>>>>   "B1_name_ss" : ["Document3_NoBoost"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document4",
>>>>>>   "name_s" : "Product4",
>>>>>>   "title_s" : "Title4",
>>>>>>   "otherstuff_ss" : ["stuff4","old","suggestion"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document5",
>>>>>>   "name_s" : "Product5",
>>>>>>   "title_s" : "Title5",
>>>>>>   "otherstuff_ss" : ["stuff5","recommendation"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document6",
>>>>>>   "name_s" : "Product6",
>>>>>>   "B1_ss" : ["Boost2|15","Boost3|30"],
>>>>>>   "title_s" : "Title6",
>>>>>>   "B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document7",
>>>>>>   "name_s" : "Product7",
>>>>>>   "B1_ss" : ["NoBoost","Boost333|1.1"],
>>>>>>   "title_s" : "Title7",
>>>>>>   "B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
>>>>>> }
>>>>>> 
>>>>>> SCORES:
>>>>>> {
>>>>>>   "id" : "Document1_Boost1",
>>>>>>   "B1_s" : "Boost1",
>>>>>>   "B1_f" : 10
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document1_Boost3",
>>>>>>   "B1_s" : "Boost3",
>>>>>>   "B1_f" : 100
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document2_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 20
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document3_NoBoost",
>>>>>>   "B1_s" : "NoBoost"
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document6_Boost2",
>>>>>>   "B1_s" : "Boost2",
>>>>>>   "B1_f" : 15
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document6_Boost3",
>>>>>>   "B1_s" : "Boost3",
>>>>>>   "B1_f" : 30
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document7_NoBoost",
>>>>>>   "B1_s" : "NoBoost"
>>>>>> },
>>>>>> {
>>>>>>   "id" : "Document7_Boost333",
>>>>>>   "B1_s" : "Boost333",
>>>>>>   "B1_f" : 1.1
>>>>>> }
>>>>>> 
>>> 
>> 
> 

