lucene-solr-user mailing list archives

From Tom Chiverton ...@extravision.com>
Subject Re: Trouble boosting a field
Date Mon, 16 Jan 2017 09:47:43 GMT
Oh, that's handy! But does it need Solr/Elasticsearch to be publicly accessible?


On 14/01/17 09:23, Alan Woodward wrote:
> http://splainer.io/ from the gents at OpenSourceConnections is pretty good for this sort of thing, I find…
>
> Alan Woodward
> www.flax.co.uk
>
>
>> On 13 Jan 2017, at 16:35, Tom Chiverton <tc@extravision.com> wrote:
>>
>> Well, I've tried much larger values than 8, and it still doesn't seem to do the job?
>>
>> For now, assume my users are searching for exact substrings of a real title.
>>
>> Tom
>>
>>
>> On 13/01/17 16:22, Walter Underwood wrote:
>>> I use a boost of 8 for title with no boost on the content. Both Infoseek and Inktomi settled on the 8X boost, getting there with completely different methodologies.
>>>
>>> You might not want the title to completely trump the content. That causes some odd anomalies. If someone searches for “ice age 2”, do you really want every title with “2” to come before “ice age two”? Or a search for “steve jobs” to return every article with “job” or “jobs” in the title first?
>>>
>>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five years ago.
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton <tc@extravision.com> wrote:
>>>>
>>>> I have a few hundred documents with title and content fields.
>>>>
>>>> I want a match in title to trump matches in content. If I search for "connected vehicle", a news article that has that phrase in the content shouldn't be ranked higher than the page that has it in the title.
>>>>
>>>> I have tried dismax with qf=title^2 as well as several other variants with the standard query parser (like q=title:"foo"^2 OR content:"foo"), but documents without the search term in the title still come out before those with the term in the title when ordered by score.
>>>>
>>>> Is there something I am missing ?
>>>>
>>>> From the docs, something like q=title:"connected vehicle"^2 OR content:"connected vehicle" should have worked? Even using ^100 didn't help.
>>>>
>>>> I tried with the dismax parser using
>>>>
>>>>        "q": "Connected Vehicle",
>>>>        "defType": "dismax",
>>>>        "indent": "true",
>>>>        "qf": "title^2000 content",
>>>>        "pf": "pf=title^4000 content^2",
>>>>        "sort": "score desc",
>>>>        "wt": "json",
>>>>
>>>> but that was not better. If I remove content from pf/qf, then documents seem to rank correctly.
>>>> Example query and results (content omitted): http://pastebin.com/5EhrRJP8 with managed-schema http://pastebin.com/mdraWQWE
>>>>
>>>> -- 
>>>> Tom Chiverton
>>>> Lead Developer
>>>> e: tc@extravision.com
>>>> p: 0161 817 2922
>>>> t: @extravision <http://www.twitter.com/extravision>
>>>> w: www.extravision.com
>>>> Registered in the UK at: 107 Timber Wharf, 33 Worsley Street, Manchester, M15 4LD.
>>>> Company Reg No: 05017214 VAT: GB 824 5386 19
>>>>
>>>> This e-mail is intended solely for the person to whom it is addressed and may contain confidential or privileged information.
>>>> Any views or opinions presented in this e-mail are solely of the author and do not necessarily represent those of Extravision Ltd.
>>> ______________________________________________________________________
>>> This email has been scanned by the Symantec Email Security.cloud service.
>>> For more information please visit http://www.symanteccloud.com
>>> ______________________________________________________________________
>
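
For anyone following the thread, here is a minimal sketch of the request Walter's advice points towards: edismax instead of dismax, with a moderate title boost. It only builds the select URL using the Python standard library. The host, port, and core name ("localhost:8983", "mycore"), the "id" field in fl, and the phrase boost of 16 are assumptions made for illustration; only the 8X title boost comes from the thread. Note also that the pf value in the request quoted above begins with a literal "pf=", which looks like a copy/paste slip, so the sketch passes just the field list.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_title_boost_query(user_query, title_boost=8, phrase_boost=16):
    """Return a Solr select URL that favours title matches using edismax.

    title_boost follows the 8X figure mentioned in the thread;
    phrase_boost is an illustrative value, not a tested recommendation.
    """
    params = {
        "q": user_query,
        "defType": "edismax",                     # rather than the older dismax
        "qf": f"title^{title_boost} content",     # per-term field weights
        "pf": f"title^{phrase_boost} content",    # whole-phrase boost, no "pf=" prefix
        "fl": "id,title,score",                   # "id" is an assumed field name
        "debugQuery": "true",                     # include score explanations
        "wt": "json",
    }
    # urlencode percent-escapes "^" and encodes spaces, so the boosts survive.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_title_boost_query("connected vehicle"))
```

With debugQuery enabled, the explain section of the response shows how much each field contributes to a document's score, which should make it easier to see why a content-only match outranks a title match.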

