lucene-solr-user mailing list archives

From Robert Brown <...@intelcompute.com>
Subject Re: Boosts for relevancy (shopping products)
Date Thu, 17 Mar 2016 10:15:33 GMT
Thanks Scott and John,

As luck would have it I've got a PhD graduate coming for an interview 
today, who just happened to do her research thesis on information 
retrieval with quantum theory and machine learning  :)

John, it sounds like you're describing my system!  Shopping products 
from multiple sources.  (De-duplication is going to be fun soon).

I already copy fields like merchant, brand and category to string fields 
to use them as facets/filters.  I was contemplating removing the 
description because of the spam issue you mentioned.  I didn't know about 
the RemoveDuplicatesTokenFilterFactory, so I'm sure that's going to be a 
huge help.

Thanks a lot,
Rob


On 03/17/2016 10:01 AM, John Smith wrote:
> Hi,
>
> For once I might be of some help: I've had a similar configuration
> (large set of products from various sources). It's very difficult to
> find the right balance between all parameters and requires a lot of
> tweaking, most often in the dark unfortunately.
>
> What I've found is that omitNorms=true is a real breakthrough: without
> it, results tend to favor short texts, which is not what's wanted for
> product names. I also added a RemoveDuplicatesTokenFilterFactory for the
> name, as it's a common practice for spammers to repeat some keywords in
> order to be better placed in results. Stemming and custom stop words
> (e.g. "cheap", "sale", ...) are other potential ideas.
>
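To make the repeated-keyword and stop-word ideas above concrete, here is a rough Python sketch of the intended effect on a product name. It is not Solr's analysis chain. As far as I know, RemoveDuplicatesTokenFilterFactory only drops a token that has the same text and the same position as an earlier one, so dropping repeats across a whole name, as this sketch does, would probably need something extra. The function name and the stop-word set are hypothetical.

```python
import re

# Hypothetical stop words, taken from the examples mentioned in the thread.
CUSTOM_STOP_WORDS = {"cheap", "sale"}


def clean_name_tokens(name, stop_words=CUSTOM_STOP_WORDS):
    """Lowercase a product name, drop custom stop words, and keep only
    the first occurrence of every remaining token."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    seen = set()
    kept = []
    for token in tokens:
        if token in stop_words or token in seen:
            continue  # skip stop words and repeated keywords
        seen.add(token)
        kept.append(token)
    return kept


print(clean_name_tokens("Cheap iPod iPod iPod case SALE"))
```

Running something like this over a sample of names before indexing could be a cheap way to see how much keyword stuffing the feeds contain.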
> I've also ended up removing the description field as it's often too
> broad, and name is now the only field left: brand, category and merchant
> (as well as other fields) are offered as additional filters using
> facets. Note that you'd have to re-index them as plain strings.
>
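A minimal sketch of what such a name-only query with facets as filters could look like, built as request parameters in Python. The string field names (brand_s, category_s, merchant_s) and the helper name are assumptions, and edismax is assumed as the query parser. Nothing is sent to Solr here; the function only builds the query string.

```python
from urllib.parse import urlencode

# Hypothetical names for the plain-string copies of the facet fields.
FACET_FIELDS = ("brand_s", "category_s", "merchant_s")


def build_faceted_query(user_query, filters=None):
    """Search on the name field only and expose the other fields as
    facets; selected facet values become fq filter queries."""
    params = [
        ("defType", "edismax"),  # assumption: edismax parser
        ("q", user_query),
        ("qf", "name"),
        ("facet", "true"),
    ]
    for field in FACET_FIELDS:
        params.append(("facet.field", field))
    for field, value in (filters or {}).items():
        # Quote the value and escape backslashes and double quotes in it.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return urlencode(params)


print(build_faceted_query("ipod", {"brand_s": "Apple"}))
```

Keeping the filters in fq rather than in q means they narrow the result set without influencing the relevancy score.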
> It's more difficult to achieve, but a popularity boost can also be useful:
> you can measure it by sales or by number of clicks. I use a combination
> of both, and store those values using partial updates.
>
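For the partial updates, a sketch of the JSON payload using Solr's atomic update modifiers "set" and "inc". The field names sales and clicks, the document id, and the helper name are hypothetical. Atomic updates generally need the document's other fields to be stored (or to have docValues) so that Solr can rebuild the document, so check the schema first.

```python
import json


def popularity_update(doc_id, sales=None, click_increment=None):
    """Build one atomic-update document that touches only the
    popularity fields and leaves the rest of the product alone."""
    doc = {"id": doc_id}
    if sales is not None:
        doc["sales"] = {"set": sales}  # overwrite the stored value
    if click_increment is not None:
        doc["clicks"] = {"inc": click_increment}  # add to the stored value
    return doc


# The body that would be POSTed to the collection's update handler.
payload = json.dumps([popularity_update("sku-123", sales=42, click_increment=1)])
print(payload)
```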
> Hope it helps,
> John
>
>
> On 17/03/16 09:36, Robert Brown wrote:
>> Hi,
>>
>> I currently have an index of ~50m docs representing shopping products:
>> name, description, brand, category, etc.
>>
>> Our "qf" is currently set up as:
>>
>> name^5
>> brand^2
>> category^3
>> merchant^2
>> description^1
>>
>> mm: 100%
>> ps: 5
>>
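For comparing variants during testing, the settings above can be written as request parameters. This is a sketch that assumes the edismax parser (the thread only names qf, mm and ps). The helper name is hypothetical, and the name-plus-brand variant simply reuses the existing boosts for illustration.

```python
from urllib.parse import urlencode

# Field boosts exactly as listed in the thread.
FIELD_BOOSTS = {"name": 5, "brand": 2, "category": 3, "merchant": 2, "description": 1}


def edismax_params(user_query, boosts=FIELD_BOOSTS, mm="100%", ps=5):
    """Turn a field->boost mapping into query parameters."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    # ps is the slop for pf phrase boosts; no pf is listed in the thread.
    return {"defType": "edismax", "q": user_query, "qf": qf, "mm": mm, "ps": ps}


current = edismax_params("apple ipod")
# Illustrative variant: search on name and brand only.
name_and_brand = edismax_params("apple ipod", boosts={"name": 5, "brand": 2})

print(urlencode(current))
print(urlencode(name_and_brand))
```

Having each variant as a plain dict makes it easy to run the same set of test queries against several configurations and compare the top results side by side.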
>> I'm getting complaints from the business concerning relevancy, and was
>> hoping to get some constructive ideas/thoughts on whether these boosts
>> look semi-sensible or not. I think they were put in place pretty much
>> at random.
>>
>> I know it's going to be a case of rounds upon rounds of testing, but
>> maybe there's a good starting point that will save me some time?
>>
>> My initial thoughts right now are to actually just search on the name
>> field, and maybe the brand (for things like "Apple iPod").
>>
>> Has anyone got a similar setup that could share some direction?
>>
>> Many Thanks,
>> Rob
>>

