lucene-solr-user mailing list archives

From Charlie Hull <char...@flax.co.uk>
Subject Re: E-Commerce Search: tf-idf, tie-break and boolean model
Date Tue, 17 Oct 2017 08:09:50 GMT
For our e-commerce customers we've been recommending a test-based 
relevance tuning strategy. Here's a series of blog posts written for us by 
someone who ran search for the world's largest electronic component 
distributor, which you might find interesting: 
http://www.flax.co.uk/blog/2016/03/18/get-started-improving-site-search-relevancy/ 

A lot of my work these days is sitting down with clients to work out how 
to create sets of test queries and how to test them effectively. We 
usually recommend Quepid as a tool to do this (www.quepid.com).
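
To make the idea of testing a set of queries concrete, here is a minimal
sketch in Python. It is not how Quepid works internally and not a client's
real setup: the query strings, the SKU ids, the fake_search stub and the
choice of precision@k as the metric are all assumptions for illustration.
In practice search_fn would call your own search endpoint.

```python
def precision_at_k(result_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        return 0.0
    top_k = result_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k means missing results count as misses.
    return hits / k


def evaluate(golden_queries, search_fn, k=10):
    """Score every golden query with precision@k."""
    return {
        query: precision_at_k(search_fn(query), relevant, k)
        for query, relevant in golden_queries.items()
    }


# Hypothetical judgments: query -> set of ids a human marked as relevant.
GOLDEN = {
    "iphone cover": {"sku-101", "sku-102"},
    "10k resistor": {"sku-201"},
}


def fake_search(query):
    """Stand-in for a real search call; returns ranked ids."""
    canned = {
        "iphone cover": ["sku-101", "sku-900", "sku-102"],
        "10k resistor": ["sku-777", "sku-201"],
    }
    return canned.get(query, [])


scores = evaluate(GOLDEN, fake_search, k=2)
```

Re-running the same golden set after each configuration change shows
whether a tweak helped some queries while hurting others.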

Cheers

Charlie



On 16/10/2017 11:16, alessandro.benedetti wrote:
> I was having a discussion about e-commerce search with a colleague of mine
> recently.
> Of course there are tons of things you can do to improve relevancy:
> custom similarity - edismax tuning - basic user events processing - machine
> learning integrations - semantic search, etc.
> 
> The more you do, the better the results will potentially be; basically it
> is an ocean to explore.
> To avoid going off topic and to stay pertinent to your initial request,
> let's take a look at the custom similarity problem.
> 
> In e-commerce, and generally in proper-noun searches, TF is not relevant.
> IDF can help, but we need to focus on what IDF is used for in general in
> Lucene search:
> mostly, IDF is a measure of "how important this term is in the user
> query".
> Basically Lucene (and TF/IDF-based Information Retrieval systems in
> general) assumes that the rarer a term is in the corpus, the more likely
> it is to be important for the search query.
> That is not always true in e-commerce:
> "iphone cover" means the user is looking for a cover that fits
> his/her phone.
> "iphone" is rare; "cover" is not. IDF will treat "iphone" as the most
> pertinent term for the user intent, although the user actually wants a
> cover.
> There's a lot to talk about here, let's stop :)
> 
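
To illustrate the point above, here is a small Python sketch that computes
an IDF weight for "iphone" and "cover" over a toy catalogue. The catalogue
is made up, and the formula is the BM25-style IDF, used here as an
assumption; a custom similarity could weight terms differently.

```python
import math

# Made-up product titles: "cover" is common, "iphone" is rare.
TOY_CATALOGUE = [
    "iphone 7 cover black",
    "samsung galaxy cover",
    "sofa cover grey",
    "car seat cover",
    "book cover protector",
    "iphone 7 32gb",
    "garden chair",
    "laptop sleeve",
]


def document_frequency(term, docs):
    """Number of documents containing the term as a whole token."""
    return sum(1 for doc in docs if term in doc.split())


def bm25_idf(term, docs):
    """BM25-style IDF: rarer terms receive a larger weight."""
    n = len(docs)
    df = document_frequency(term, docs)
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


idf_by_term = {term: bm25_idf(term, TOY_CATALOGUE) for term in ("iphone", "cover")}
```

With this data "iphone" gets the higher weight, so documents matching only
"iphone" (phones, not covers) can outrank documents matching only "cover".
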
> Anyway, as a conclusion, go step by step: custom similarity + edismax
> optimised with proper phrase and shingle boosts should be a good start.
> Tie-breaking for e-commerce is likely to be OK set to the default.
> But to verify that, I would recommend setting up a relevancy measuring
> framework with golden queries and user feedback.
> 
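
As a rough sketch of what the edismax starting point might look like, the
Python below builds request parameters with phrase (pf) and shingle
(pf2, pf3) boosts. The field names (name, brand, description) and every
boost value are hypothetical placeholders, not recommendations; tie is
left out on purpose so that Solr's default applies.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query):
    """Return edismax request parameters for a user query.

    Field names and boost values are illustrative assumptions only.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Fields searched term by term, with per-field boosts.
        "qf": "name^3 brand^2 description",
        # Boost documents where the whole query appears as a phrase.
        "pf": "name^10",
        # Boost two-word and three-word shingles of the query.
        "pf2": "name^5",
        "pf3": "name^7",
        # "tie" is intentionally omitted to keep the default.
    }


query_string = urlencode(build_edismax_params("iphone cover"))
```

The resulting query_string can be appended to a collection's select
handler URL; the right boosts are something to find through testing.
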
> cheers
> 
> 
> 
> 
> 
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
