lucene-solr-user mailing list archives

From "alessandro.benedetti" <>
Subject Re: E-Commerce Search: tf-idf, tie-break and boolean model
Date Mon, 16 Oct 2017 10:16:01 GMT
I was discussing e-commerce search with a colleague of mine recently.
Of course there are tons of things you can do to improve relevancy:
custom similarity - edismax tuning - basic user events processing - machine
learning integrations - semantic search, etc.

The more you do, the better the results will potentially be; basically it is
an ocean to explore.
To stay on topic and pertinent to your initial request, let's take a look at
the custom similarity problem.

In e-commerce, and generally in proper noun searches, TF is not relevant.
IDF can help, but we need to focus on what IDF is used for in general in
Lucene search:
mostly, IDF is a measure of "how important this term is in the user query".
Basically Lucene (and TF/IDF-based Information Retrieval systems in general)
assumes that the rarer a term is in the corpus, the more likely it is to be
important for the search query.
That is not always true in e-commerce:
"iphone cover" means the user is looking for a cover that fits his/her
phone.
"iphone" is rare, "cover" is not, so IDF will treat "iphone" as the term most
pertinent to the user intent, even though what the user actually wants is a
cover.
There's a lot to talk about here, so let's stop :)
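
To make the IDF point concrete, here is a small sketch in Python. The catalog
titles are invented for illustration, and the formula is the IDF form used by
Lucene's BM25Similarity, ln(1 + (N - n + 0.5) / (n + 0.5)); a real index
would of course have very different statistics.

```python
import math

# Hypothetical catalog titles, invented for illustration only.
CATALOG = [
    "leather cover for samsung galaxy",
    "silicone cover for huawei p10",
    "sofa cover grey",
    "car seat cover",
    "iphone cover transparent",
    "book cover protective",
]


def doc_freq(term, docs):
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc.split())


def bm25_idf(term, docs):
    """IDF in the BM25 form: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n = doc_freq(term, docs)
    total = len(docs)
    return math.log(1 + (total - n + 0.5) / (n + 0.5))


idf_iphone = bm25_idf("iphone", CATALOG)
idf_cover = bm25_idf("cover", CATALOG)

# The rare term gets the larger weight, regardless of what the user wants.
print(f"idf(iphone) = {idf_iphone:.3f}")
print(f"idf(cover)  = {idf_cover:.3f}")
```

In this toy catalog "cover" appears in every title, so its IDF is close to
zero and "iphone" dominates the score.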

Anyway, as a conclusion, go step by step: custom similarity + edismax
optimised with proper phrase and shingle boosts should be a good start.
Tie-breaking for e-commerce is likely to be OK when left at the default.
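
As a rough sketch of that starting point, the request parameters could look
like the following. pf, pf2 and pf3 are the standard edismax phrase and
shingle boost parameters; the core name, the field names and the boost values
are made-up placeholders, and tie is deliberately not sent so that Solr
applies its default.

```python
from urllib.parse import urlencode

# Hypothetical Solr core; adjust host and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_edismax_params(user_query):
    """Build edismax request parameters with phrase and shingle boosts.

    Field names (name, brand, category) and boost values are assumptions
    and need tuning against your own catalog.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # Fields searched term by term.
        "qf": "name^3 brand^2 category",
        # Boost documents where the whole query matches as a phrase.
        "pf": "name^10",
        # Boost documents matching 2-word and 3-word shingles of the query.
        "pf2": "name^5",
        "pf3": "name^7",
        # "tie" is intentionally omitted, so the default is used.
    }


params = build_edismax_params("iphone cover")
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```
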
To find out whether the default tie-break really is OK, I would recommend
setting up a relevancy measuring framework with golden queries and user
feedback.
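
A minimal sketch of such a measurement, assuming a hand-labelled set of
golden queries and using precision@k as the metric (one option among many;
nDCG or others may suit you better). The queries, product ids and result
lists below are invented; in practice the rankings would come from Solr.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are labelled relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(results_by_query, golden, k):
    """Average precision@k over all golden queries."""
    scores = [
        precision_at_k(results_by_query.get(query, []), relevant, k)
        for query, relevant in golden.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical golden queries: query -> ids judged relevant.
GOLDEN = {
    "iphone cover": {"p5", "p9"},
    "sofa cover": {"p3"},
}

# Hypothetical rankings returned by two different configurations.
results_config_a = {
    "iphone cover": ["p1", "p5", "p9"],
    "sofa cover": ["p3", "p7", "p8"],
}
results_config_b = {
    "iphone cover": ["p5", "p9", "p1"],
    "sofa cover": ["p3", "p7", "p8"],
}

score_a = mean_precision_at_k(results_config_a, GOLDEN, k=2)
score_b = mean_precision_at_k(results_config_b, GOLDEN, k=2)
print(f"config A: {score_a:.2f}, config B: {score_b:.2f}")
```

Running the same golden set against each configuration change (similarity,
boosts, tie) shows whether the change helped or hurt.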


Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. -