lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: CPU Intensive Scoring Alternatives
Date Tue, 21 Feb 2017 08:01:55 GMT
Hi,

The new default similarity is BM25.
Maybe explicitly set the similarity to TF-IDF and see how it goes?
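
For reference, here is a rough sketch of what the two similarities compute per matching term. It is a simplified illustration, not Lucene's actual code: it leaves out query normalization, coord factors, boosts, and the lossy encoding of length norms. The function names are made up for this example, and k1=1.2 and b=0.75 are assumed as the usual BM25 defaults. In Solr 6.x, the classic TF-IDF similarity should be available as solr.ClassicSimilarityFactory.

```python
import math

def classic_tf_idf(freq, doc_freq, doc_count, field_length):
    """Simplified per-term weight in the style of classic TF-IDF."""
    tf = math.sqrt(freq)                                    # grows without bound
    idf = 1.0 + math.log((doc_count + 1) / (doc_freq + 1))
    length_norm = 1.0 / math.sqrt(field_length)             # shorter fields score higher
    # idf enters twice: once via the query weight, once via the document weight.
    return tf * idf * idf * length_norm

def bm25(freq, doc_freq, doc_count, field_length, avg_field_length,
         k1=1.2, b=0.75):
    """Simplified per-term BM25 weight (k1 and b are assumed defaults)."""
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1.0 - b + b * field_length / avg_field_length)
    return idf * freq * (k1 + 1.0) / (freq + norm)          # saturates as freq grows

if __name__ == "__main__":
    # Hypothetical numbers: a rare term versus a stopword-like term.
    print(classic_tf_idf(freq=2, doc_freq=10, doc_count=1000, field_length=16))
    print(bm25(freq=2, doc_freq=10, doc_count=1000, field_length=16, avg_field_length=16))
    print(bm25(freq=2, doc_freq=990, doc_count=1000, field_length=16, avg_field_length=16))
```

Either function is evaluated once per matching term per document, so switching similarity changes the arithmetic done for each hit but not the number of hits that have to be scored.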

Ahmet


On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi <fuad@efendi.ca> wrote:
Hello,


The default TF-IDF scoring performs poorly on our index of 200 million documents.
The query "Michael Jackson" may run in 300 ms, and "Michael The Jackson" in over 3
seconds (eDisMax). Because the default operator is "OR" and "The" is a stopword, the
query matches 50-70 million documents, and scoring them is CPU intensive.
What can we do? Our typical queries return over a million documents, and response
times for simple queries range from 50 milliseconds to 5-10 seconds,
depending on the size of the result set.
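
To illustrate why a single stopword inflates the result set under the "OR" operator, here is a toy sketch. The postings and document ids are invented for the example, and a real Lucene index iterates and skips through postings rather than building sets like this. The only point is that "OR" scores the union of all term matches, while "AND" scores the intersection.

```python
# Hypothetical toy postings: term -> set of matching document ids.
POSTINGS = {
    "michael": {1, 2, 3, 5, 8},
    "jackson": {2, 3, 13},
    "the": set(range(1, 21)),  # a stopword matches nearly every document
}

def candidates(terms, operator="OR"):
    """Return the ids of documents that would have to be scored."""
    sets = [POSTINGS.get(term, set()) for term in terms]
    if not sets:
        return set()
    if operator == "AND":
        return set.intersection(*sets)
    return set.union(*sets)

if __name__ == "__main__":
    for query in (["michael", "jackson"], ["michael", "the", "jackson"]):
        for op in ("OR", "AND"):
            print(query, op, len(candidates(query, op)))
```

In this toy data, adding "the" to an "OR" query makes every document a candidate, while the "AND" candidate set stays the same.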

This was just an exaggerated example using the stopword “the”, but even the simplest
query, “Michael Jackson”, runs in 300 ms instead of 3 ms just because of the huge number
of hits and TF-IDF calculations. We are on Solr 6.3.


Thanks,

--

Fuad Efendi

(416) 993-2060

http://www.tokenizer.ca
Search Relevancy, Recommender Systems 
