lucene-solr-user mailing list archives

From "alessandro.benedetti" <a.benede...@sease.io>
Subject Re: Pagination bug? when sorting by a field (not unique field)
Date Wed, 29 Mar 2017 16:10:33 GMT
Mikhail probably mentioned that because of the following:

*The way how number of document calculated is changed (LUCENE-6711)*
/The number of documents (docCount) is used to calculate term specificity
(idf) and average document length (avdl). Prior to LUCENE-6711,
collectionStats.maxDoc() was used for the statistics. Now,
collectionStats.docCount() is used whenever possible, if not maxDocs() is
used.
Assume that a collection contains 100 documents, and 50 of them have
"keywords" field. In this example, maxDocs is 100 while docCount is 50 for
the "keywords" field. The total number of tokens for "keywords" field is
divided by docCount to obtain avdl. Therefore, docCount which is the total
number of documents that have at least one term for the field, is a more
precise metric for optional fields.
DefaultSimilarity does not leverage avdl, so this change would have
relatively minor change in the result list. Because relative idf values of
terms will remain same. However, when combined with other factors such as
term frequency, relative ranking of documents could change. Some Similarity
implementations (such as the ones instantiated with NormalizationH2 and
BM25) take account into avdl and would have notable change in ranked list.
Especially if you have a collection of documents with varying lengths.
Because NormalizationH2 tends to punish documents longer than avdl./
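To make the quoted example concrete, here is a minimal sketch of how the two statistics shift avdl and idf. The idf formula is the one I understand Lucene's BM25Similarity to use; the token total and document frequency are hypothetical numbers chosen only for illustration, they are not from the quoted text.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # BM25-style idf: log(1 + (N - n + 0.5) / (n + 0.5)),
    # where N is the document count and n the term's document frequency.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

MAX_DOC = 100        # all documents in the collection (from the quoted example)
DOC_COUNT = 50       # documents that have the "keywords" field (from the quoted example)
TOTAL_TOKENS = 500   # hypothetical total number of tokens in "keywords"
DOC_FREQ = 10        # hypothetical number of documents containing the query term

# Average document length before and after LUCENE-6711.
avdl_old = TOTAL_TOKENS / MAX_DOC     # 5.0
avdl_new = TOTAL_TOKENS / DOC_COUNT   # 10.0

# idf before and after: the same term looks less rare once N shrinks to docCount.
idf_old = bm25_idf(DOC_FREQ, MAX_DOC)
idf_new = bm25_idf(DOC_FREQ, DOC_COUNT)

print(f"avdl: {avdl_old} -> {avdl_new}")
print(f"idf:  {idf_old:.3f} -> {idf_new:.3f}")
```

The point is only that the same index content yields different statistics depending on which count is used, so any difference in those counts between two replicas can move scores.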

This means that if you are load balancing, the page 2 query could go to
another replica, where the same doc is scored differently and ends up in a
different position (and may therefore appear again on the next page).
This scenario applies to score-based ranking, so it will not affect sorting by
a field (and I believe in your initial mail you were not referring to sorting).
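A small sketch of the effect, assuming two replicas that score the same four docs slightly differently. The doc ids, the scores and the page() and ranked() helpers are all made up for illustration; this is not how Solr is implemented, it only mimics start/rows paging over a ranked list.

```python
def ranked(scores):
    # Order doc ids by descending score, breaking ties by id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

def page(ranking, start, rows):
    # Mimic start/rows style paging over a ranked list.
    return ranking[start:start + rows]

# Hypothetical scores for the same docs on two replicas whose
# index statistics (and therefore idf/avdl) differ slightly.
replica_a = {"doc1": 3.0, "doc2": 2.0, "doc3": 1.9, "doc4": 1.0}
replica_b = {"doc1": 3.0, "doc2": 1.9, "doc3": 2.0, "doc4": 1.0}

page1 = page(ranked(replica_a), start=0, rows=2)  # served by replica A
page2 = page(ranked(replica_b), start=2, rows=2)  # served by replica B

repeated = set(page1) & set(page2)
missing = set(replica_a) - set(page1) - set(page2)

print("page 1:", page1)
print("page 2:", page2)
print("repeated:", repeated, "never shown:", missing)
```

Here doc2 shows up on both pages and doc3 is never returned, even though each replica on its own paginates consistently.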

Cheers


Pablo wrote
> Mikhail,
> 
> effectively maxDocs are different and also deletedDocs, but numDocs are
> ok.
> 
> I don't really get it, but can that be the problem?





-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble.com/Pagination-bug-when-sorting-by-a-field-not-unique-field-tp4327408p4327461.html
Sent from the Solr - User mailing list archive at Nabble.com.
