lucene-solr-user mailing list archives

From Alessandro Benedetti <a.benede...@sease.io>
Subject Re: Learning to Rank (LTR) with grouping
Date Tue, 24 Apr 2018 10:15:02 GMT
Are you using SolrCloud or any distributed search?

If you are using just a single Solr instance, LTR should have no problem
with pagination.
Re-ranking is applied to the top K documents, and pagination happens
afterwards.
So if a document from page 1 of the original ranking ends up on page 3 after
re-ranking, you will see it on page 3.
Have you verified this: "Say, if an item (Y) from second page is moved to
first page after re-ranking, while an item (X) from first page is moved away
from the first page."?
The top K should not start from the "start" parameter; if it does, it is a bug.

The situation changes a little with distributed search, where you can
experience this behaviour:

*Pagination*
Let’s explore the scenario on a single Solr node and on a sharded
architecture.

*SINGLE SOLR NODE*

reRankDocs=15
rows=10
This means each page is made up of 10 results.
What happens when we request page 2?
The first 5 documents on that page will have been rescored and affected by
the reranking.
The last 5 documents will keep their original score and original ranking.

e.g.
Doc 11 – score= 1.2
Doc 12 – score= 1.1
Doc 13 – score= 1.0
Doc 14 – score= 0.9
Doc 15 – score= 0.8
Doc 16 – score= 5.7
Doc 17 – score= 5.6
Doc 18 – score= 5.5
Doc 19 – score= 4.6
Doc 20 – score= 2.4
This means that score(15) could be < score(16), but documents 15 and 16 are
still in the expected order.
The reason is that the top 15 documents are rescored and reranked, and the
rest are left unchanged.
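
To make the single-node case concrete, here is a small Python sketch of the
idea. It is only a toy model, not Solr code: the document names, the original
scores and the "model" scores are all invented, and the two helper functions
are hypothetical.

```python
# Toy model of top-K reranking on a single node (all scores are invented).
RERANK_DOCS = 15
ROWS = 10

# Original scores: doc1 is the best match, doc20 the worst.
original = {f"doc{i}": float(21 - i) for i in range(1, 21)}

# Hypothetical model scores, only needed for the top RERANK_DOCS documents.
model = {f"doc{i}": ((i * 7) % 16) / 10 for i in range(1, RERANK_DOCS + 1)}


def rerank(original, model, k):
    """Rescore and reorder the top k documents; leave the rest untouched."""
    ranked = sorted(original, key=original.get, reverse=True)
    head = sorted(ranked[:k], key=model.get, reverse=True)
    return ([(doc, model[doc]) for doc in head]
            + [(doc, original[doc]) for doc in ranked[k:]])


def page(results, number, rows):
    """Return the 1-based page `number` from an already ordered result list."""
    start = (number - 1) * rows
    return results[start:start + rows]


results = rerank(original, model, RERANK_DOCS)
page2 = page(results, 2, ROWS)
for doc, score in page2:
    print(doc, score)
```

The first five entries of page2 carry model scores and the last five carry
original scores, so the score can jump upwards in the middle of the page while
the order is still the expected one.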

*SHARDED ARCHITECTURE*

reRankDocs=15
rows=10
Shards number=2
When requesting page 2, Solr will send queries to the shards to
collect 2 pages per shard:
Shard1 : 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
(page2)
Shard2 : 10 ReRanked docs (page1) + 5 ReRanked docs + 5 OriginalScored docs
(page2)

Then the results will be merged, and original-scored search results may end
up ranked above reranked docs.
A possible solution could be to normalise the scores to prevent any
possibility that a reranked result is surpassed by original-scored ones.
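
The merge issue can be sketched in Python as well. This is a simplified model
under stated assumptions: each shard reranks its own top reRankDocs, returns
its first page * rows entries, and the coordinator merges purely by the score
it receives. The shard names, the scores and the helper functions are invented
for illustration, and the model scores are deliberately on a smaller scale
than the original scores.

```python
# Toy model of merging per-shard results for page 2 (all scores are invented).
RERANK_DOCS = 15
ROWS = 10


def make_shard(name, offset):
    """Build invented original and model scores for one shard."""
    original = {f"{name}{i}": 21 - i + offset for i in range(1, 21)}
    model = {f"{name}{i}": ((i * 7) % 16) / 100 + offset / 100
             for i in range(1, RERANK_DOCS + 1)}
    return original, model


def shard_top(original, model, k, limit):
    """Rerank the shard's top k docs, then return its first `limit` entries."""
    ranked = sorted(original, key=original.get, reverse=True)
    head = sorted(ranked[:k], key=model.get, reverse=True)
    entries = [(doc, model[doc], "reranked") for doc in head]
    entries += [(doc, original[doc], "original") for doc in ranked[k:]]
    return entries[:limit]


def merged_page(shards, page_number, rows):
    """Collect page_number * rows entries per shard and merge them by score."""
    limit = page_number * rows
    candidates = []
    for original, model in shards:
        candidates += shard_top(original, model, RERANK_DOCS, limit)
    merged = sorted(candidates, key=lambda entry: entry[1], reverse=True)
    return merged, merged[limit - rows:limit]


shards = [make_shard("a", 0.0), make_shard("b", 0.5)]
merged, page2 = merged_page(shards, 2, ROWS)
for doc, score, kind in merged[:ROWS]:
    print(doc, score, kind)
```

With these invented scores, every entry at the top of the merged list is an
original-scored document, because the two kinds of score are compared
directly even though they are on different scales.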

Note: the problem appears once rows * page > reRankDocs. When reRankDocs is
quite high, the problem will occur only with deep paging.
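
As a quick check of that condition, a hypothetical helper (not part of Solr)
can compute the first 1-based page for which rows * page > reRankDocs:

```python
def first_affected_page(rows, rerank_docs):
    """Smallest 1-based page number with rows * page > rerank_docs."""
    return rerank_docs // rows + 1


print(first_affected_page(10, 15))    # 2
print(first_affected_page(10, 1000))  # 101
```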



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html