lucene-java-user mailing list archives

From Christoph Kaser <>
Subject searchAfter is missing results when custom noncontinuous slices are used
Date Wed, 24 May 2017 14:06:18 GMT
Hello everybody,

I have observed unexpected behavior in Lucene, and I am unsure 
whether this is a bug or a missing warning in the documentation:

I am using the IndexSearcher with an ExecutorService in order to take 
advantage of multiple CPU cores during searches. I want to limit the 
number of cores a single search can occupy, so I have overridden the 
IndexSearcher method
     protected LeafSlice[] slices(List<LeafReaderContext> leaves)
to return a fixed number of slices (e.g. 4).

I tried to create slices of roughly equal size by looping over the 
leaves (sorted by size, descending) and adding each leaf to the slice 
that currently contains the fewest documents.
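The balancing strategy above can be sketched in plain Java. This is a simplified model, assuming leaves are represented only by their document counts: the hypothetical `GreedySlicer.slice` stands in for the body of an overridden `slices()` and does not build Lucene `LeafSlice` objects. It shows how the greedy assignment produces slices whose leaves are not adjacent in the original leaf order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the greedy slicing strategy; plain leaf sizes stand in
// for Lucene's LeafReaderContext / LeafSlice types.
class GreedySlicer {
    /**
     * Visits leaves largest first and puts each one into the slice that
     * currently holds the fewest documents. Returns, per slice, the original
     * leaf ordinals it contains (in the order they were added).
     */
    static List<List<Integer>> slice(int[] leafSizes, int numSlices) {
        Integer[] order = new Integer[leafSizes.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort leaf ordinals by size, descending (stable sort: ties keep leaf order).
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> leafSizes[i]).reversed());

        List<List<Integer>> slices = new ArrayList<>();
        for (int s = 0; s < numSlices; s++) slices.add(new ArrayList<>());
        long[] docCounts = new long[numSlices];

        for (int leaf : order) {
            // Find the slice with the fewest documents so far (first one wins ties).
            int smallest = 0;
            for (int s = 1; s < numSlices; s++) {
                if (docCounts[s] < docCounts[smallest]) smallest = s;
            }
            slices.get(smallest).add(leaf);
            docCounts[smallest] += leafSizes[leaf];
        }
        return slices;
    }
}
```

For leaf sizes {100, 10, 80, 30, 20} and two slices, this yields slices containing leaves [0, 4] and [2, 3, 1], so neither slice is a contiguous run of the original leaves.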

This worked well until I stumbled upon a query for which searchAfter 
seemed to skip hits: the total number of hits obtained by repeated 
calls to searchAfter was lower than TopDocs.totalHits.

The issue seems to be a mismatch between how searchAfter and 
TopDocs.merge order hits:

searchAfter skips every document with a higher score than the "after" 
document. On equal scores, it falls back to the document id and skips 
every document whose id is <= the id of the "after" document (see 
PagingFieldCollector).
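The searchAfter tie-breaking rule described above fits in a single predicate. This is a hypothetical model of the rule as stated here, not Lucene's implementation; `SearchAfterRule.skipped` is an illustrative name. It implies an overall order of score descending, then document id ascending.

```java
// Hypothetical model of the searchAfter skip rule (not Lucene code).
class SearchAfterRule {
    /**
     * Returns true if a hit (score, doc) is skipped because it sorts at or
     * before the "after" hit: a higher score, or an equal score with a
     * document id less than or equal to the "after" document id.
     */
    static boolean skipped(float score, int doc, float afterScore, int afterDoc) {
        return score > afterScore || (score == afterScore && doc <= afterDoc);
    }
}
```

Note that the slice a document belongs to plays no role in this predicate; only score and global document id matter.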

TopDocs.merge uses the score to determine which hits become part of 
the merged TopDocs. On equal scores, it breaks ties by the shard index, 
which corresponds to the slices the IndexSearcher uses (see 
ScoreMergeSortQueue.lessThan).
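For comparison, here is a hypothetical model of the merge order described above. Besides score and shard index, it breaks any remaining ties by document id, assuming that tied hits within one shard are already in ascending document id order (so their position in the shard's results matches doc id order). `MergedHit` and `MergeOrder` are illustrative names, not Lucene classes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical model of a merged hit: score, originating shard (slice) and global doc id.
class MergedHit {
    final float score;
    final int shardIndex;
    final int doc;

    MergedHit(float score, int shardIndex, int doc) {
        this.score = score;
        this.shardIndex = shardIndex;
        this.doc = doc;
    }
}

class MergeOrder {
    // Higher score first; on equal scores, lower shard index first;
    // within a shard, lower doc id first.
    static final Comparator<MergedHit> ORDER =
        Comparator.comparingDouble((MergedHit h) -> h.score).reversed()
            .thenComparingInt(h -> h.shardIndex)
            .thenComparingInt(h -> h.doc);

    /** Returns the doc ids of the given hits in merged order. */
    static int[] docOrder(MergedHit... hits) {
        MergedHit[] copy = hits.clone();
        Arrays.sort(copy, ORDER);
        int[] docs = new int[copy.length];
        for (int i = 0; i < copy.length; i++) docs[i] = copy[i].doc;
        return docs;
    }
}
```

Sorting hits (score 1.0, shard 1, doc 3), (1.0, shard 0, doc 7) and (2.0, shard 1, doc 9) gives the doc order 9, 7, 3: doc 7 precedes doc 3 only because it comes from a lower shard index.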

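So if the slices are noncontinuous (as they are in my case), searchAfter 
and TopDocs.merge order tied hits differently: a tied hit from a higher 
slice index but with a lower document id is ranked after a tied hit 
from a lower slice index with a higher document id. If a page ends at 
the latter, it becomes the "after" document, and the next call skips 
the former because its id is smaller. Therefore hits are skipped.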
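Both rules can be combined in a small simulation of repeated searchAfter calls. It assumes three leaves holding doc ids 0-9, 10-19 and 20-29, one matching document per leaf (5, 15 and 25), all with score 1.0, and a page size of 2. `PagingSimulation` is hypothetical: each page filters hits with the searchAfter rule, ranks the rest with the merge rule (score, then slice index, then doc id) and keeps the top `pageSize`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation (not Lucene code) of paging with searchAfter over slices.
class PagingSimulation {
    static final class Hit {
        final int slice;
        final int doc;
        final float score;

        Hit(int slice, int doc, float score) {
            this.slice = slice;
            this.doc = doc;
            this.score = score;
        }
    }

    // Merge order: higher score first, then lower slice index, then lower doc id.
    static final Comparator<Hit> MERGE_ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparingInt(h -> h.slice)
            .thenComparingInt(h -> h.doc);

    /** Pages through all hits with searchAfter semantics; returns the doc ids seen, in order. */
    static List<Integer> collectAllPages(List<Hit> hits, int pageSize) {
        List<Integer> seen = new ArrayList<>();
        Hit after = null;
        while (true) {
            List<Hit> page = new ArrayList<>();
            for (Hit h : hits) {
                // searchAfter rule: skip higher scores, and equal scores with doc <= after.doc.
                boolean skipped = after != null
                    && (h.score > after.score || (h.score == after.score && h.doc <= after.doc));
                if (!skipped) page.add(h);
            }
            if (page.isEmpty()) return seen;
            page.sort(MERGE_ORDER);
            List<Hit> top = page.subList(0, Math.min(pageSize, page.size()));
            for (Hit h : top) seen.add(h.doc);
            after = top.get(top.size() - 1);  // last hit of the page becomes "after"
        }
    }
}
```

With noncontinuous slices (leaves 0 and 2 in slice 0, leaf 1 in slice 1) the pages return only docs 5 and 25, so doc 15 is lost although three documents match. With continuous slices (leaves 0 and 1 in slice 0, leaf 2 in slice 1) all three docs 5, 15 and 25 are returned.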

Here are my questions:

* Are slices meant to be continuous "sublists" of the passed 
leaves-list? Or is my way of slicing meant to be supported?
* If my way of slicing is not supported, could you either add a warning 
to the javadocs of the slices method or even add a check for a legal 
return value of slices()?
* Should I create a JIRA issue for this?

Sorry for the wall of text, I hope I explained the problem in an 
understandable way!

Thank you and best regards
