lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject block min-max values for Sort Field with Top-N query..
Date Tue, 02 Jul 2019 13:01:18 GMT
Our sort fields use DocValues.

Let's say I collect the min-max ords of a sort field for each block of
documents (128, 256, etc.) at index time via the Codec and store them as part
of DocValues at the segment level.

At query time, could we take advantage of these stats when a Top-N query
with a sort field is requested?

Typically, what I had in mind is a SortStats class with the following method:

int seek(int maxDocSeenTillNow, int minSortOrdSeenTillNow, boolean sortDesc) {
  // 1. Fetch the doc-ranges that have ords >= minSortOrdSeenTillNow
  // 2. Return the least doc-range >= maxDocSeenTillNow (if sortDesc == true)
  //    Return the least doc-range <= maxDocSeenTillNow (if sortDesc == false)
}

The Top-N collector can keep track of the max-doc-seen-till-now and
min-sort-ord-seen-till-now variables during query time and then call
SortStats.seek() for a possible skip of blocks of documents that may
otherwise be needlessly offered to and popped out of the priority queue.
I understand this simplistic logic depends on the sort-field data distribution
and won't work for multi-sort-field queries, out-of-order scoring, etc.

But in general, is this a good idea to explore, or something that is
best not attempted?

Any help is much appreciated

--
Ravi
