lucene-java-user mailing list archives

From Adrien Grand
Subject Re: block min-max values for Sort Field with Top-N query..
Date Tue, 02 Jul 2019 13:22:05 GMT

This is the same principle that we apply for block-max WAND, so
theoretically that would work, though in practice it might be a bit
hard to implement because we don't have the APIs that you would
need.
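
To make the principle concrete, here is a minimal, self-contained sketch
of block-level skipping for a descending top-N. It does not use any
Lucene API: the class BlockMaxTopN, the method topN and the counter
skippedBlocks are hypothetical names chosen for illustration, and the
per-block maxima are recomputed in memory rather than read from a codec.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch (not a Lucene API): top-N by descending value with per-block max skipping. */
final class BlockMaxTopN {
    /** Number of blocks skipped by the last call to topN; kept only to make the effect visible. */
    static int skippedBlocks;

    /**
     * values[doc] is the sort value of each document, blockSize the number of docs per block
     * (assumed > 0). Returns the n largest values in descending order.
     */
    static long[] topN(long[] values, int blockSize, int n) {
        if (n <= 0) {
            return new long[0];
        }
        // "Index time": record the maximum sort value of every block of documents.
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        long[] blockMax = new long[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            long max = Long.MIN_VALUE;
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                max = Math.max(max, values[doc]);
            }
            blockMax[b] = max;
        }

        // "Query time": a min-heap holds the current top n; its head is the weakest competitive value.
        PriorityQueue<Long> queue = new PriorityQueue<>();
        skippedBlocks = 0;
        for (int b = 0; b < numBlocks; b++) {
            // Once the queue is full, a block whose max cannot beat the head holds no competitive doc.
            if (queue.size() == n && blockMax[b] <= queue.peek()) {
                skippedBlocks++;
                continue;
            }
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                long v = values[doc];
                if (queue.size() < n) {
                    queue.add(v);
                } else if (v > queue.peek()) {
                    queue.poll();
                    queue.add(v);
                }
            }
        }

        // Drain the min-heap from the back so the result is in descending order.
        long[] result = new long[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll();
        }
        return result;
    }
}
```

How many blocks get skipped depends entirely on how the values are
distributed across doc IDs: if values increase with doc ID, every block
stays competitive and nothing is skipped.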

I have considered the idea of adding information about blocks to doc
values a couple times, but I think it'd be better to either:
 - Directly index the field as a term frequency instead of doc
values, e.g. using FeatureField. One downside is that you can only
sort in one order efficiently.
 - Or use LongDistanceFeatureQuery if your field is also indexed
with points, by passing the max value of your index as the "origin" if
you want to sort in decreasing order and the min value if you want to
sort in increasing order. This would be a bit less efficient than
FeatureField but would allow sorting in either ascending or descending
order.
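
The "origin" trick in the second option can be checked with plain
arithmetic. The sketch below re-implements, as an assumption based on
the documented distance-feature formula
score = weight * pivotDistance / (pivotDistance + |value - origin|),
how ranking by score turns into ranking by value. It does not call
Lucene; DistanceFeatureOrder, score and rankByScore are hypothetical
names, and it uses double where Lucene scores are floats.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch (not a Lucene API): ordering induced by a distance-feature style score. */
final class DistanceFeatureOrder {
    /** Assumed formula: weight * pivotDistance / (pivotDistance + |value - origin|). */
    static double score(long value, long origin, long pivotDistance, double weight) {
        double distance = Math.abs((double) value - (double) origin);
        return weight * pivotDistance / (pivotDistance + distance);
    }

    /** Returns the values ordered by decreasing score, i.e. closest to origin first. */
    static long[] rankByScore(long[] values, long origin, long pivotDistance) {
        Long[] boxed = Arrays.stream(values).boxed().toArray(Long[]::new);
        // Negate the score so that the natural ascending sort yields best-scoring values first.
        Arrays.sort(boxed, Comparator.comparingDouble((Long v) -> -score(v, origin, pivotDistance, 1.0)));
        return Arrays.stream(boxed).mapToLong(Long::longValue).toArray();
    }
}
```

With the values {5, 42, 17, 3}, passing origin = 42 (the max) ranks
them 42, 17, 5, 3, and passing origin = 3 (the min) ranks them
3, 5, 17, 42. Because real scores have float precision, values that
are very far from the origin may end up with equal scores.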

On Tue, Jul 2, 2019 at 3:01 PM Ravikumar Govindarajan wrote:
> Our Sort Fields utilize DocValues..
> Let's say I collect min-max ords of a Sort Field for a block of documents
> (128, 256 etc..) at index-time via Codec & store them as part of DocValues at
> a Segment level..
> During query time, could we take advantage of these stats when a Top-N query
> with Sort Field is requested?
> Typically, what I had in mind is a SortStats class with the following method
> int seek(int max-doc-seen-till-now, int min-sort-ord-seen-till-now,
>          boolean sortDesc) {
>   // 1. Fetch the doc-ranges that have >= min-sort-ord-seen-till-now
>   // 2. Return the least doc-range >= max-doc-seen-till-now (If SortDesc=true)
>   //    Return the least doc-range <= max-doc-seen-till-now (If SortDesc=false)
> }
> Top-N Collector can keep track of the max-doc-seen-till-now &
> min-sort-ord-seen-till-now variables during query time & then call the
> seek method for a possible skip of blocks of documents that may
> otherwise be needlessly offered & popped out from the priority queue
> I understand this simplistic logic depends on sort-field data distribution
> & won't work for multi-sort field queries or out-of-order scoring etc..
> But, in general will this be a good idea to explore or something that is
> best not attempted?
> Any help is much appreciated
> --
> Ravi

