lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 小鱼儿 <ctengc...@gmail.com>
Subject Re: Question abount combining InvertedIndex and SortField
Date Thu, 02 Jan 2020 04:07:48 GMT
Hi,  Mikhail

Your words is very encouraging. I was thinking i might need to do another
Lucene custom query to apply my business-specific "index-only sort" and
"early-termination"... SortField API says can use any numeric/String field,
which is very perfect. (In this way, Lucene should be able to a high-perf
Top-N query if SortField can support dynamically-generated ranking
scores... not only the native indexed numeric/String fields)

Mikhail Khludnev <mkhl@apache.org> 于2019年12月31日周二 下午4:41写道:

> Hello, 小鱼儿.
>
> On Tue, Dec 31, 2019 at 6:32 AM 小鱼儿 <ctengctsh@gmail.com> wrote:
>
> > Assume i first use keyword search to get a DocIDSet from inverted index,
> > then i want to sort these docIds by some numeric field, like a
> > `updateTime`, does Lucene do this without need of loading the Document
> > objects but only with an sorted index on `updateTime`?
>
> 1. Lucene doesn't load Document objects from stored fields files while
> sorting for sure.
> 2. Lucene uses dedicated columnar data structure (DocVaues made index time,
> or in the worst case lazily loaded FieldCache) to obtain field values while
> collecting search results from inverted index.
> 3. One deviation from this generic algorithm is sorted index and early
> termination, that's probably what you meant in "Index-Only Search
> Optimization".
>
>
> > Which i call it
> > "Index-Only Sort Optimization" (MUST be some equal concepts in RDBMS?)
> >
> > And since Lucene has a `SortField` API, what does it do the sort? I
> thought
> >
> It brings up TopFieldCollector instead of the default TopScoreDocCollector.
>
> > SortField is just a post-processing...
> >
> Not really. Scoring/sorting should be done along side with searching to
> reduce memory footprint by storing only top candidate results in a binary
> heap.
> IIRC it's described in this classic paper
> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message