lucene-java-user mailing list archives

From "Uwe Schindler"
Subject RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)
Date Fri, 01 Mar 2013 12:56:39 GMT
The slowdown does not come from making the doc ids absolute (that is just an addition). It
appears when you retrieve the stored fields through the top-level reader, because the composite
top-level reader has to do a binary search in the reader tree to find the correct sub-reader.
That remark was related to the code pasted by the user who asked the original question.
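
To make the cost difference concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class DocBaseLookup and its methods are made up for illustration, and it assumes every segment holds at least one document. It only models the two directions of the mapping: local to global is one addition, global back to the owning segment is a binary search over the doc bases.

```java
import java.util.Arrays;

// Hypothetical stand-in for a composite reader's doc id bookkeeping.
// Illustration only; these are not Lucene's actual classes.
final class DocBaseLookup {
    // starts[i] is the doc base (first global doc id) of segment i, ascending.
    // Assumes no segment is empty, so all entries are distinct.
    private final int[] starts;

    DocBaseLookup(int[] segmentSizes) {
        starts = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            starts[i] = base;
            base += segmentSizes[i];
        }
    }

    // Local -> global: a single addition.
    int toGlobal(int segment, int localDoc) {
        return starts[segment] + localDoc;
    }

    // Global -> owning segment: binary search over the doc bases.
    int segmentOf(int globalDoc) {
        int pos = Arrays.binarySearch(starts, globalDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Global -> local: needs the search above before it can subtract.
    int toLocal(int globalDoc) {
        return globalDoc - starts[segmentOf(globalDoc)];
    }
}
```

With segment sizes 100, 50 and 200, toGlobal(1, 7) gives 107, and resolving 107 back to segment 1 and local id 7 needs the search. Doing that once per collected hit is the avoidable overhead.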

If you need top-level doc ids because you present the global doc ids to the user (e.g. this
is how TopScoreDocCollector works), you can of course add the doc base. But inside the collector
it makes absolutely no sense to transform the local, relative doc ids into absolute ones
just to call a method on a top-level reader that then has to do the opposite with a binary search.
In that case, use the AtomicReader directly. If you also access FieldCache, working with absolute
doc ids additionally wastes megabytes of memory and causes FieldCache insanity.
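
The pattern can be sketched as follows, again in plain Java rather than against the Lucene API. Segment and TitleCollector are hypothetical names. setNextSegment() plays the role of setNextReader(), and the "title" stored field is an assumption for the example. The collector keeps the current per-segment reader, reads stored data with the local doc id, and adds the doc base only for the id it hands to the outside.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one atomic (per-segment) reader.
final class Segment {
    final int docBase;             // first global doc id of this segment
    private final String[] titles; // assumed stored field, indexed by local doc id

    Segment(int docBase, String... titles) {
        this.docBase = docBase;
        this.titles = titles;
    }

    String title(int localDoc) {
        return titles[localDoc];
    }
}

// Hypothetical collector mirroring the setNextReader()/collect() call sequence.
final class TitleCollector {
    private Segment current;
    final List<String> hits = new ArrayList<>();

    // Called once before each run of collect() calls for a segment.
    void setNextSegment(Segment segment) {
        current = segment;
    }

    // localDoc is relative to the segment passed to setNextSegment().
    void collect(int localDoc) {
        // Read stored data from the segment itself: no global lookup needed.
        String title = current.title(localDoc);
        // Only the id presented to the caller is made absolute (one addition).
        int globalDoc = current.docBase + localDoc;
        hits.add(globalDoc + ":" + title);
    }
}
```

The top-level reader never appears inside collect(), so no reverse lookup from a global id to its segment is ever performed.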


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Michael Sokolov
> Sent: Friday, March 01, 2013 1:41 PM
> Cc: Uwe Schindler
> Subject: Re: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward compatible)
> On 2/28/2013 5:05 PM, Uwe Schindler wrote:
> > ...  Collector instead of HitCollector (like your ancient Lucene from 2.4), you
> > have to respect the new semantics that are *different* to old HitCollector.
> > Collector works with low-level atomic readers (also in Lucene 3.x), the calls to
> > the "collect(int)" method are *not* using global document IDs, so using an
> > IndexReader from outside does not work and will never work - PERIOD: The
> > document IDs are only *relative* to the atomic reader that was passed to
> > the collector by setNextReader() before a sequence of collect() calls. To
> > make global docIds out of it, you may use readerContext.docBase, but this is
> > slower than using the low-level atomic reader.
> >
> Uwe, thanks for this lucid explanation!  I wonder if you wouldn't mind
> elaborating a bit on the slowdown you refer to from using docBase to
> absolutize docIDs.  I have a use case where I need to pass control to my
> caller, allowing them to *pull* results - so I don't know how many I will need.
> In the case where documents are returned in docID order, the code is
> actually pretty straightforward: I iterate over the atomic readers and pull
> results from each in turn.  Are you saying that is slower because it prevents
> multi-threading, or is there some other reason?
> -Mike