lucene-java-user mailing list archives

From András Péteri <apet...@b2international.com>
Subject Re: Lucene 5: Wrapping Collector
Date Mon, 29 Jun 2015 07:45:12 GMT
Hi,

IndexSearcher.search(Query, Collector) will iterate through all segments of
the index, call getLeafCollector, and use the returned LeafCollector to
collect result documents from that segment [1].

As LeafCollector's javadoc describes [2], there are cases when you want to
take into account precisely which segment you are processing, but sometimes
it doesn't matter. Depending on that, you can either return the same
LeafCollector instance from getLeafCollector all the time, or create new
ones each time it is called, using LeafReaderContext to extract
segment-specific values like docBase, or the LeafReader for that segment.
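To make the two options concrete, here is a minimal sketch. It does not use
Lucene itself: LeafCtx, LeafCol and Col are hypothetical stand-ins that only
mimic the shape of LeafReaderContext, LeafCollector and Collector (the real
interfaces also declare setScorer and throw IOException), and search() is a
toy loop standing in for IndexSearcher.search.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in types only: these are NOT Lucene classes.
final class SegmentChoiceSketch {
    /** Stand-in for LeafReaderContext; only the segment ordinal is modelled. */
    static final class LeafCtx {
        final int ord;
        LeafCtx(int ord) { this.ord = ord; }
    }

    /** Stand-in for LeafCollector. */
    interface LeafCol { void collect(int doc); }

    /** Stand-in for Collector. */
    interface Col { LeafCol getLeafCollector(LeafCtx context); }

    /** Option 1: the segment does not matter, so one shared instance is returned. */
    static final class CountingCollector implements Col {
        int count;
        private final LeafCol shared = doc -> count++;
        @Override public LeafCol getLeafCollector(LeafCtx context) { return shared; }
    }

    /** Option 2: the segment matters, so each call builds a new leaf collector. */
    static final class PerSegmentCounter implements Col {
        final Map<Integer, Integer> hitsBySegment = new TreeMap<>();
        @Override public LeafCol getLeafCollector(LeafCtx context) {
            final int ord = context.ord; // captured once for this segment
            return doc -> hitsBySegment.merge(ord, 1, Integer::sum);
        }
    }

    /** Toy search loop: one getLeafCollector call per segment, then collect per hit. */
    static void search(List<LeafCtx> leaves, int[][] hitsPerLeaf, Col collector) {
        for (int i = 0; i < leaves.size(); i++) {
            LeafCol leaf = collector.getLeafCollector(leaves.get(i));
            for (int doc : hitsPerLeaf[i]) {
                leaf.collect(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<LeafCtx> leaves = Arrays.asList(new LeafCtx(0), new LeafCtx(1));
        int[][] hits = { {0, 1, 2}, {0, 4} }; // segment-relative docIDs

        CountingCollector total = new CountingCollector();
        search(leaves, hits, total);
        System.out.println(total.count); // prints 5

        PerSegmentCounter perSegment = new PerSegmentCounter();
        search(leaves, hits, perSegment);
        System.out.println(perSegment.hitsBySegment); // prints {0=3, 1=2}
    }
}
```

The first collector never looks at the context, so reusing one instance is
safe. The second needs the ordinal at collect time, so it captures it when
getLeafCollector is called.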

These values can then be passed down to the custom LeafCollector, stored in
fields, and used when collect(int) callbacks occur. The example code in the
javadoc uses docBase from LeafReaderContext to map segment-relative docIDs
to absolute ones.
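For the wrapping case, a rough sketch of this pattern follows. Again these
are not Lucene classes: LeafCtx, LeafCol and Col are hypothetical stand-ins
for LeafReaderContext, LeafCollector and Collector, and SkippingCollector and
its skip set are made up for illustration. The point is that the wrapper asks
the wrapped collector for its leaf collector once, inside getLeafCollector,
and keeps both it and docBase for use in collect, rather than fetching
anything on every collect call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in types only: these are NOT Lucene classes.
final class WrappingCollectorSketch {
    /** Stand-in for LeafReaderContext; only docBase is modelled. */
    static final class LeafCtx {
        final int docBase; // first index-wide docID of this segment
        LeafCtx(int docBase) { this.docBase = docBase; }
    }

    /** Stand-in for LeafCollector. */
    interface LeafCol { void collect(int doc); }

    /** Stand-in for Collector. */
    interface Col { LeafCol getLeafCollector(LeafCtx context); }

    /** Wrapped collector: records absolute docIDs using the segment's docBase. */
    static final class GlobalIdCollector implements Col {
        final List<Integer> globalIds = new ArrayList<>();
        @Override public LeafCol getLeafCollector(LeafCtx context) {
            final int docBase = context.docBase;
            return doc -> globalIds.add(docBase + doc);
        }
    }

    /** Wrapper (hypothetical): drops hits whose absolute docID is in a skip set. */
    static final class SkippingCollector implements Col {
        private final Col wrapped;
        private final Set<Integer> skip;
        SkippingCollector(Col wrapped, Set<Integer> skip) {
            this.wrapped = wrapped;
            this.skip = skip;
        }
        @Override public LeafCol getLeafCollector(LeafCtx context) {
            // Obtained once per getLeafCollector call and kept for the collect callbacks.
            final LeafCol inner = wrapped.getLeafCollector(context);
            final int docBase = context.docBase;
            return doc -> {
                if (!skip.contains(docBase + doc)) {
                    inner.collect(doc); // still passes the segment-relative docID
                }
            };
        }
    }

    /** Toy search loop standing in for IndexSearcher.search. */
    static void search(List<LeafCtx> leaves, int[][] hitsPerLeaf, Col collector) {
        for (int i = 0; i < leaves.size(); i++) {
            LeafCol leaf = collector.getLeafCollector(leaves.get(i));
            for (int doc : hitsPerLeaf[i]) {
                leaf.collect(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<LeafCtx> leaves = Arrays.asList(new LeafCtx(0), new LeafCtx(10));
        int[][] hits = { {0, 2}, {1, 3} }; // segment-relative docIDs
        GlobalIdCollector sink = new GlobalIdCollector();
        Set<Integer> skip = new HashSet<>(Arrays.asList(2, 11));
        search(leaves, hits, new SkippingCollector(sink, skip));
        System.out.println(sink.globalIds); // prints [0, 13]
    }
}
```

Written this way, the wrapper does not depend on how often getLeafCollector
is invoked: every call yields a self-contained pair of wrapping and wrapped
leaf collectors for the given context.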

[1]
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L609
[2]
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/LeafCollector.java

--
András

On Sun, Jun 28, 2015 at 5:03 AM, Selva Kumar <selva.kumar.at.work@gmail.com>
wrote:

> With wrapping collector scenarios, wrapping LeafCollector needs access to
> wrapped LeafCollector.
>
> If wrapping LeafCollector has access to LeafReaderContext, it seems one can
> use getLeafCollector "getter" anytime to get the wrapped leaf collector.
>
> If the collect(int doc) method retrieves the LeafCollector on every call,
> this does not work, since getLeafCollector behaves more like a prototype
> factory.
>
> Question:
>
> If I am writing a new Collector, is it fair to assume getLeafCollector will
> be called only once per segment?
>
