lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: SpanQuery and Spans optimizations
Date Thu, 06 Aug 2009 18:40:57 GMT

On Aug 6, 2009, at 2:31 PM, Paul Elschot wrote:

> With a single search one might end up collecting lots of span info
> that will be thrown away because the document score is too low.

Presumably you would collect it only when a result is actually put
onto the PriorityQueue, that is, after scoring that particular doc,
so you would only be keeping Span values for the number of results
requested.  I think I'd be willing to trade that memory for not
having to iterate/skip over all the Spans again.
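
A rough sketch of the idea, in Python rather than against Lucene's classes. Everything here is an assumption made for illustration: `collect_top_k`, the `(doc_id, score, spans_fn)` input shape, and the lazy `spans_fn` callback are hypothetical names, and a plain min-heap stands in for the PriorityQueue. Span positions are materialized only for a doc that actually enters the queue, so at most `k` span lists are held at any time.

```python
import heapq

def collect_top_k(scored_docs, k):
    """Keep the k best docs and the span positions of only those docs.

    scored_docs yields (doc_id, score, spans_fn) in doc order; spans_fn()
    returns the doc's [(start, end), ...] list and is called only when the
    doc is competitive enough to enter the queue (hypothetical interface).
    """
    heap = []  # min-heap of (score, doc_id, spans); heap[0] is the weakest hit
    for doc_id, score, spans_fn in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id, spans_fn()))
        elif score > heap[0][0]:
            # The evicted entry's spans are dropped along with the entry.
            heapq.heapreplace(heap, (score, doc_id, spans_fn()))
        # Otherwise the doc scores too low and its spans are never collected.
    # Best score first; ties broken by lower doc id.
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))
```

Note that a doc which enters the queue early and is evicted later still has its spans collected once; what is bounded is the memory held, not the number of collections.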

> So I think the best way is to first collect the best hits in the usual
> way, and then get the spans of the query (effectively once more,
> but now without SpanScorer in between) with the doc numbers
> of the best hits as a filter while collecting all the begin/end positions.

Yes, that is what I've traditionally done, but associating the spans
with a ranked list of docs afterwards is convoluted.
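
For reference, the two-pass approach can be sketched as below, again in Python with made-up names rather than Lucene's API: `spans_for_hits` and the `(doc, start, end)` stream are assumptions for illustration. The first pass is taken as already done and supplies the ranked doc numbers; the second pass walks the spans with those doc numbers as a filter and then re-attaches the positions in rank order, which is the bookkeeping step that tends to get awkward.

```python
def spans_for_hits(ranked_doc_ids, span_stream):
    """Second pass: gather begin/end positions for the best hits only.

    ranked_doc_ids: doc numbers of the top hits, best first (from pass one).
    span_stream: iterable of (doc, start, end) tuples in doc order, standing
    in for a walk over the query's spans (hypothetical representation).
    """
    # One bucket per hit; membership in this dict is the doc-number filter.
    by_doc = {doc: [] for doc in ranked_doc_ids}
    for doc, start, end in span_stream:
        if doc in by_doc:
            by_doc[doc].append((start, end))
    # Re-associate the positions with the ranked list, preserving rank order.
    return [(doc, by_doc[doc]) for doc in ranked_doc_ids]
```

This sketch scans the whole stream for simplicity; an actual implementation would presumably skip directly to each wanted doc instead of visiting every span.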
