lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Highlighter that works with phrase and span queries
Date Wed, 27 Jun 2007 15:44:44 GMT
On Wednesday 27 June 2007 17:17, mark harwood wrote:
> >>you would still have the major problem of which matches do you keep
> >>information for
> Yes, doing this efficiently is the main issue. Some vague thoughts I had:
> 3) For each call to on the top level query, the
> HighlightObserver class would check to see if the doc was a "keeper" (i.e.
> its score places it in the required top "n" docs PriorityQueue) and if so,
> would retain a copy of all the transient match info currently held in the
> ThreadLocal for this doc and associate it with the new TopDoc object placed
> in the top docs PriorityQueue.

This can be done more efficiently by skipping the Spans themselves to
the next document for which the matches need to be kept. For each doc,
the Spans could then be copied by iterating until the next matching doc
in the search.
Even better would be to use a Filter in the search to limit the results to the 
matches that are immediately needed, but a Filter still requires a BitSet
over all indexed documents, and that is probably overkill for highlighting.
Iterating the Spans will be in doc number order, so some mapping back
to the scored order would still be needed.
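
A rough sketch of the skipping idea above, not taken from any existing
highlighter code. SimpleSpans is a hypothetical stand-in modeled on the
Lucene Spans interface (next(), skipTo(), doc(), start(), end()), and
ArraySpans and KeptSpanCollector are made-up names for illustration only.
It assumes the doc numbers to keep are already known and sorted ascending.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in modeled on the Lucene Spans interface. */
interface SimpleSpans {
    boolean next();             // move to the next match, false when exhausted
    boolean skipTo(int target); // move past the current match to the first one with doc() >= target
    int doc();
    int start();
    int end();
}

/** In-memory spans over rows of {doc, start, end} sorted by doc; demo only. */
final class ArraySpans implements SimpleSpans {
    private final int[][] rows;
    private int pos = -1;

    ArraySpans(int[][] rows) { this.rows = rows; }

    public boolean next() { pos++; return pos < rows.length; }

    public boolean skipTo(int target) {
        // Always advances at least once, then continues until doc() >= target.
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }

    public int doc()   { return rows[pos][0]; }
    public int start() { return rows[pos][1]; }
    public int end()   { return rows[pos][2]; }
}

final class KeptSpanCollector {
    /**
     * Copies {start, end} pairs for the kept docs only, skipping over all
     * other docs. keptDocsAscending must be sorted in increasing doc order.
     */
    static Map<Integer, List<int[]>> collect(SimpleSpans spans, int[] keptDocsAscending) {
        Map<Integer, List<int[]>> byDoc = new LinkedHashMap<>();
        boolean more = spans.next();
        for (int target : keptDocsAscending) {
            if (more && spans.doc() < target) {
                more = spans.skipTo(target); // jump over docs that are not needed
            }
            if (!more) break;
            List<int[]> matches = new ArrayList<>();
            while (more && spans.doc() == target) {
                matches.add(new int[] { spans.start(), spans.end() });
                more = spans.next();
            }
            if (!matches.isEmpty()) byDoc.put(target, matches);
        }
        return byDoc;
    }
}
```

Because the result is keyed by doc number, the mapping back to scored order
is a lookup: walk the hits in score order and fetch each doc's list from the
map.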

I have not looked at any highlighting code yet. Is there already an extension
of PhraseQuery that has getSpans()?

Paul Elschot
