lucene-java-user mailing list archives

From mark harwood <>
Subject Re: Thinking about better highlighting
Date Thu, 25 Aug 2005 09:11:41 GMT
Unfortunately I've not had the time to address the
phrase highlighting issues in the current highlighter
but I think I've an idea as to how best to fix it:

I would suggest rewriting the highlighter to use Spans,
not Terms, to find the relevant sections in a text.
Most of the code required for such a solution is
around in one form or another but needs bringing
together:

* The SpansExtractor class here:
This can be used to get Spans for a given query and
IndexReader to show where all query hits for a
document lie.

* The contrib section includes a MemoryIndex that can
provide a fast IndexReader for a single document
(faster than using RAMDirectory).

* The LuceneInAction code example for SpanQueries
includes a rudimentary highlighter that uses Spans to
control where markup is introduced given a collection
of Spans. (This does not attempt to summarise long docs.)

The overall approach using this code would be to index
each doc to be highlighted in a MemoryIndex, run
SpansExtractor using the (rewritten) user query and
the MemoryIndex's IndexReader, and give the resulting
spans to an adapted LIA highlighter/summariser.
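
As a rough illustration of that last step only, here is a minimal
sketch of span-driven markup in plain Java. It is not the LIA code
and uses no Lucene classes: TokenSpan and SpanMarkup are hypothetical
names, and the spans are assumed to arrive as token positions
(start inclusive, end exclusive) from whatever extracts them.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a Lucene span: token positions [start, end). */
final class TokenSpan {
    final int start; // first token position covered (inclusive)
    final int end;   // one past the last token position covered (exclusive)

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical helper: wraps tokens covered by any span in <b>...</b>. */
final class SpanMarkup {

    static String highlight(String[] tokens, List<TokenSpan> spans) {
        // Mark every token position that falls inside at least one span.
        boolean[] hit = new boolean[tokens.length];
        for (TokenSpan s : spans) {
            int from = Math.max(0, s.start);
            int to = Math.min(tokens.length, s.end);
            for (int i = from; i < to; i++) {
                hit[i] = true;
            }
        }

        // Emit tokens, opening and closing markup at run boundaries so that
        // overlapping or adjacent spans produce a single highlighted run.
        StringBuilder out = new StringBuilder();
        boolean open = false;
        for (int i = 0; i < tokens.length; i++) {
            if (!hit[i] && open) {
                out.append("</b>");
                open = false;
            }
            if (i > 0) {
                out.append(' ');
            }
            if (hit[i] && !open) {
                out.append("<b>");
                open = true;
            }
            out.append(tokens[i]);
        }
        if (open) {
            out.append("</b>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "brown fox and quick brown dog".split(" ");
        // One span for the phrase "quick brown" at token positions 3..4.
        List<TokenSpan> spans = Arrays.asList(new TokenSpan(3, 5));
        System.out.println(highlight(tokens, spans));
    }
}
```

Because the markup follows the span and not the individual terms,
the standalone "brown" at the start of the text is left alone and
only the phrase match is marked.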

Some issues with this:
1) The contrib section would now have inter-project
dependencies (highlighter -> MemoryIndex) which would
need to be catered for in the Ant build process.
2) We may need to think about how we factor IDF
weighting of individual terms into the summarising
process so that the more important terms influence the
selection of highlights.
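
One possible way to approach issue 2, sketched in plain Java under
stated assumptions: each candidate fragment is scored by summing
the idf of the distinct query terms it contains, and the best
fragment wins. IdfFragmentScorer and its methods are hypothetical
names, the idf form ln(numDocs / (docFreq + 1)) + 1 is just one
common choice, and the document frequencies are assumed to come
from the main index, since a single-document MemoryIndex cannot
supply meaningful ones.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical scorer: prefers fragments that contain rarer query terms. */
final class IdfFragmentScorer {

    /** One common idf form (an assumption here): ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /**
     * Sums the idf of each distinct query term found in the fragment.
     * docFreqs is assumed to hold only the query terms, mapped to their
     * document frequency in the main index.
     */
    static double score(List<String> fragmentTerms,
                        Map<String, Integer> docFreqs,
                        int numDocs) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            Integer df = docFreqs.get(term);
            // Count each query term once per fragment, ignore non-query terms.
            if (df != null && seen.add(term)) {
                total += idf(df, numDocs);
            }
        }
        return total;
    }

    /** Returns the index of the highest-scoring fragment, or -1 if there are none. */
    static int best(List<List<String>> fragments,
                    Map<String, Integer> docFreqs,
                    int numDocs) {
        int bestIndex = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < fragments.size(); i++) {
            double s = score(fragments.get(i), docFreqs, numDocs);
            if (s > bestScore) {
                bestScore = s;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With spans in place of terms, the same idea could be applied by
summing the weights of the terms inside each matching span, though
how to weight a multi-term span is left open here.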

Does this sound reasonable?
