lucene-dev mailing list archives

From Robert Muir
Subject Re: Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 14:44:17 GMT
On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward wrote:
> Hello,
> The project I'm currently working on requires the reporting of exact hit
> positions from some pretty hairy queries, not all of which are covered by
> the existing highlighter modules.  I'm working round this by translating
> everything into SpanQueries, and using the getSpans() method to locate hits
> (I've extended the Spans interface to make term offsets available -
> see  This works for our
> use-case, but isn't terribly efficient, and obviously isn't applicable to
> non-Span queries.
> I've seen a bit of chatter on the list about using term offsets to provide
> accurate highlighting in Lucene.  I'm going to have a couple of weeks free
> in April, and I thought I might have a go at implementing this.  Mainly I'm
> wondering if there have already been thoughts about how to do it.  My current
> thoughts are to somehow extend the Weight and Scorer interface to make term
> offsets available; to get highlights for a given set of documents, you'd
> essentially run the query again, with a filter on just the documents you
> want highlighted, and have a custom collector that gets the term offsets in
> place of the scores.

Hi Alan, Simon started some initial work on

Some work and prototypes were done in a branch, but it might be
lagging behind trunk a bit.

Additionally, at the time it was first done, I think we didn't yet
support offsets in the postings lists.
We've since added this, and several codecs support it.
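
To make the idea under discussion concrete, here is a minimal sketch of
offset-based highlighting: postings carry character offsets, the query is
re-run against only the documents to be highlighted, and a collector gathers
offsets in place of scores. It assumes a toy in-memory index with no Lucene
dependency. The class OffsetHighlightSketch and its methods addDocument,
collectOffsets and highlight are hypothetical names made up for illustration.
They are not Lucene APIs and not the design from the branch mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy illustration only: NOT Lucene's API. All names here are hypothetical.
final class OffsetHighlightSketch {

    // term -> (docId -> list of {startOffset, endOffset}), end exclusive
    static Map<String, Map<Integer, List<int[]>>> newIndex() {
        return new HashMap<>();
    }

    // Tokenize on non-alphanumeric characters, lowercase each token, and
    // record its character offsets in the postings for that term.
    static void addDocument(Map<String, Map<Integer, List<int[]>>> index,
                            int docId, String text) {
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                index.computeIfAbsent(term, t -> new TreeMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(new int[] {start, i});
            }
        }
    }

    // "Run the query again" over only the documents in docFilter, collecting
    // term offsets instead of scores. The query is a plain list of terms.
    static Map<Integer, List<int[]>> collectOffsets(
            Map<String, Map<Integer, List<int[]>>> index,
            List<String> queryTerms, Set<Integer> docFilter) {
        Map<Integer, List<int[]>> hits = new TreeMap<>();
        for (String term : queryTerms) {
            Map<Integer, List<int[]>> postings =
                    index.get(term.toLowerCase(Locale.ROOT));
            if (postings == null) {
                continue;
            }
            for (Map.Entry<Integer, List<int[]>> e : postings.entrySet()) {
                if (docFilter.contains(e.getKey())) {
                    hits.computeIfAbsent(e.getKey(), d -> new ArrayList<>())
                        .addAll(e.getValue());
                }
            }
        }
        // Sort each document's hits by start offset.
        for (List<int[]> offsets : hits.values()) {
            offsets.sort((a, b) -> Integer.compare(a[0], b[0]));
        }
        return hits;
    }

    // Wrap each hit in <b>...</b>. Assumes offsets are sorted and do not
    // overlap, which holds for the single-token hits produced above.
    static String highlight(String text, List<int[]> offsets) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] o : offsets) {
            sb.append(text, last, o[0])
              .append("<b>").append(text, o[0], o[1]).append("</b>");
            last = o[1];
        }
        return sb.append(text.substring(last)).toString();
    }
}
```

The sketch only handles bag-of-terms queries. Positional queries such as spans
or phrases would additionally need to check that the matched positions satisfy
the query before reporting their offsets.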

