lucene-dev mailing list archives

From Simon Willnauer <>
Subject Re: Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 15:43:09 GMT
Alan, you made my day!

The branch is somewhat outdated, but I looked at it recently and I can
certainly help to get it up to speed. The feature in that branch is
quite a big one and it's at a very early stage. Still, I want to
encourage you to take a look and work on it. I promise all my help
with the issues!

Let me know if you have questions!


On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
<> wrote:
> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>> <> wrote:
>>> Hello,
>>> The project I'm currently working on requires the reporting of exact hit
>>> positions from some pretty hairy queries, not all of which are covered by
>>> the existing highlighter modules.  I'm working round this by translating
>>> everything into SpanQueries, and using the getSpans() method to locate hits
>>> (I've extended the Spans interface to make term offsets available -
>>> see  This works for our
>>> use-case, but isn't terribly efficient, and obviously isn't applicable to
>>> non-Span queries.
>>> I've seen a bit of chatter on the list about using term offsets to provide
>>> accurate highlighting in Lucene.  I'm going to have a couple of weeks free
>>> in April, and I thought I might have a go at implementing this.  Mainly I'm
>>> wondering if there have already been thoughts about how to do it.  My current
>>> thoughts are to somehow extend the Weight and Scorer interface to make term
>>> offsets available; to get highlights for a given set of documents, you'd
>>> essentially run the query again, with a filter on just the documents you
>>> want highlighted, and have a custom collector that gets the term offsets in
>>> place of the scores.
>> Hi Alan, Simon started some initial work on
>> Some work and prototypes were done in a branch, but it might be
>> lagging behind trunk a bit.
>> Additionally at the time it was first done, I think we didn't yet
>> support offsets in the postings lists.
>> We've since added this and several codecs support it.
>> --
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
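
For readers following the thread, the idea Alan outlines (re-run the
query against only the documents to be highlighted, and collect term
offsets in place of scores) can be illustrated with a small toy model.
This is only a sketch under stated assumptions: ToyIndex, OffsetCollector,
Offset and highlight() are hypothetical names invented here, the "query"
is a plain OR of terms, and nothing below is Lucene's actual API or the
code in the branch mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy classes for illustration only; none of these names are Lucene APIs.
final class OffsetHighlightSketch {

    /** Character span [start, end) of one term occurrence in the original text. */
    static final class Offset {
        final int start;
        final int end;
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    /** Callback that receives offsets where an ordinary collector would receive scores. */
    interface OffsetCollector {
        void collect(int docId, String term, Offset offset);
    }

    /** Toy postings: term -> docId -> offsets recorded at index time. */
    static final class ToyIndex {
        private static final Pattern WORD = Pattern.compile("\\w+");
        private final Map<String, Map<Integer, List<Offset>>> postings = new HashMap<>();

        void add(int docId, String text) {
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                String term = m.group().toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, t -> new TreeMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(new Offset(m.start(), m.end()));
            }
        }

        /** "Re-runs" an OR of terms, restricted to the documents in docFilter. */
        void collectOffsets(List<String> terms, Set<Integer> docFilter, OffsetCollector collector) {
            for (String term : terms) {
                Map<Integer, List<Offset>> byDoc = postings.get(term.toLowerCase(Locale.ROOT));
                if (byDoc == null) continue;
                for (Map.Entry<Integer, List<Offset>> e : byDoc.entrySet()) {
                    if (!docFilter.contains(e.getKey())) continue; // skip unfiltered docs
                    for (Offset o : e.getValue()) collector.collect(e.getKey(), term, o);
                }
            }
        }
    }

    /** Wraps each span in square brackets, right to left so earlier indices stay valid. */
    static String highlight(String text, List<Offset> offsets) {
        List<Offset> sorted = new ArrayList<>(offsets);
        sorted.sort((a, b) -> Integer.compare(b.start, a.start));
        StringBuilder sb = new StringBuilder(text);
        for (Offset o : sorted) {
            sb.insert(o.end, ']').insert(o.start, '[');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc0 = "Term offsets give exact hit positions";
        ToyIndex index = new ToyIndex();
        index.add(0, doc0);
        index.add(1, "Scores alone do not give offsets");

        // Only document 0 is in the filter, so document 1 contributes no offsets.
        List<Offset> hits = new ArrayList<>();
        index.collectOffsets(Arrays.asList("offsets", "hit"), Collections.singleton(0),
                (docId, term, offset) -> hits.add(offset));
        System.out.println(highlight(doc0, hits)); // Term [offsets] give exact [hit] positions
    }
}
```

The sketch assumes the collected spans do not overlap, which holds for
this simple word tokenizer. In a real implementation the offsets would
presumably come from the postings lists (which, as Robert notes, several
codecs now support) rather than from an in-memory map.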
