lucene-dev mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Re: Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 14:52:39 GMT
Cool, thanks Robert.  I'll take a look at the JIRA ticket.

On 19 Mar 2012, at 14:44, Robert Muir wrote:

> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
> <alan.woodward@romseysoftware.co.uk> wrote:
>> Hello,
>> 
>> The project I'm currently working on requires the reporting of exact hit
>> positions from some pretty hairy queries, not all of which are covered by
>> the existing highlighter modules.  I'm working round this by translating
>> everything into SpanQueries, and using the getSpans() method to locate hits
>> (I've extended the Spans interface to make term offsets available -
>> see https://issues.apache.org/jira/browse/LUCENE-3826).  This works for our
>> use-case, but isn't terribly efficient, and obviously isn't applicable to
>> non-Span queries.
>> 
>> I've seen a bit of chatter on the list about using term offsets to provide
>> accurate highlighting in Lucene.  I'm going to have a couple of weeks free
>> in April, and I thought I might have a go at implementing this.  Mainly I'm
>> wondering whether anyone has already thought about how to do it.  My current
>> thoughts are to somehow extend the Weight and Scorer interfaces to make term
>> offsets available; to get highlights for a given set of documents, you'd
>> essentially run the query again, with a filter on just the documents you
>> want highlighted, and have a custom collector that gets the term offsets in
>> place of the scores.
>> 
> 
> Hi Alan, Simon started some initial work on
> https://issues.apache.org/jira/browse/LUCENE-2878
> 
> Some work and prototypes were done in a branch, but it might be
> lagging behind trunk a bit.
> 
> Additionally at the time it was first done, I think we didn't yet
> support offsets in the postings lists.
> We've since added this and several codecs support it.
> 
> -- 
> lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 
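The approach outlined in the quoted message (re-run the query restricted to
the documents to be highlighted, and collect term offsets instead of scores)
can be illustrated with a small self-contained sketch.  It does not use the
Lucene API: the class OffsetHighlightSketch, its Hit type, and the methods
addDocument() and collectOffsets() are all hypothetical names invented for
illustration, and the in-memory map is only a stand-in for postings lists
that store offsets.  Treat it as one possible shape of the idea under those
assumptions, not as the design discussed in LUCENE-2878.

```java
import java.util.*;

// Hypothetical illustration only; none of these names exist in Lucene.
class OffsetHighlightSketch {

    /** One term occurrence, with character offsets [start, end) into the text. */
    static final class Hit {
        final int docId;
        final String term;
        final int start;
        final int end;

        Hit(int docId, String term, int start, int end) {
            this.docId = docId;
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "doc=" + docId + " term=" + term + " [" + start + "," + end + ")";
        }
    }

    // term -> docId -> offsets; a stand-in for postings that record offsets.
    private final Map<String, Map<Integer, List<int[]>>> postings = new HashMap<>();

    /** Splits on non-alphanumeric characters, lowercases, and records offsets. */
    void addDocument(int docId, String text) {
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, t -> new TreeMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(new int[] {start, i});
            }
        }
    }

    /**
     * Plays the role of "run the query again with a filter and a custom
     * collector": visits only documents in docFilter and gathers offsets
     * rather than scores.  The "query" here is just a set of terms.
     */
    List<Hit> collectOffsets(Collection<String> queryTerms, Set<Integer> docFilter) {
        List<Hit> hits = new ArrayList<>();
        for (String term : queryTerms) {
            Map<Integer, List<int[]>> byDoc = postings.get(term);
            if (byDoc == null) continue;
            for (Map.Entry<Integer, List<int[]>> e : byDoc.entrySet()) {
                if (!docFilter.contains(e.getKey())) continue;
                for (int[] off : e.getValue()) {
                    hits.add(new Hit(e.getKey(), term, off[0], off[1]));
                }
            }
        }
        // Order hits by document, then by position in the text.
        hits.sort(Comparator.comparingInt((Hit h) -> h.docId)
                            .thenComparingInt(h -> h.start));
        return hits;
    }

    public static void main(String[] args) {
        OffsetHighlightSketch index = new OffsetHighlightSketch();
        index.addDocument(0, "Lucene stores term offsets");
        index.addDocument(1, "Offsets enable highlighting");
        // Highlight only document 1, as if it were the page of results shown.
        for (Hit h : index.collectOffsets(Arrays.asList("offsets"),
                                          Collections.singleton(1))) {
            System.out.println(h);
        }
    }
}
```

Because the offsets come straight from the index, the highlighter in this
sketch never re-analyzes the document text; it only needs the original
string to cut out the [start, end) ranges.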