lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 15:51:37 GMT
On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Have you marked that for GSOC? Would be a good idea!
Yes, I did.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>> Sent: Monday, March 19, 2012 4:43 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Using term offsets for hit highlighting
>>
>> Alan, you made my day!
>>
>> The branch is kind of outdated but I looked at it lately and I can certainly help
>> to get it up to speed. The feature in that branch is quite a big one and it's in a
>> very early stage. Still I want to encourage you to take a look and work on it. I
>> promise all my help with the issues!
>>
>> let me know if you have questions!
>>
>> simon
>>
>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>> <alan.woodward@romseysoftware.co.uk> wrote:
>> > Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>> >
>> > On 19 Mar 2012, at 14:44, Robert Muir wrote:
>> >
>> >> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>> >> <alan.woodward@romseysoftware.co.uk> wrote:
>> >>> Hello,
>> >>>
>> >>> The project I'm currently working on requires the reporting of exact
>> >>> hit positions from some pretty hairy queries, not all of which are
>> >>> covered by the existing highlighter modules.  I'm working round this
>> >>> by translating everything into SpanQueries, and using the getSpans()
>> >>> method to locate hits (I've extended the Spans interface to make
>> >>> term offsets available - see
>> >>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>> >>> our use-case, but isn't terribly efficient, and obviously isn't
>> >>> applicable to non-Span queries.
>> >>>
>> >>> I've seen a bit of chatter on the list about using term offsets to
>> >>> provide accurate highlighting in Lucene.  I'm going to have a couple
>> >>> of weeks free in April, and I thought I might have a go at
>> >>> implementing this.  Mainly I'm wondering if there have already been
>> >>> thoughts about how to do it.  My current idea is to somehow
>> >>> extend the Weight and Scorer interface to make term offsets
>> >>> available; to get highlights for a given set of documents, you'd
>> >>> essentially run the query again, with a filter on just the documents
>> >>> you want highlighted, and have a custom collector that gets the term
>> >>> offsets in place of the scores.
>> >>>
>> >>
>> >> Hi Alan, Simon started some initial work on
>> >> https://issues.apache.org/jira/browse/LUCENE-2878
>> >>
>> >> Some work and prototypes were done in a branch, but it might be
>> >> lagging behind trunk a bit.
>> >>
>> >> Additionally at the time it was first done, I think we didn't yet
>> >> support offsets in the postings lists.
>> >> We've since added this and several codecs support it.
>> >>
>> >> --
>> >> lucidimagination.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
>> >> additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >
>> >
>>
>


