lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Using term offsets for hit highlighting
Date Mon, 19 Mar 2012 15:50:00 GMT
Have you marked that for GSOC? Would be a good idea!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Monday, March 19, 2012 4:43 PM
> To: dev@lucene.apache.org
> Subject: Re: Using term offsets for hit highlighting
> 
> Alan, you made my day!
> 
> The branch is kind of outdated, but I looked at it lately and I can certainly help
> to get it up to speed. The feature in that branch is quite a big one and it's at a
> very early stage. Still, I want to encourage you to take a look and work on it. I
> promise all my help with the issues!
> 
> Let me know if you have questions!
> 
> simon
> 
> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
> <alan.woodward@romseysoftware.co.uk> wrote:
> > Cool, thanks Robert.  I'll take a look at the JIRA ticket.
> >
> > On 19 Mar 2012, at 14:44, Robert Muir wrote:
> >
> >> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
> >> <alan.woodward@romseysoftware.co.uk> wrote:
> >>> Hello,
> >>>
> >>> The project I'm currently working on requires the reporting of exact
> >>> hit positions from some pretty hairy queries, not all of which are
> >>> covered by the existing highlighter modules.  I'm working round this
> >>> by translating everything into SpanQueries, and using the getSpans()
> >>> method to locate hits (I've extended the Spans interface to make
> >>> term offsets available - see
> >>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
> >>> our use-case, but isn't terribly efficient, and obviously isn't applicable
> >>> to non-Span queries.
> >>>
> >>> I've seen a bit of chatter on the list about using term offsets to
> >>> provide accurate highlighting in Lucene.  I'm going to have a couple
> >>> of weeks free in April, and I thought I might have a go at
> >>> implementing this.  Mainly I'm wondering if there have already been
> >>> thoughts about how to do it.  My current thoughts are to somehow
> >>> extend the Weight and Scorer interface to make term offsets
> >>> available; to get highlights for a given set of documents, you'd
> >>> essentially run the query again, with a filter on just the documents
> >>> you want highlighted, and have a custom collector that gets the term
> >>> offsets in place of the scores.
> >>>
> >>
> >> Hi Alan, Simon started some initial work on
> >> https://issues.apache.org/jira/browse/LUCENE-2878
> >>
> >> Some work and prototypes were done in a branch, but it might be
> >> lagging behind trunk a bit.
> >>
> >> Additionally at the time it was first done, I think we didn't yet
> >> support offsets in the postings lists.
> >> We've since added this and several codecs support it.
> >>
> >> --
> >> lucidimagination.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> >> additional commands, e-mail: dev-help@lucene.apache.org
> >>



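The approach outlined in Alan's original message (run the query again, restricted to the documents you want highlighted, and collect term offsets in place of scores) can be illustrated with a toy in-memory index. This is only a sketch under stated assumptions: the class OffsetHighlightSketch, its methods, and the postings layout are hypothetical names invented for illustration. They are not Lucene APIs and do not reflect the design in LUCENE-2878 or LUCENE-3826. The "query" here is a plain disjunction of terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model only: nothing in this class is a Lucene type or API. */
final class OffsetHighlightSketch {

    /** term -> list of {docId, startOffset, endOffset}; end offset is exclusive. */
    private final Map<String, List<int[]>> postings = new HashMap<>();

    /** Index a document by splitting on single spaces and recording character offsets. */
    void add(int docId, String text) {
        int pos = 0;
        for (String token : text.split(" ")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token.toLowerCase(), k -> new ArrayList<>())
                        .add(new int[] {docId, pos, pos + token.length()});
            }
            pos += token.length() + 1; // advance past the token and one separator
        }
    }

    /**
     * Stand-in for "run the query again with a filter": visit the postings of
     * each query term, skip documents outside docFilter, and collect offsets
     * (not scores) per document, sorted by start offset.
     */
    Map<Integer, List<int[]>> collectOffsets(Set<String> queryTerms, Set<Integer> docFilter) {
        Map<Integer, List<int[]>> hits = new TreeMap<>();
        for (String term : queryTerms) {
            for (int[] p : postings.getOrDefault(term, List.of())) {
                if (docFilter.contains(p[0])) {
                    hits.computeIfAbsent(p[0], k -> new ArrayList<>())
                        .add(new int[] {p[1], p[2]});
                }
            }
        }
        hits.values().forEach(spans -> spans.sort((a, b) -> Integer.compare(a[0], b[0])));
        return hits;
    }

    public static void main(String[] args) {
        OffsetHighlightSketch index = new OffsetHighlightSketch();
        index.add(0, "term offsets for hit highlighting");
        index.add(1, "offsets in the postings lists");

        // Only document 0 is to be highlighted, so document 1 is filtered out.
        Map<Integer, List<int[]>> hits =
                index.collectOffsets(Set.of("offsets", "hit"), Set.of(0));
        for (int[] span : hits.get(0)) {
            System.out.println("doc 0: [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

In a real implementation the offsets would have to come from the postings lists themselves, which, as Robert notes, some codecs now support. The toy index above simply stores them in a map.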
