lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Using term offsets for hit highlighting
Date Wed, 21 Mar 2012 08:49:06 GMT
Alan, if you want I can just merge the branch up next week and we
iterate from there?

simon

On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
<erickerickson@gmail.com> wrote:
> Yep, the first challenge is always getting the old patch(es) to apply.....
>
> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
> <alan.woodward@romseysoftware.co.uk> wrote:
>> Thanks for all the offers of help!  It looks as though most of the
>> hard work has already been done, which is exactly where I like to
>> pick up projects.  :-)
>>
>> Maybe the best place to start would be for me to rebase the branch
>> against trunk, and see what still fits?  I think there have been some
>> fairly major changes in the internals since July last year.
>>
>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>
>>> I posted a patch with a Collector somewhat similar to what you
>>> described, Alan - it's attached to one of the sub-issues
>>> https://issues.apache.org/jira/browse/LUCENE-3318.  It is in a fairly
>>> complete "alpha" state, but has seen no production use of course,
>>> since it relies on the remainder of the unfinished work in that
>>> branch.  It works by creating a TokenStream based on match positions
>>> returned from the query and passing that to the existing Highlighter.
>>> Please feel free to get in touch if you decide to look into that and
>>> have questions.
>>>
>>>
>>> -Mike
>>>
>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>
>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>
>>>>  yes I did
>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: uwe@thetaphi.de
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>> To: dev@lucene.apache.org
>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>
>>>>>> Alan, you made my day!
>>>>>>
>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>> with the issues!
>>>>>>
>>>>>> let me know if you have questions!
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>> <alan.woodward@romseysoftware.co.uk>  wrote:
>>>>>>
>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>
>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>> <alan.woodward@romseysoftware.co.uk>  wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> The project I'm currently working on requires the reporting of
>>>>>>>>> exact hit positions from some pretty hairy queries, not all of
>>>>>>>>> which are covered by the existing highlighter modules.  I'm
>>>>>>>>> working round this by translating everything into SpanQueries,
>>>>>>>>> and using the getSpans() method to locate hits (I've extended
>>>>>>>>> the Spans interface to make term offsets available - see
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works
>>>>>>>>> for our use-case, but isn't terribly efficient, and obviously
>>>>>>>>> isn't applicable to non-Span queries.
>>>>>>>>>
>>>>>>>>> I've seen a bit of chatter on the list about using term offsets
>>>>>>>>> to provide accurate highlighting in Lucene.  I'm going to have a
>>>>>>>>> couple of weeks free in April, and I thought I might have a go
>>>>>>>>> at implementing this.  Mainly I'm wondering if there's already
>>>>>>>>> been thoughts about how to do it.  My current thoughts are to
>>>>>>>>> somehow extend the Weight and Scorer interface to make term
>>>>>>>>> offsets available; to get highlights for a given set of
>>>>>>>>> documents, you'd essentially run the query again, with a filter
>>>>>>>>> on just the documents you want highlighted, and have a custom
>>>>>>>>> collector that gets the term offsets in place of the scores.
>>>>>>>>>
>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>
>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>> lagging behind trunk a bit.
>>>>>>>>
>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>> support offsets in the postings lists.
>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> lucidimagination.com
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org

