lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Make TermScorer non final
Date Wed, 18 Mar 2009 18:05:15 GMT

On Mar 18, 2009, at 7:57 AM, Michael McCandless wrote:

>
> Coming from the discussions in LUCENE-1522 (improving highlighter), I
> think at some point we should merge Span*Query into their normal
> counterparts, if possible.
>
> Ie, there should be only one TermQuery that can do both what the
> current TermQuery does, and also what SpanTermQuery does.  It's able
> to enumerate the spans/payloads for a given document, and if you don't
> request those, the performance should hopefully be equal to that of
> the current TermQuery.
>
> The highlighter would in fact request spans for a "normal" TermQuery,
> on a single doc index at a time, in order to locate the hits.
>
> Likewise for SpanOrQuery, SpanAndQuery.
>
> I have no real sense of how much work this is, what problems would
> ensue (eg possible difference in scoring, etc.), but from
> highlighter's standpoint, ideally all queries need to be able to
> enumerate the collection of positions that established the match.

Maybe they should all implement a common interface that provides
highlighting info?  I don't know what it would look like, but it seems
easier to do that than to merge them all, though I'm not sure.  Not that
I wouldn't want to see a simpler query system.  There are some cool
things you can do w/ spans, but they still have some fundamental flaws
that make them annoying.  Namely, one of the common reasons you want
Spans is b/c you care about what is going on around the match, i.e.
co-occurrence data, yet it is still annoying/difficult to get that
information w/o pivoting around either term vectors or re-analyzing
the document.  With the new Attribute stuff, however, it might be
getting a little easier, as one could now store offset information at
the term level (which you can do w/ payloads, too) and then use that
to index into the original String.
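
To make the "common interface" idea concrete, here is a minimal sketch in
plain Java. Everything in it is an assumption made for illustration: the
names HighlightSketch, Offsets, MatchOffsetsProvider, ToyTermMatcher and
snippets are hypothetical and are not part of Lucene, and the toy matcher
does a raw substring scan instead of consulting an index or an analyzer.
It only shows the shape of the idea: each query type reports character
offsets for its matches, and a caller uses those offsets to index into
the original String and pull out the surrounding context.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Lucene.
class HighlightSketch {

    /** Half-open character range [start, end) into the original text. */
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Assumed common interface: any query type reports where it matched. */
    interface MatchOffsetsProvider {
        List<Offsets> matchOffsets(String text);
    }

    /** Toy stand-in for a term query: case-sensitive substring scan, no analysis. */
    static final class ToyTermMatcher implements MatchOffsetsProvider {
        private final String term;

        ToyTermMatcher(String term) {
            if (term.isEmpty()) {
                throw new IllegalArgumentException("term must be non-empty");
            }
            this.term = term;
        }

        @Override
        public List<Offsets> matchOffsets(String text) {
            List<Offsets> out = new ArrayList<>();
            int idx = text.indexOf(term);
            while (idx >= 0) {
                out.add(new Offsets(idx, idx + term.length()));
                // Continue scanning after the end of this match.
                idx = text.indexOf(term, idx + term.length());
            }
            return out;
        }
    }

    /** Returns each match plus up to `window` characters of context on each side. */
    static List<String> snippets(MatchOffsetsProvider provider, String text, int window) {
        List<String> out = new ArrayList<>();
        for (Offsets o : provider.matchOffsets(text)) {
            int from = Math.max(0, o.start - window);
            int to = Math.min(text.length(), o.end + window);
            out.add(text.substring(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "spans are cool but spans are annoying";
        for (String s : snippets(new ToyTermMatcher("spans"), doc, 4)) {
            System.out.println(s);
        }
    }
}
```

In a real system the offsets would presumably come from what was stored at
index time (offset attributes or payloads) and not from rescanning the
text; the scan here just keeps the sketch self-contained.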
