lucene-dev mailing list archives

From Mark Miller
Subject Re: SpanScorer handling of non-disjoint phrases
Date Thu, 24 Apr 2008 00:55:37 GMT
On Wed, 2008-04-23 at 16:15 -0400, David Kaelbling wrote:
> Hi,
> I've been using the 2.3.1 contrib highlighter with the 2/10/2008
> SpanHighlighter patch, and have run into some trouble.  If I have two
> phrases in a query that share terms (e.g. "hello world" and "hello
> goodbye") the SpanScorer seems to not highlight 'hello' consistently.
> It looks to me like WeightedSpanTermExtractor.extract() is clobbering
> the span positions for 'hello' the second time it encounters the term.
> Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really
> be replacing the old entry, or should they try to addPositionSpans()?
>         Thanks,
>         David
> PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores()
> will wrap the cachingTokenFilter passed it by SpanScorer.init() into
> another CachingTokenFilter, duplicating the cache?

Hmmm...reminds me of an early dev bug I thought I added a test case for
and fixed.

I will take a look as soon as I can.

- mark
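
For readers following the thread, the replace-versus-merge behaviour David
asks about can be illustrated with plain java.util maps. This is only a
sketch, not the highlighter's code: TermSpans, replaceAll, mergeAll and
phrase are hypothetical names invented for the example, and the positions
are made up. It shows that Map.putAll keeps only the last entry for a
shared key such as 'hello', whereas an explicit merge keeps the spans from
both phrases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpanMergeSketch {

    /** Hypothetical stand-in for a weighted term: only its [start, end] position spans. */
    static final class TermSpans {
        final List<int[]> positionSpans = new ArrayList<>();

        TermSpans(int start, int end) {
            positionSpans.add(new int[] {start, end});
        }
    }

    /** Replace semantics: Map.putAll overwrites any existing entry with the same key. */
    static void replaceAll(Map<String, TermSpans> terms, Map<String, TermSpans> extracted) {
        terms.putAll(extracted);
    }

    /** Merge semantics: keep the existing entry and append the new spans to it. */
    static void mergeAll(Map<String, TermSpans> terms, Map<String, TermSpans> extracted) {
        for (Map.Entry<String, TermSpans> e : extracted.entrySet()) {
            TermSpans existing = terms.get(e.getKey());
            if (existing == null) {
                terms.put(e.getKey(), e.getValue());
            } else {
                existing.positionSpans.addAll(e.getValue().positionSpans);
            }
        }
    }

    /** Builds the extracted terms for a two-word phrase at a made-up position. */
    static Map<String, TermSpans> phrase(String first, String second, int start) {
        Map<String, TermSpans> m = new HashMap<>();
        m.put(first, new TermSpans(start, start + 1));
        m.put(second, new TermSpans(start, start + 1));
        return m;
    }

    public static void main(String[] args) {
        Map<String, TermSpans> replaced = new HashMap<>();
        replaceAll(replaced, phrase("hello", "world", 0));
        replaceAll(replaced, phrase("hello", "goodbye", 7));
        // Only the span from the second phrase survives for 'hello'.
        System.out.println("putAll: hello spans = " + replaced.get("hello").positionSpans.size());

        Map<String, TermSpans> merged = new HashMap<>();
        mergeAll(merged, phrase("hello", "world", 0));
        mergeAll(merged, phrase("hello", "goodbye", 7));
        // Spans from both phrases are kept for 'hello'.
        System.out.println("merge:  hello spans = " + merged.get("hello").positionSpans.size());
    }
}
```

Whether the extractor should merge this way, or whether the overwrite is
intended, is exactly the open question in the thread; the sketch only makes
the difference between the two behaviours concrete.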
