lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SpanScorer handling of non-disjoint phrases
Date Thu, 24 Apr 2008 00:55:37 GMT
On Wed, 2008-04-23 at 16:15 -0400, David Kaelbling wrote:
> Hi,
> 
> I've been using the 2.3.1 contrib highlighter with the 2/10/2008
> SpanHighlighter patch, and have run into some trouble.  If I have two
> phrases in a query that share terms (e.g. "hello world" and "hello
> goodbye") the SpanScorer seems to not highlight 'hello' consistently.
> 
> It looks to me like WeightedSpanTermExtractor.extract() is clobbering
> the span positions for 'hello' the second time it encounters the term.
> Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really
> be replacing the old entry, or should they try to addPositionSpans()?
> 
>         Thanks,
>         David
> 
> PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores()
> will wrap the cachingTokenFilter passed it by SpanScorer.init() into
> another CachingTokenFilter, duplicating the cache?
> 
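
The clobbering David describes comes from Map.putAll semantics: when a key is already present, the new value replaces the old one. The program below is a minimal, self-contained sketch of that effect and of the merge alternative he suggests. It does not use the contrib highlighter. SpanMergeSketch, TermEntry, PositionSpan, phrase() and putAllMerging() are hypothetical stand-ins for WeightedSpanTermExtractor's term map and addPositionSpans(), and they are not that code's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a term map where each term carries the span
// positions it matched. Names are illustrative, NOT the highlighter's API.
public class SpanMergeSketch {

    public static final class PositionSpan {
        final int start;
        final int end;
        PositionSpan(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + "]"; }
    }

    public static final class TermEntry {
        final List<PositionSpan> spans = new ArrayList<>();
    }

    // Build the entries for a two-term phrase matched at positions start..start+1.
    public static Map<String, TermEntry> phrase(String w1, String w2, int start) {
        Map<String, TermEntry> m = new HashMap<>();
        for (String w : new String[] {w1, w2}) {
            TermEntry e = new TermEntry();
            e.spans.add(new PositionSpan(start, start + 1));
            m.put(w, e);
        }
        return m;
    }

    // Map.putAll: an existing entry for a shared term is replaced,
    // so spans collected from an earlier phrase are lost.
    public static void putAllReplacing(Map<String, TermEntry> terms, Map<String, TermEntry> incoming) {
        terms.putAll(incoming);
    }

    // Merge alternative: append incoming spans to an existing entry.
    // New terms are copied so later mutation cannot alias the input map.
    public static void putAllMerging(Map<String, TermEntry> terms, Map<String, TermEntry> incoming) {
        for (Map.Entry<String, TermEntry> e : incoming.entrySet()) {
            TermEntry existing = terms.get(e.getKey());
            if (existing == null) {
                existing = new TermEntry();
                terms.put(e.getKey(), existing);
            }
            existing.spans.addAll(e.getValue().spans);
        }
    }

    // Spans recorded for "hello" after adding "hello world" (pos 0) and "hello goodbye" (pos 5).
    public static List<PositionSpan> demoReplacing() {
        Map<String, TermEntry> terms = new HashMap<>();
        putAllReplacing(terms, phrase("hello", "world", 0));
        putAllReplacing(terms, phrase("hello", "goodbye", 5));
        return terms.get("hello").spans;
    }

    public static List<PositionSpan> demoMerging() {
        Map<String, TermEntry> terms = new HashMap<>();
        putAllMerging(terms, phrase("hello", "world", 0));
        putAllMerging(terms, phrase("hello", "goodbye", 5));
        return terms.get("hello").spans;
    }

    public static void main(String[] args) {
        System.out.println("replacing: hello -> " + demoReplacing()); // [[5,6]]
        System.out.println("merging:   hello -> " + demoMerging());   // [[0,1], [5,6]]
    }
}
```

With plain putAll, 'hello' keeps only the span from the second phrase, so its first occurrence is not highlighted. Merging keeps both spans.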

Hmmm... this reminds me of an early dev bug that I thought I had fixed
and added a test case for.

I will take a look as soon as I can.

- mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

