lucene-dev mailing list archives

From David Kaelbling <dkaelbl...@blackducksoftware.com>
Subject SpanScorer handling of non-disjoint phrases
Date Wed, 23 Apr 2008 20:15:48 GMT
Hi,

I've been using the 2.3.1 contrib highlighter with the 2/10/2008
SpanHighlighter patch, and have run into some trouble.  If I have two
phrases in a query that share terms (e.g. "hello world" and "hello
goodbye") the SpanScorer seems to not highlight 'hello' consistently.

It looks to me like WeightedSpanTermExtractor.extract() is clobbering
the span positions for 'hello' the second time it encounters the term.
Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really
be replacing the old entry, or should they try to addPositionSpans()
on the existing entry instead?
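
To make the question concrete, here is a minimal sketch of the difference
between replacing and merging. It uses hypothetical stand-in classes
(SpanTermSketch, MergeSketch) and made-up positions rather than the patch's
real types, so treat it as an illustration of the idea, not the actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a weighted term that carries position spans.
// This is NOT the class from the SpanHighlighter patch.
class SpanTermSketch {
    final String term;
    final List<int[]> positionSpans = new ArrayList<int[]>();

    SpanTermSketch(String term, int start, int end) {
        this.term = term;
        positionSpans.add(new int[] {start, end});
    }
}

class MergeSketch {
    // Copy entries from 'from' into 'into'. If a term is already present,
    // append the new spans to it instead of replacing the entry.
    static void mergeInto(Map<String, SpanTermSketch> into,
                          Map<String, SpanTermSketch> from) {
        for (Map.Entry<String, SpanTermSketch> e : from.entrySet()) {
            SpanTermSketch existing = into.get(e.getKey());
            if (existing == null) {
                into.put(e.getKey(), e.getValue());
            } else {
                existing.positionSpans.addAll(e.getValue().positionSpans);
            }
        }
    }

    // Number of spans recorded for 'hello' after combining two phrases
    // that share the term. Positions are invented for the example.
    static int helloSpans(boolean merge) {
        Map<String, SpanTermSketch> terms = new HashMap<String, SpanTermSketch>();
        terms.put("hello", new SpanTermSketch("hello", 0, 1)); // "hello world"

        Map<String, SpanTermSketch> second = new HashMap<String, SpanTermSketch>();
        second.put("hello", new SpanTermSketch("hello", 5, 6)); // "hello goodbye"

        if (merge) {
            mergeInto(terms, second);
        } else {
            terms.putAll(second); // replaces the earlier 'hello' entry
        }
        return terms.get("hello").positionSpans.size();
    }
}
```

With putAll() the first phrase's span for 'hello' is lost; with the merge
both spans survive, which is what I would expect the highlighter to need.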

        Thanks,
        David

PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores()
will wrap the CachingTokenFilter passed to it by SpanScorer.init() in
another CachingTokenFilter, duplicating the cache?
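
One way to avoid the double wrap would be to check the stream's type before
wrapping. A rough sketch of that guard follows, again with hypothetical
stand-ins (TokenSourceSketch, CachingSketch, WrapSketch) instead of Lucene's
TokenStream and CachingTokenFilter, so the names and signatures here are
assumptions:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream; returns null when exhausted.
interface TokenSourceSketch {
    String next();
}

// Hypothetical stand-in for a caching wrapper: reads its input once,
// then serves tokens from the in-memory list.
class CachingSketch implements TokenSourceSketch {
    private final TokenSourceSketch input;
    private List<String> cache;
    private Iterator<String> it;

    CachingSketch(TokenSourceSketch input) {
        this.input = input;
    }

    public String next() {
        if (cache == null) {
            cache = new ArrayList<String>();
            for (String t = input.next(); t != null; t = input.next()) {
                cache.add(t);
            }
            it = cache.iterator();
        }
        return it.hasNext() ? it.next() : null;
    }
}

class WrapSketch {
    // Wrap only if the stream is not already a caching wrapper,
    // so the tokens are never cached twice.
    static TokenSourceSketch ensureCached(TokenSourceSketch s) {
        return (s instanceof CachingSketch) ? s : new CachingSketch(s);
    }
}
```

Calling ensureCached() on an already-cached stream hands back the same
object, so only one copy of the tokens is held.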

-- 
David Kaelbling
Senior Software Engineer
Black Duck Software, Inc.

dkaelbling@blackducksoftware.com
T +1.781.810.2041
F +1.781.891.5145

http://www.blackducksoftware.com


