lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SpanScorer handling of non-disjoint phrases
Date Thu, 24 Apr 2008 01:58:41 GMT

Hmmm...my quick test of a query with two phrases and a common term
appeared to work correctly. Could you submit an example that
demonstrates the failure or perhaps shed some further light on the
problem?

As to your P.S. question, you are right...that particular method was
needlessly re-wrapping the stream. I have fixed it now, thanks for
pointing it out.
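
For readers following the P.S., the redundant wrapping can be sketched roughly as below. This is only an illustration under stated assumptions: CachingTokens and wrapIfNeeded are hypothetical stand-ins written for this example, not Lucene's CachingTokenFilter and not the change that was actually committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CachingWrapSketch {

    /** Hypothetical stand-in for a caching token stream (not Lucene's CachingTokenFilter). */
    static final class CachingTokens implements Iterable<String> {
        private final Iterable<String> input;
        private List<String> cache; // filled lazily on first iteration

        CachingTokens(Iterable<String> input) {
            this.input = input;
        }

        @Override
        public Iterator<String> iterator() {
            if (cache == null) {
                cache = new ArrayList<>();
                for (String token : input) {
                    cache.add(token);
                }
            }
            return cache.iterator();
        }

        int cachedCount() {
            return cache == null ? 0 : cache.size();
        }
    }

    /** Wraps only when the input is not already caching, so tokens are cached once. */
    static CachingTokens wrapIfNeeded(Iterable<String> input) {
        if (input instanceof CachingTokens) {
            return (CachingTokens) input;
        }
        return new CachingTokens(input);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("hello", "world", "goodbye");

        // Unconditional re-wrap: both layers end up holding their own copy.
        CachingTokens inner = new CachingTokens(tokens);
        CachingTokens outer = new CachingTokens(inner);
        for (String t : outer) { /* consume the stream once */ }
        System.out.println("inner=" + inner.cachedCount() + " outer=" + outer.cachedCount());

        // Guarded wrap: an already-caching stream is returned as-is.
        CachingTokens once = new CachingTokens(tokens);
        System.out.println("same instance: " + (wrapIfNeeded(once) == once));
    }
}
```

In this sketch, consuming the outer wrapper fills both caches, which is the duplication David describes; the instanceof guard is one possible way to avoid it.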

- Mark

On Wed, 2008-04-23 at 20:55 -0400, Mark Miller wrote:
> On Wed, 2008-04-23 at 16:15 -0400, David Kaelbling wrote:
> > Hi,
> > 
> > I've been using the 2.3.1 contrib highlighter with the 2/10/2008
> > SpanHighlighter patch, and have run into some trouble.  If I have two
> > phrases in a query that share terms (e.g. "hello world" and "hello
> > goodbye") the SpanScorer seems to not highlight 'hello' consistently.
> > 
> > It looks to me like WeightedSpanTermExtractor.extract() is clobbering
> > the span positions for 'hello' the second time it encounters the term.
> > Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really
> > be replacing the old entry, or should they try to addPositionSpans()?
> > 
> >         Thanks,
> >         David
> > 
> > PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores()
> > will wrap the cachingTokenFilter passed it by SpanScorer.init() into
> > another CachingTokenFilter, duplicating the cache?
> > 
> 
> Hmmm...reminds me of an early dev bug I thought I added a test case for
> and fixed.
> 
> I will take a look as soon as I can.
> 
> - mark
> 
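
David's question about putAll() versus adding position spans can be illustrated with a small, self-contained sketch. It assumes a simplified model: TermSpans, phrase, overwrite, and merge are hypothetical names invented for this example and are not the classes or methods in the SpanHighlighter patch. It only shows the general Map behavior he is describing, where a second putAll() under the same key replaces the earlier entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpanMergeSketch {

    /** Hypothetical stand-in for a term plus its position spans (not Lucene's WeightedSpanTerm). */
    static final class TermSpans {
        final List<int[]> spans = new ArrayList<>();

        TermSpans(int start, int end) {
            spans.add(new int[] {start, end});
        }
    }

    /** Builds the per-phrase map for a two-term phrase matched at [start, start + 1]. */
    static Map<String, TermSpans> phrase(String first, String second, int start) {
        Map<String, TermSpans> m = new HashMap<>();
        m.put(first, new TermSpans(start, start + 1));
        m.put(second, new TermSpans(start, start + 1));
        return m;
    }

    /** Map.putAll replaces any entry already stored under the same key. */
    static void overwrite(Map<String, TermSpans> terms, Map<String, TermSpans> extracted) {
        terms.putAll(extracted);
    }

    /** Alternative: keep the existing entry and append the new spans to it. */
    static void merge(Map<String, TermSpans> terms, Map<String, TermSpans> extracted) {
        for (Map.Entry<String, TermSpans> e : extracted.entrySet()) {
            TermSpans existing = terms.get(e.getKey());
            if (existing == null) {
                terms.put(e.getKey(), e.getValue());
            } else {
                existing.spans.addAll(e.getValue().spans);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, TermSpans> clobbered = new HashMap<>();
        overwrite(clobbered, phrase("hello", "world", 0));
        overwrite(clobbered, phrase("hello", "goodbye", 5));

        Map<String, TermSpans> merged = new HashMap<>();
        merge(merged, phrase("hello", "world", 0));
        merge(merged, phrase("hello", "goodbye", 5));

        // The shared term keeps one span with putAll and two spans with merge.
        System.out.println("putAll: hello spans = " + clobbered.get("hello").spans.size());
        System.out.println("merge:  hello spans = " + merged.get("hello").spans.size());
    }
}
```

With the positions assumed here, the overwrite path leaves 'hello' with only the span from the second phrase, while the merge path keeps both. Whether the real extractor behaves this way is exactly what the thread is trying to establish.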



