lucene-dev mailing list archives

From David Kaelbling <dkaelbl...@blackducksoftware.com>
Subject Re: SpanScorer handling of non-disjoint phrases
Date Thu, 24 Apr 2008 15:46:08 GMT
On Wed, 23 Apr 2008 at 21:58:41 -0400, Mark Miller wrote:
>
> Hmmm...my quick test of a query with two phrases and a common term 
> appeared to work correctly. Could you submit an example that
> demonstrates the failure or perhaps shed some further light on the
> problem?

Hi,

"Already fixed" is entirely possible!  I'm using an old snapshot from
2/10/2008, and the code I was looking at (in WeightedSpanTermExtractor)
doesn't seem to exist any more -- maybe it mutated into QueryTermExtractor?
Anyway, the query was:

+contents:"hello world" +contents:1.0 +(contents:movie contents:"hello dolly 1.0")

The WeightedSpanTermExtractor code looked like this:

    if (query instanceof BooleanQuery) {
      BooleanClause[] queryClauses = ((BooleanQuery) query).getClauses();
      Map booleanTerms = new HashMap();
      for (int i = 0; i < queryClauses.length; i++) {
        if (!queryClauses[i].isProhibited()) {
          extract(queryClauses[i].getQuery(), booleanTerms);
        }
      }
      terms.putAll(booleanTerms);
    } else if (query instanceof PhraseQuery) { ...

If a term in 'booleanTerms' was already in the 'terms' map, putAll
overwrote the old value.  I had to tweak this to merge the maps instead:
if both the old and new terms are position sensitive, combine the two
sets of position spans; otherwise keep the position-insensitive
WeightedSpanTerm.
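
For illustration, the merge described above might look roughly like the
sketch below.  It is not the Lucene code: SpanTerm, combine and mergeInto
are hypothetical stand-ins, the map is keyed on plain strings rather than
on terms, and it relies on Map.merge from Java 8, which is newer than the
snapshot quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SpanTermMerge {

  // Hypothetical stand-in for WeightedSpanTerm: a flag plus position spans.
  static final class SpanTerm {
    final boolean positionSensitive;
    final List<int[]> spans = new ArrayList<>(); // each entry is {start, end}

    SpanTerm(boolean positionSensitive, int[]... spans) {
      this.positionSensitive = positionSensitive;
      for (int[] span : spans) {
        this.spans.add(span);
      }
    }
  }

  // Decide what to keep when the same term shows up twice.
  static SpanTerm combine(SpanTerm oldTerm, SpanTerm newTerm) {
    if (oldTerm.positionSensitive && newTerm.positionSensitive) {
      // Both restrict positions: keep the spans from both.
      SpanTerm merged = new SpanTerm(true);
      merged.spans.addAll(oldTerm.spans);
      merged.spans.addAll(newTerm.spans);
      return merged;
    }
    // At least one side matches anywhere: keep the insensitive one.
    return oldTerm.positionSensitive ? newTerm : oldTerm;
  }

  // Replacement for terms.putAll(extracted) that merges on collision.
  static void mergeInto(Map<String, SpanTerm> terms,
                        Map<String, SpanTerm> extracted) {
    for (Map.Entry<String, SpanTerm> e : extracted.entrySet()) {
      terms.merge(e.getKey(), e.getValue(), SpanTermMerge::combine);
    }
  }
}
```

With the query above, "hello" appears in both phrases, so its spans would
be combined, while "1.0" appears both as a plain term and inside a phrase,
so the position-insensitive entry would win.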

If you're using a HashSet of WeightedTerms rather than a Map keyed on Terms, 
the collision I experienced may not happen.

	Thanks,
	David

-- 
David Kaelbling
Senior Software Engineer
Black Duck Software, Inc.

dkaelbling@blackducksoftware.com
T +1.781.810.2041
F +1.781.891.5145

http://www.blackducksoftware.com





Mime
View raw message