lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven A Rowe <sar...@syr.edu>
Subject RE: BoostingTermQuery scoring
Date Thu, 06 Nov 2008 23:56:55 GMT
Hi Peter,

On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
> I've discovered another flaw in using this technique:
> 
> (+contents:petroleum +contents:engineer +contents:refinery)
> (+boost:petroleum +boost:engineer +boost:refinery)
> 
> It's possible that the first clause will produce a matching
> doc and none of the terms in the second clause are used to
> score that doc. Yet another reason to use BoostingTermQuery.

I think you could address this, without BTQ, using something like:

  boost:(+petroleum +engineer +refinery)
  (+contents:(+petroleum +engineer +refinery)
   +((*:* -boost:petroleum)
     (*:* -boost:engineer)
     (*:* -boost:refinery)))

The last three lines gives you the set of documents that are missing at least one of the terms
in the "boost" field.  The *:* thingy, indicating a MatchAllDocsQuery, is necessary to get
all documents that don't have a given term; Lucene's (sub-)query document exclusion operation
needs a non-empty set on which to operate.

On 11/06/2008 at 1:08 PM, Peter Keegan wrote:
> Then, at search time, a query for "petroleum engineer" gets rewritten
> to: (+contents:petroleum +contents:engineer) (+boost:petroleum
> +boost:engineer). Note that the two clauses are OR'd so that a term that
> exists in both fields will get a higher weight in the 'boost' field.
> This works quite well at boosting documents with terms that exist in the
> boosted fields. However, it doesn't work properly if excluded terms are
> added, for example:
> 
> (+contents:petroleum +contents:engineer -contents:drilling)
> (+boost:petroleum +boost:engineer -boost:drilling)
> 
> If a document contains the term 'drilling' in the 'body'
> field, but not in the 'title' or 'city' field, a false hit occurs.

I think you could address this problem like this:

  +(boost:(+petroleum +engineer)
    (+contents:(+petroleum +engineer)
     +((*:* -boost:petroleum)
       (*:* -boost:engineer))))
  -contents:drilling

You don't have to include "-boost:drilling", because this condition is entailed by "-contents:drilling".

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message