lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Boris Goldowsky <bo...@alum.mit.edu>
Subject Re: Demoting results
Date Fri, 19 Mar 2004 14:39:09 GMT
I asked:
> > Is there any way to build a query where the occurrence of a particular
> > Term (in a Keyword field) causes the rank of the document to be
> > decreased?

On Thu, 2004-03-18 at 13:32, Doug Cutting wrote:
> Have you tried assigning these very small boosts (0 < boost < 1) and 
> assigning other query clauses relatively large boosts (boost > 1)?

Thanks for the suggestion!  Unfortunately it doesn't have the desired
effect.  I wanted 
  title: asparagus
  various fields...
  doctype: bad

to score lower than 
  title: asparagus
  various similar fields...
  doctype: good

I was trying to formulate a query like, say 
  +(title: asparagus) (doctype:bad)^-3

which would make sure the "bad" document was ranked lower than any other
value for doctype.  But negative boosts are illegal. 

I tried your suggestion of putting large boost on the first clause and a
small one (0.01) on the second, but the "bad" document is still ranked 
higher than the good one -- it gets a slight improvement from the
doctype:bad match, times 0.01, which is a very slight improvement but
still positive.  Then it gets a big boost because it has a 1.0 rather
than a 0.5 coordination factor, so the bad item gets top billing.

I think I've identified a few ways to solve the puzzle, though:

(a) enumerate all the possible "good" types of documents and search for
them, rather than the single bad one.  Harder to maintain since doctypes
can be introduced, but possible.

(b) attach boost values less than one to the "bad" Documents at indexing
time.  Not as flexible as modifying the query, but plausible.

(c) a more complex query like this:
 (title:asparagus) OR (title:asparagus -doctype:bad)
 so for good documents both clauses will match and the coordination
factor will be in their favor.  This increases query complexity (they
aren't really simple one-term queries like this toy example), but
hopefully that will not be a performance issue.

Bng





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message