lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Reducing the boost for a particular Term
Date Tue, 11 Jul 2006 03:40:46 GMT
: particular author using the query. That is for documents returned by
: querying: (content:"miracle cure"), I would like to reduce the
: relevancy of authorid:3024

: +(content:"miracle cure") +(authorid:3024^0.5 ((-authorid:3024))^10.0)
: => The boosted optional prohibited term seems to have no effect
: +(content:"miracle cure") (-authorid:3024^0.5)
: => Again, the boosted optional prohibited term seems to have no effect

that's because you can't have a boolean query containing nothing but
negative clauses ... to "penalize" documents that match a query, you must
reward documents which match the inverse of that query.   From a Set
perspective the inverse of a query like "authorid:3024" is...

      +(+matchalldocs:true -authorid:3024)

...but for your purposes, you could just as easily use...

   +content:"miracle cure" +(+content:"miracle cure" -authorid:3024)

: Also this approach appears to add a small modifier to the overall
: relevancy returned by +(content:"miracle cure"). Is there a better way
: to do a multiplicative modification - so that for example the score as
: returned by +(content:"miracle cure") is reduced to 0.6 of its value
: if the authorid is 3024.

First off: don't place to much stock in the specific numeric score value
returned by a search, Scores aren't absolute, so trying to do absolute
things to them isn't really meaningful.

Second: take a look at the Explanation class (and the Searcher.explain
method) ... it will help you understand why you get the scores you are
getting, and how methods in the Similarity class (which you can change)
affect things.  With the query structure i outlined above, you can
probably get pretty close to what you want just by tewaking the boosts.

Third: if you really want to go all out, you could write a HitCollector to
penalize documents by performing some arbitrary match on the
scores of documents matching some specific criteria (like the authorid)


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message