lucene-java-user mailing list archives

From Tim Sturge <tstu...@metaweb.com>
Subject multi-term query weighting
Date Tue, 03 Jul 2007 01:01:44 GMT
I have an index with two different sources of information: one small but 
of high quality (call it "title"), and one large but of lower quality 
(call it "body").  I boost certain documents according to their 
popularity (much as one would when indexing the web).

The problem I have is with a query like "John Bush". I translate that into 
"(title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush)". But the 
results I get are:

1. George Bush
...
4. John Kerry
...
10. John Bush
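
For reference, the translation above can be sketched in plain Java as below. This is only an illustration of how the query string is assembled: the class and method names are hypothetical, no Lucene API is involved, and the sketch assumes the terms contain no query-syntax special characters that would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the per-term "(title:T^boost body:T)" clauses
// and joins them with AND, mirroring the query string quoted above.
final class BoostedQueryBuilder {

    static String build(String userInput, double titleBoost) {
        List<String> clauses = new ArrayList<>();
        // Split the raw user input on whitespace; one clause per term.
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields an empty query string
            }
            clauses.add("(title:" + term + "^" + titleBoost + " body:" + term + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("John Bush", 4.0));
    }
}
```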

The reason is (looking at explain) that George Bush is scored:
169 = sum(
  1 = <match in body with tiny norm for "John">
  168 = sum(
    160 = <title match for "Bush">
    8 = <body match for "Bush">
  )
)

and John Kerry is similar but reversed. Poor old "John Bush" only scores:

72 = sum(
  40 = (<title match for "John">+<body match>)
  32 = (<title match for "Bush">+ <body match>)
)

because his initial boost was only 1/4 of George's.

The question I have is: how can I tell the searcher to care about 
"balance"? I really want the score over two terms to be more like 
(sqrt(X)+sqrt(Y))^2 or maybe even exp(log(X)+log(Y)) rather than just 
X+Y. Is that supported in some obvious way, or is there some other way 
to phrase my query to say "I want both terms, but they should both be 
important if possible"?
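
To make the difference concrete, here is a minimal sketch in plain Java (no Lucene calls; the class and method names are made up for illustration) that plugs the per-term scores from the explain output above into the three combinations. Note that exp(log(X)+log(Y)) is simply X*Y. With these numbers the square-root form still ranks George Bush first (about 195 versus about 144), while the product form ranks John Bush first (1280 versus 168).

```java
// Illustrative only: compares three ways of combining two per-term scores.
// X is the score of the "John" clause, Y the score of the "Bush" clause.
final class BalanceDemo {

    // What the boolean query does today: X + Y.
    static double plainSum(double x, double y) {
        return x + y;
    }

    // (sqrt(X) + sqrt(Y))^2 = X + Y + 2*sqrt(X*Y): adds a bonus for balance.
    static double sqrtBlend(double x, double y) {
        double r = Math.sqrt(x) + Math.sqrt(y);
        return r * r;
    }

    // exp(log(X) + log(Y)) = X * Y; assumes both scores are positive.
    static double logBlend(double x, double y) {
        return Math.exp(Math.log(x) + Math.log(y));
    }

    public static void main(String[] args) {
        String[] names = {"George Bush", "John Bush"};
        // Per-term scores taken from the explain output in this message.
        double[][] scores = {{1, 168}, {40, 32}};
        for (int i = 0; i < names.length; i++) {
            double x = scores[i][0];
            double y = scores[i][1];
            System.out.printf("%-12s sum=%.1f sqrtBlend=%.1f logBlend=%.1f%n",
                    names[i], plainSum(x, y), sqrtBlend(x, y), logBlend(x, y));
        }
    }
}
```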

Thanks,

Tim
