lucene-java-user mailing list archives

From Robert Stewart <Robert.Stew...@INFONGEN.COM>
Subject custom tag scoring question
Date Wed, 08 Oct 2008 15:48:12 GMT
We have a custom "tagger" application which identifies certain entities (such as companies,
etc.) and applies a "relevance" value to each entity based upon overall relevance in some
document.

Then we index these "tags" into a Lucene index by storing them in an indexed field (same name,
different values), for example "company=A, company=B, company=C", etc.

I know how to set the boost on each field according to the relevance value from our tagging application.
However, sorting does not seem to work properly, since according to the documentation all boost
values per document under fields of the same name are combined by multiplying them together:

From http://lucene.apache.org/java/docs/scoring.html:

"For each field of a document, all boosts of that field (i.e. all boosts under the same field
name in that doc) are multiplied."

So if I have two documents, each with some entities:

Doc 1: A (100%), B (50%), C (25%)
Doc 2: A (75%), D (50%)

Then a query for A should return Doc1 ahead of Doc2.  But what seems to happen is this:

Doc1 boost = 1.0 * 0.5 * 0.25 = 0.125
Doc2 boost = 0.75 * 0.50 = 0.375

Therefore a query for A returns Doc2 ahead of Doc1.
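
To make the arithmetic concrete, here is a minimal plain-Java sketch of the multiplication described in the quoted documentation. It does not call Lucene at all; the class name FieldBoostDemo and the method combinedBoost are hypothetical names made up for this illustration, and the sketch ignores every other scoring factor (term frequency, idf, length normalization), so it only models the combined boost, not a real score.

```java
public class FieldBoostDemo {

    // Hypothetical helper: multiplies all boosts given for fields that share
    // one name in a document, as the quoted scoring documentation describes.
    public static double combinedBoost(double... boosts) {
        double product = 1.0;
        for (double b : boosts) {
            product *= b;
        }
        return product;
    }

    public static void main(String[] args) {
        // Doc 1: A (100%), B (50%), C (25%)
        double doc1 = combinedBoost(1.0, 0.5, 0.25);
        // Doc 2: A (75%), D (50%)
        double doc2 = combinedBoost(0.75, 0.5);

        System.out.println("Doc1 combined boost = " + doc1);
        System.out.println("Doc2 combined boost = " + doc2);

        // The per-tag relevance of A (1.0 vs 0.75) is lost in the product,
        // so the document with fewer, stronger tags comes out ahead.
        System.out.println(doc2 > doc1 ? "Doc2 ranks first" : "Doc1 ranks first");
    }
}
```

The point of the sketch is that the product depends on every tag in the document, not only on the tag being queried, which is why the relevance of A alone cannot be recovered from it.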

Is there a way around this (besides creating a different field name for each tag)?  Can I
create a custom similarity or scoring class to handle this at query time somehow?

Thanks,
Bob
