lucene-java-user mailing list archives

From "Chuck Williams" <>
Subject RE: Anyone implemented custom hit ranking?
Date Sun, 14 Nov 2004 01:06:28 GMT
I've done some customization of scoring/ranking and plan to do more.  A
good place to start is with your own Similarity, extending Lucene's
DefaultSimilarity.  Like you, I found the default length normalization
to not work well with my dataset.  I separately weight each indexed
field according to a static relative importance (implemented as a query
boost factor that is automatically applied) and then disable length
normalization altogether by redefining lengthNorm() to always return

I also had problems with tf and idf normalization, especially with idf
dominating the ranking determination.  To address that, my Similarity
increases the base of the log for each, and adds a final square root to
the idf computation since Lucene squares the idf in the score
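
A rough, standalone sketch of the adjustments described above. It has no Lucene dependency so that it can be run on its own; in a real index these three methods would be overrides in a subclass of DefaultSimilarity. The class name, the log base of 10, the "1 + log" shape of tf and idf, and the constant 1.0f returned by lengthNorm() are all assumptions made for illustration (the archived message is cut off before the constant and does not give the formulas).

```java
/**
 * Standalone model of a damped Similarity. Method names and parameters
 * mirror Lucene's Similarity, but nothing here touches Lucene itself.
 */
final class DampedSimilaritySketch {
    // Assumed base; the message only says the base of the log was increased.
    static final double LOG_BASE = 10.0;

    // Logarithm in the assumed base.
    static double log(double x) {
        return Math.log(x) / Math.log(LOG_BASE);
    }

    // Length normalization disabled: every field gets the same norm.
    // The value 1.0f is an assumption, not taken from the message.
    float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }

    // Log-damped term frequency (hypothetical formula).
    float tf(float freq) {
        return freq > 0 ? (float) (1.0 + log(freq)) : 0.0f;
    }

    // Log-damped idf with a final square root, so that squaring it
    // during scoring leaves a single idf factor (hypothetical formula).
    float idf(int docFreq, int numDocs) {
        double raw = 1.0 + log(numDocs / (double) (docFreq + 1));
        return (float) Math.sqrt(Math.max(raw, 0.0));
    }
}
```

With base 10, a term occurring 100 times in a document gets a tf of about 3 rather than the default square root of 10, and rare terms gain far less from idf than they do with the natural log.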

Have you tried the explain() mechanism?  It is a great way to see
precisely how your results are being scored (but be warned there is a
final normalization in Hits that explain() does not show -- this final
normalization does not affect the ranking order, but it does affect the
final scores).
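
To illustrate why that final normalization changes the scores but not the order, here is a small standalone sketch. It assumes the behavior as I understand it from the Hits class of this era: when the top raw score exceeds 1.0, every score is multiplied by 1/topScore. The class and method names are hypothetical, and the real Hits code may differ in detail.

```java
/** Standalone model of a Hits-style final score normalization. */
final class HitsNormalizationSketch {
    /**
     * Expects raw scores sorted in descending order. Scales them so the
     * top score is at most 1.0; scores already at or below 1.0 are kept.
     */
    static float[] normalize(float[] rawScores) {
        float scoreNorm = 1.0f;
        if (rawScores.length > 0 && rawScores[0] > 1.0f) {
            scoreNorm = 1.0f / rawScores[0];
        }
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            // One positive factor for all hits, so relative order is unchanged.
            normalized[i] = rawScores[i] * scoreNorm;
        }
        return normalized;
    }
}
```

Because every hit is multiplied by the same positive factor, the values reported by explain() can differ from the scores returned to the application while the ranking stays the same.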


  > -----Original Message-----
  > From: Sanyi []
  > Sent: Saturday, November 13, 2004 12:38 AM
  > To:
  > Subject: Anyone implemented custom hit ranking?
  > Hi!
  > I have problems with short text ranking. I've read about the same
  > ranking problems in the list archives, but found only hints and
  > thoughts (adjust Similarity, etc...), not complete solutions with
  > source code.
  > Has anyone implemented a good solution for this problem? (Example: my
  > application returns about 10-20 pages of 1-2 word hits for "hello",
  > and only then does it start to list longer texts.)
  > I've implemented a very simple solution: I boost documents shorter
  > than 300 chars with 1/300*doclength at index time. Now it works a lot
  > better. In fact, I can't see any problems now.
  > Anyway, I think this is not "the solution", this is a patch or
  > workaround.
  > So, I'd be interested in some kind of well-designed, complete
  > solution for this problem.
  > Regards,
  > Sanyi
