lucene-java-user mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: Anyone implemented custom hit ranking?
Date Sun, 14 Nov 2004 01:06:28 GMT
I've done some customization of scoring/ranking and plan to do more.  A
good place to start is with your own Similarity, extending Lucene's
DefaultSimilarity.  Like you, I found the default length normalization
to not work well with my dataset.  I separately weight each indexed
field according to a static relative importance (implemented as a query
boost factor that is automatically applied) and then disable length
normalization altogether by redefining lengthNorm() to always return
1.0f.
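
For anyone following along, here is a minimal sketch of that arrangement. It is plain Java so it runs without the Lucene jar; in a real setup lengthNorm() would be an override in a subclass of DefaultSimilarity, and the boost would be applied to the query clauses. The class name, field names, and weights are hypothetical placeholders, not the values used in the system described above.

```java
import java.util.HashMap;
import java.util.Map;

class FlatNormSketch {
    // Hypothetical static per-field weights; the real values are not given in this thread.
    static final Map<String, Float> FIELD_BOOSTS = new HashMap<>();
    static {
        FIELD_BOOSTS.put("title", 4.0f);
        FIELD_BOOSTS.put("body", 1.0f);
    }

    // Same shape as Similarity.lengthNorm(String fieldName, int numTerms).
    // Returning a constant removes any preference for short fields.
    static float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    // DefaultSimilarity's behaviour, for comparison: 1 / sqrt(numTerms).
    static float defaultLengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Query boost that would be applied automatically to a clause on this field.
    static float boostFor(String fieldName) {
        Float boost = FIELD_BOOSTS.get(fieldName);
        return boost == null ? 1.0f : boost;
    }

    public static void main(String[] args) {
        int[] lengths = {2, 100, 5000};
        for (int n : lengths) {
            System.out.println(n + " terms: default=" + defaultLengthNorm("body", n)
                    + " flat=" + lengthNorm("body", n));
        }
        System.out.println("title boost=" + boostFor("title"));
    }
}
```

With the default norm a 2-term field scores far higher than a 5000-term field for the same match; with the flat norm, only the per-field boost distinguishes them.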

I also had problems with tf and idf normalization, especially with idf
dominating the ranking determination.  To address that, my Similarity
increases the base of the log for each, and adds a final square root to
the idf computation since Lucene squares the idf in the score
computations.
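
A rough sketch of the idea, again as standalone Java rather than a real Similarity subclass. The default formulas shown are DefaultSimilarity's (square root for tf, natural log plus one for idf). The damped versions are only one possible reading of the description above: the log base of 10 and the "1 + log" form for tf are assumptions, not the actual code.

```java
class DampedTfIdfSketch {
    // DefaultSimilarity, for comparison.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed damped tf: 1 + log10(freq), so repeated terms add little.
    static float tf(float freq) {
        return freq > 0 ? (float) (1.0 + Math.log10(freq)) : 0.0f;
    }

    // Assumed damped idf: base-10 log, then a square root. The score uses
    // idf twice (it is effectively squared), so the square root makes the
    // net contribution linear in the log term.
    static float idf(int docFreq, int numDocs) {
        double logTerm = 1.0 + Math.log10(numDocs / (double) (docFreq + 1));
        return (float) Math.sqrt(Math.max(logTerm, 0.0));
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        int[] docFreqs = {1, 9, 500};
        for (int df : docFreqs) {
            float d = defaultIdf(df, numDocs);
            float s = idf(df, numDocs);
            System.out.println("docFreq=" + df
                    + " default idf^2=" + (d * d)
                    + " damped idf^2=" + (s * s));
        }
    }
}
```

Comparing the squared values shows how much narrower the spread between rare and common terms becomes, which is what keeps idf from dominating the ranking.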

Have you tried the explain() mechanism?  It is a great way to see
precisely how your results are being scored (but be warned there is a
final normalization in Hits that explain() does not show -- this final
normalization does not affect the ranking order, but it does affect the
final scores).
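
In Lucene the call is searcher.explain(query, docId), which returns an Explanation you can print. The final normalization itself can be sketched without Lucene. The version below assumes that Hits rescales every score by 1/topScore only when the top raw score exceeds 1.0; treat that rule as my understanding of Hits rather than a quote from its source. The class and method names are hypothetical.

```java
import java.util.Arrays;

class HitsNormSketch {
    // Input: raw scores already sorted best-first, as explain() would report them.
    // Assumed rule: if the top score is above 1.0, scale everything by 1/top.
    static float[] normalize(float[] rawScoresDescending) {
        float[] out = rawScoresDescending.clone();
        if (out.length > 0 && out[0] > 1.0f) {
            float scoreNorm = 1.0f / out[0];
            for (int i = 0; i < out.length; i++) {
                out[i] *= scoreNorm;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};
        System.out.println("raw:        " + Arrays.toString(raw));
        System.out.println("normalized: " + Arrays.toString(normalize(raw)));
    }
}
```

Because every score is multiplied by the same positive factor, the order of the hits cannot change; only the absolute values differ from what explain() reports.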

Chuck

  > -----Original Message-----
  > From: Sanyi [mailto:need4sid@yahoo.com]
  > Sent: Saturday, November 13, 2004 12:38 AM
  > To: lucene-user@jakarta.apache.org
  > Subject: Anyone implemented custom hit ranking?
  > 
  > Hi!
  > 
  > I have problems with short text ranking. I've read about the same
  > ranking problems in the list archives, but found only hints and
  > thoughts (adjust DefaultSimilarity, Similarity, etc...), not
  > complete solutions with source code.
  > Has anyone implemented a good solution for this problem? (Example: my
  > search application returns about 10-20 pages of 1-2 word hits for
  > "hello", and then it starts to list the longer texts.)
  > I've implemented a very simple solution: I boost documents shorter
  > than 300 chars with 1/300*doclength at index time. Now it works a lot
  > better. In fact, I can't see any problems now.
  > Anyway, I think this is not "the solution", this is a patch or
  > workaround.
  > So, I'd be interested in some kind of well designed complete solution
  > for this problem.
  > 
  > Regards,
  > Sanyi
  >
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
  > For additional commands, e-mail: lucene-user-help@jakarta.apache.org

