lucene-java-user mailing list archives

From "John Kleven" <johnkle...@gmail.com>
Subject short documents = help me tweak Similarity??
Date Mon, 02 Apr 2007 18:38:55 GMT
My documents are cars, e.g.:
Nissan Altima Sports Package
Nissan Altima Standard

The problem I have is that when I search for "Nissan Altima", I want to get the
2nd hit back first, i.e. "Nissan Altima Standard", because it is shorter.
However, this doesn't happen. Both documents get exactly the same score.

I know that lengthNorm in Similarity uses 1/sqrt(numTerms), and you would
think that would be enough to make sure the order is correct. However, it is
not, and I assume this is because the encode/decode functions that pack this
value into a single byte do not have the granularity to represent the
difference between numbers like 1/sqrt(3) and 1/sqrt(4)?
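
To make the hypothesis above concrete, here is a minimal standalone sketch of
the single-byte norm packing. It assumes the 3-bit-mantissa, 5-bit-exponent
scheme found in Lucene's SmallFloat.floatToByte315 / byte315ToFloat; whether
the Lucene version in use here encodes norms exactly this way is an
assumption. The class and method names (NormPrecisionDemo, encodeNorm,
decodeNorm, lengthNorm) are hypothetical stand-ins, not calls into Lucene
itself.

```java
public class NormPrecisionDemo {

    // Pack a positive float into one byte, keeping only the top bits of
    // the float (modeled on SmallFloat.floatToByte315; an assumption).
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Zero or negative maps to 0; tiny positive values map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Inverse of encodeNorm: expand the byte back into a float.
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The default length normalization described above: 1/sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Show the exact norm, its one-byte encoding, and the decoded value
        // for short field lengths.
        for (int numTerms = 1; numTerms <= 10; numTerms++) {
            float exact = lengthNorm(numTerms);
            byte encoded = encodeNorm(exact);
            float decoded = decodeNorm(encoded);
            System.out.printf("terms=%d exact=%.4f byte=%d decoded=%.4f%n",
                    numTerms, exact, encoded & 0xff, decoded);
        }
    }
}
```

Under this scheme, field lengths of 3 and 4 terms are packed into the same
byte and both decode to 0.5, which would explain identical scores for the two
example documents.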

Is the suggested approach here to rewrite the encode/decode operations, or
is there an easier way?

Thanks kindly -
John
