Sorry to repost  is this the correct forum for questions like this? I
think that writing a new encode/decode operation should help alleviate my
problem, but thought that this must be fairly widespread issue for people
using lucene for "nonwebpage" searches (i.e., shorter documents)
Thanks again,
John
On 4/2/07, John Kleven <johnkleven@gmail.com> wrote:
>
> My documents are cars...
> i.e.,
> Nissan Altima Sports Package
> Nissan Altima Standard
>
> The problem I have is when i search "Nissan Altima", I want to get the 2nd
> hit back first, i.e. "Nissan Altima Standard", because it is shorter.
> However, this doesn't happen. They are both scored the exact same.
>
> I know that the lengthNorm in Similarity is using 1/sqrt(numTerms), and
> you would think that would be enuff to make sure the order is correct.
> However, it is not, and I assume this is because of the encode/decode
> functions that pack this value into a single byte do not have the
> granularity to represent differences between numbers like 1/sqrt(3) vs
> 1/sqrt(4)??
>
> Is the suggested approach here to rewrite the encode/decode operations,
> or is there any easier way?
>
> Thanks kindly 
> John
