lucene-java-user mailing list archives

From "Wright, Tim" <Tim.Wri...@informa.com>
Subject Preventing short documents from being boosted
Date Fri, 08 Sep 2006 09:57:21 GMT
Hi all, 

We have an issue where around 10-20% of our documents are much shorter
than the rest (only a paragraph or so of text). Because Lucene factors
field length into the norms it stores at index time, these shorter
documents usually end up scoring higher than the longer ones.

We'd prefer to remove the length factor, or at least reduce its weight,
so that results contain a mixture of long and short documents. Is there
a simple way of doing this? I've considered applying a document boost
based on length, but I'm not quite sure of the equation I'd have to use
to "counter" the innate prioritisation of short documents.
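
For reference, here is a minimal sketch of what such a counter-boost might
look like. It assumes the default similarity computes the length norm as
1/sqrt(numTerms) and that an index-time document boost is multiplied into
that norm. The class and method names (LengthNormSketch, counterBoost,
dampedBoost) and the "strength" parameter are hypothetical, made up for
illustration; nothing here calls Lucene itself.

```java
public class LengthNormSketch {

    // Assumed form of the default length norm: 1 / sqrt(terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical boost that cancels the assumed norm exactly:
    // lengthNorm(n) * counterBoost(n) == 1 for every n > 0.
    static double counterBoost(int numTerms) {
        return Math.sqrt(numTerms);
    }

    // Hypothetical softer variant. strength = 0 leaves the norm untouched,
    // strength = 1 cancels it fully, values in between only reduce its weight.
    static double dampedBoost(int numTerms, double strength) {
        return Math.pow(numTerms, 0.5 * strength);
    }

    public static void main(String[] args) {
        int[] lengths = {50, 5000}; // a short and a long document, in terms
        for (int n : lengths) {
            double norm = lengthNorm(n);
            double cancelled = norm * counterBoost(n);
            double halfDamped = norm * dampedBoost(n, 0.5);
            System.out.printf("terms=%d norm=%.4f cancelled=%.4f half-damped=%.4f%n",
                    n, norm, cancelled, halfDamped);
        }
    }
}
```

In practice the cancellation would only be approximate, since the stored
norm is compressed to a single byte per field. If the length factor should
go away entirely, a custom Similarity whose lengthNorm returns a constant
is probably the simpler route, although that requires re-indexing.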

Cheers,

Tim.

