lucene-java-user mailing list archives

From jian chen <>
Subject Re: Strategy for making short documents not bubble to the top?
Date Wed, 29 Jun 2005 20:39:54 GMT

I would use a pure span-based or cover-density-based ranking algorithm,
which does not take document length into consideration. (Perhaps by
tweaking whatever is currently in the standard Lucene distribution?)

For example, when searching for the keywords "beautiful house",
span/cover ranking gives a long document and a short document the same
rank as long as they contain the same number of spans/covers (for
example, "beautiful xxxxxx house" is one cover) and, within each
span/cover, the edit distance between the keywords is the same.
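
To make the idea concrete, here is a rough sketch in plain Python. It is
not Lucene code and not anything in the standard distribution. The names
find_covers and cover_score are made up for illustration, and scoring
each cover as 1/width is only an assumption, one simple way to reward
tighter covers.

```python
def find_covers(tokens, terms):
    """Return the minimal (start, end) token intervals containing every term."""
    wanted = set(terms)
    last_seen = {}  # term -> most recent position at which it occurred
    covers = []
    for pos, tok in enumerate(tokens):
        if tok not in wanted:
            continue
        last_seen[tok] = pos
        if len(last_seen) < len(wanted):
            continue  # some query term has not appeared yet
        start = min(last_seen.values())
        # Same start as the previous cover means this interval merely
        # extends that cover, so it is not minimal.
        if covers and covers[-1][0] == start:
            continue
        covers.append((start, pos))
    return covers


def cover_score(tokens, terms):
    """Sum 1/width over all covers (assumed scoring rule).

    The total number of tokens in the document never enters the formula.
    """
    return sum(1.0 / (end - start + 1) for start, end in find_covers(tokens, terms))


terms = ["beautiful", "house"]
short_doc = "a beautiful old house".split()
long_doc = short_doc + ["filler"] * 500  # same cover, much longer document

print(find_covers(short_doc, terms))                                # [(1, 3)]
print(cover_score(short_doc, terms) == cover_score(long_doc, terms))  # True
```

Both documents contain one cover of the same width, so they get the same
score even though one is over a hundred times longer.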

Just my 2 cents, 



On 29 Jun 2005 20:30:49 -0000,
<> wrote:
> Hi,
> Short documents bubble to the top of the results because the field
> length is short.  Does anyone have a good strategy for working around this?
>  Will doing something like log(document length) flatten out my results while
> still making them meaningful?  I'm going to try some different approaches
> but any advice is appreciated.
> Thanks.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
