lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Strategy for making short documents not bubble to the top?
Date Wed, 29 Jun 2005 23:47:03 GMT

I beleve what you want to do is write your own custom Scorer class, and
override lengthNorm.

I've never done this myself, but i'm basing this guess on previous
discussions i've seen regarding this method...

http://www.google.com/search?q=lengthNorm+site%3Amail-archive.com+lucene

: Date: 29 Jun 2005 22:39:06 -0000
: From: yahootintin.11533894@bloglines.com
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Re: Strategy for making short documents not bubble to the top?
:
: Hi Jian,
:
: Thanks for the reply.  The problem with that is it completely
: ignores document length.  A book that mentions "frog" 5 times in its 2,000
: pages should be less relevant than a book that mentions "frog" 4 times in
: its 4 pages.
:
: I really want to lower the document length weight instead
: of removing it completely.  Any ideas how to do that?
:
: Thanks.
:
: --- java-user@lucene.apache.org
: wrote:
: Hi,
: >
: > I would use pure span or cover density based ranking algorithm
: which
: > do not take document length into consideration. (tweaking whatever
:
: > currently in the standard Lucene distribution?)
: >
: > For example, searching
: for the keywords "beautiful house", span/cover
: > ranking will treat a long
: document and a short document the same
: > ranking as long as they have the
: same number of spans/covers (for
: > example, "beautiful xxxxxx house" is one
: cover), and with each
: > span/cover, the editing distance between the keywords
: is the same.
: >
: > Just my 2 cents,
: >
: > Cheers,
: >
: > Jian
: >
: > On
: 29 Jun 2005 20:30:49 -0000, yahootintin.11533894@bloglines.com
: > <yahootintin.11533894@bloglines.com>
: wrote:
: > > Hi,
: > >
: > > Short documents bubble to the top of the results
: because the field
: > > length is short.  Does anyone have a good strategy
: for working around this?
: > >  Will doing something like log(document length)
: flatten out my results while
: > > still making them meaningful?  I'm going
: to try some different approaches
: > > but any advice is appreciated.
: > >
:
: > > Thanks.
: > >
: > > ---------------------------------------------------------------------
:
: > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: > >
: For additional commands, e-mail: java-user-help@lucene.apache.org
: > >
: >
: >
: >
: > ---------------------------------------------------------------------
:
: > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: > For
: additional commands, e-mail: java-user-help@lucene.apache.org
: >
: >
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message