lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Changing ranking
Date Fri, 24 Mar 2006 18:47:34 GMT

(NOTE: replying back to java-user, for the reasons listed at
http://people.apache.org/~hossman/#private_q )

: Date: Fri, 24 Mar 2006 08:42:29 -0000
: Subject: Re: Changing ranking
:
: Hi Chris,
: Thanks, so would that make it as simple as a document with 5 matching
: occurrences ranks higher than a document with 4 occurrences?

Score calculations tend to be complicated, but if what you really care
about is just the number of occurrences, then omitting norms is one way to
start.

: This should achieve my objective of showing slightly longer documents first
: (really it doesn't actually have to be the longest, I just want to stop
: documents with only two words ranking first)

It won't actually make longer docs appear first -- it will just help
ensure that there is no penalty for a doc being longer.  5 word occurrences
in a 10 word document would probably score the same as those 5 words in a
20 word document; at that point, the order they come back in might be
determined by the order they were added to the index.  How common each term
is across the index also comes into play -- if your BooleanQuery contains 10
optional terms, and the 4 that appear the least frequently in your index
appear in one document, and the other 6 appear in a different document --
the doc with the 4 rare ones might wind up scoring higher.
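
To make the two effects above concrete, here is a rough sketch in plain
Java.  It is not Lucene code: the class and method names are made up for
illustration, and the formulas are a simplified version of the classic
tf / idf / length-norm factors (no coord, queryNorm, boosts, or one-byte
norm encoding), so treat the numbers as directional only.

```java
// Toy model of tf / idf / length-norm scoring. Hypothetical names, not the Lucene API.
final class ScoringSketch {

    // Term-frequency factor: grows with the square root of the occurrence count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: terms found in fewer documents weigh more.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Length norm: longer fields get a smaller factor; omitting norms fixes it at 1.0.
    static double lengthNorm(int numTerms, boolean omitNorms) {
        return omitNorms ? 1.0 : 1.0 / Math.sqrt(numTerms);
    }

    // Contribution of a single query term to a single document's score.
    static double score(int freq, int docFreq, int numDocs, int docLength, boolean omitNorms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(docLength, omitNorms);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // assumed index size
        int docFreq = 50;   // assumed number of docs containing the term

        // 5 occurrences in a 10 word doc vs. a 20 word doc.
        System.out.println("with norms, 10 words: " + score(5, docFreq, numDocs, 10, false));
        System.out.println("with norms, 20 words: " + score(5, docFreq, numDocs, 20, false));
        System.out.println("no norms,   10 words: " + score(5, docFreq, numDocs, 10, true));
        System.out.println("no norms,   20 words: " + score(5, docFreq, numDocs, 20, true));

        // 4 rare terms (each in 2 docs) vs. 6 common terms (each in 500 docs),
        // one occurrence each, norms omitted.
        double fourRare = 4 * score(1, 2, numDocs, 10, true);
        double sixCommon = 6 * score(1, 500, numDocs, 10, true);
        System.out.println("4 rare terms:   " + fourRare);
        System.out.println("6 common terms: " + sixCommon);
    }
}
```

With norms the shorter document comes out ahead; without them the two
tie, which is the "no penalty for being longer" behaviour.  The last two
lines show how a few rare terms can outweigh a larger number of common
ones.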

To really understand scoring you should do some experiments, and look at
the Explanation information for your queries to understand how things like
tf and idf impact your score.  Then you can think about how you might want
to change your Similarity class to meet your needs.
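
As one possible direction for such a change, here is a sketch of a length
norm with a floor, so that very short documents stop getting a boost over
moderately sized ones.  This is plain Java with a made-up class name, not
a real Similarity subclass; as far as I know the matching hook in Lucene
is the lengthNorm method of Similarity, but check the javadocs for your
version.  The floor of 20 terms is an arbitrary assumption.

```java
// Hypothetical stand-in for the length-normalization hook of a similarity class.
final class FlooredLengthNorm {
    private final int minLength;

    FlooredLengthNorm(int minLength) {
        if (minLength < 1) {
            throw new IllegalArgumentException("minLength must be >= 1");
        }
        this.minLength = minLength;
    }

    // Fields shorter than minLength are treated as if they had minLength terms,
    // so a 2 word field gets the same factor as a minLength word field.
    double lengthNorm(int numTerms) {
        int effective = Math.max(numTerms, minLength);
        return 1.0 / Math.sqrt(effective);
    }

    public static void main(String[] args) {
        FlooredLengthNorm norm = new FlooredLengthNorm(20); // assumed floor
        for (int len : new int[] {2, 10, 20, 40}) {
            System.out.println(len + " terms -> " + norm.lengthNorm(len));
        }
    }
}
```

Documents longer than the floor are still penalized as before; only the
advantage of the tiny ones goes away.  Since norms are written at index
time, a change like this would need a reindex to take effect.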


: >
: > : Is there any way I can change Lucene to rank longer documents with more
: > : phrase occurrences higher
: >
: > if what you care about is only the number of occurrences, and you don't
: > want the length to be a factor at all, then using Field.setOmitNorms(true)
: > on the Field for every document you add will not only accomplish this, but
: > will also save one byte per field per document in your index.
: >
: > that can add up if you have a lot of fields whose length you don't care
: > about.
: >
: >
: > -Hoss



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

