lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: max_score(multi_valued_field) function?
Date Tue, 02 May 2006 22:51:09 GMT
: Yes, I guess this is more or less what I mean. An example would be
: these two documents:
:
: 1 - with the titles:
: "http"
: "hypertext transfer protocol"
:
: 2 - with the title:
: "http tunnel"
:
: When I use multi-valued fields and search on "http", the title
: score of the second document is higher because it matches and its
: field is shorter. Since the first title of the first document is a
: perfect match, that document should get the higher score instead.
:
: Disabling length normalization sounds good. While it may not help
: find the more relevant title, at least it won't give a bad score to
: good titles.

Something you could do to regain the basic idea of length normalization
is to:
 1) put artificial tokens at the beginning and end of each title;
 2) use a high positionIncrementGap;
 3) at query time, make all of your queries Phrase/Span queries that
    include the artificial begin/end tokens, with slop values 1 less
    than your positionIncrementGap.
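
A toy, Lucene-free sketch of steps 1) and 2) in plain Java, showing where the tokens of a multi-valued title field land. It assumes a simplified model: hypothetical BEGIN/END marker tokens, lowercase whitespace tokenization, and each later value starting gap + 1 positions after the previous value's last token. In Lucene the gap between values comes from Analyzer.getPositionIncrementGap(String fieldName), and the exact offset it produces may vary by version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene code) of token positions in a multi-valued "title"
// field when every value is wrapped in artificial marker tokens.
class TitleLayout {
    // Hypothetical marker tokens; anything that never occurs in real titles works.
    static final String BEGIN = "BEGIN";
    static final String END = "END";

    // Returns term -> positions across all values of the field. Tokens inside a
    // value are consecutive; each later value starts gap + 1 positions after the
    // last token of the previous value.
    static Map<String, List<Integer>> layout(List<String> titles, int gap) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        int pos = -1; // position of the most recently emitted token
        for (int v = 0; v < titles.size(); v++) {
            if (v > 0) {
                pos += gap; // skip the position increment gap between values
            }
            List<String> tokens = new ArrayList<>();
            tokens.add(BEGIN);
            for (String word : titles.get(v).trim().toLowerCase().split("\\s+")) {
                tokens.add(word);
            }
            tokens.add(END);
            for (String token : tokens) {
                pos++;
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Document 1 from the example, with a gap of 101 positions.
        System.out.println(layout(List.of("http", "hypertext transfer protocol"), 101));
        // {BEGIN=[0, 104], http=[1], END=[2, 108], hypertext=[105], transfer=[106], protocol=[107]}
    }
}
```

The marker tokens are added outside the lowercasing step, so a title that happens to contain the word "begin" cannot collide with the uppercase BEGIN marker.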

i.e., if you want to search for "http", search for "BEGIN http END"~100
(which assumes a positionIncrementGap of 101).
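
A brute-force sketch of how the sloppy phrase "BEGIN http END"~100 matches, under a simplified model with positions hard-coded for the two example documents (hypothetical BEGIN/END markers, gap of 101). Each term's position is shifted back by its offset in the phrase, and the match length is the spread of the shifted positions; a combination counts only if that spread is at most the slop. This only finds the tightest match; Lucene's SloppyPhraseScorer is more efficient and also accumulates multiple matches, so treat it as an illustration. In this example, a match that borrows BEGIN from one title and END from another spreads over more than 100 positions, which is why the slop has to stay below the gap.

```java
import java.util.List;
import java.util.Map;

// Brute-force illustration of sloppy phrase matching. NOT Lucene's algorithm:
// it only reports the tightest match and ignores scoring.
class SloppyMatch {
    // Smallest match length of the phrase in the field, or -1 if no combination
    // of positions fits within the slop (0 means an exact phrase match).
    static int bestMatchLength(Map<String, List<Integer>> positions, List<String> phrase, int slop) {
        int[] best = {Integer.MAX_VALUE};
        search(positions, phrase, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, best);
        return best[0] <= slop ? best[0] : -1;
    }

    // Tries every combination of positions, one per phrase term.
    private static void search(Map<String, List<Integer>> positions, List<String> phrase,
                               int i, int min, int max, int[] best) {
        if (i == phrase.size()) {
            best[0] = Math.min(best[0], max - min);
            return;
        }
        for (int p : positions.getOrDefault(phrase.get(i), List.of())) {
            int shifted = p - i; // undo the term's offset within the phrase
            search(positions, phrase, i + 1, Math.min(min, shifted), Math.max(max, shifted), best);
        }
    }

    public static void main(String[] args) {
        List<String> query = List.of("BEGIN", "http", "END");
        // Document 1: titles "http" and "hypertext transfer protocol", gap of 101.
        Map<String, List<Integer>> doc1 = Map.of(
            "BEGIN", List.of(0, 104), "http", List.of(1), "END", List.of(2, 108),
            "hypertext", List.of(105), "transfer", List.of(106), "protocol", List.of(107));
        // Document 2: single title "http tunnel".
        Map<String, List<Integer>> doc2 = Map.of(
            "BEGIN", List.of(0), "http", List.of(1), "tunnel", List.of(2), "END", List.of(3));
        System.out.println(bestMatchLength(doc1, query, 100)); // exact match on "http"
        System.out.println(bestMatchLength(doc2, query, 100)); // one extra token
    }
}
```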

Short titles will get better scores because sloppy phrase matches score
higher the closer together the matched terms are, and the begin/end
tokens of a short title are closer together.
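
The scoring side, sketched with two formulas from Lucene's DefaultSimilarity: each sloppy match adds sloppyFreq(distance) = 1/(distance + 1) to the phrase frequency, and tf(freq) = sqrt(freq). With BEGIN/END markers, a match inside a title of n tokens always has match length n - 1, wherever "http" sits in the title, so a one-word title is an exact match (contribution 1) while "http tunnel" contributes 1/2. Idf, boosts, norms, and everything else in a real score are deliberately left out.

```java
// Mirrors sloppyFreq and tf from Lucene's DefaultSimilarity; everything else
// that goes into a real score (idf, boosts, norms, coord) is left out.
class ShortTitleBonus {
    // With BEGIN/END markers, a match inside an n-token title has match length n - 1.
    static int matchLength(int titleTokens) {
        return titleTokens - 1;
    }

    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Shorter titles -> smaller match length -> larger contribution.
        for (int n = 1; n <= 4; n++) {
            float freq = sloppyFreq(matchLength(n));
            System.out.printf("%d-token title: sloppyFreq=%.3f tf=%.3f%n", n, freq, tf(freq));
        }
    }
}
```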

It doesn't take care of your max concern, though: a document with the
titles "http clients" and "http 1.1 clients" will still get a higher score
by default than a document with the single title "http".
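
One way to see why, sketched under assumptions: with a plain TermQuery on "http" and length norms disabled (as proposed earlier in the thread), the two documents differ only in tf(freq) = sqrt(freq) from DefaultSimilarity, and the matches in both titles add up to a single freq of 2. The maxPerValue method is a hypothetical stand-in for the max_score(multi_valued_field) function the subject line asks about; it is not a Lucene feature. It scores each title on its own, using DefaultSimilarity's lengthNorm of 1/sqrt(number of tokens) while ignoring norm byte-encoding, and keeps the best one.

```java
import java.util.List;

// Toy comparison; only tf and lengthNorm from Lucene's DefaultSimilarity are modeled.
class SumVersusMax {
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    static String[] tokens(String title) {
        return title.trim().toLowerCase().split("\\s+");
    }

    static int count(String title, String term) {
        int n = 0;
        for (String token : tokens(title)) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    // What the multi-valued field effectively does with norms disabled:
    // matches in all values add up to one frequency for the document.
    static double summedNoNorms(List<String> titles, String term) {
        int freq = 0;
        for (String t : titles) {
            freq += count(t, term);
        }
        return tf(freq);
    }

    // Hypothetical max over values: score each title on its own,
    // with its own length norm, and keep the best one.
    static double maxPerValue(List<String> titles, String term) {
        double best = 0;
        for (String t : titles) {
            best = Math.max(best, tf(count(t, term)) * lengthNorm(tokens(t).length));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> twoTitles = List.of("http clients", "http 1.1 clients");
        List<String> oneTitle = List.of("http");
        System.out.printf("summed: %.4f vs %.4f%n",
            summedNoNorms(twoTitles, "http"), summedNoNorms(oneTitle, "http"));
        System.out.printf("max:    %.4f vs %.4f%n",
            maxPerValue(twoTitles, "http"), maxPerValue(oneTitle, "http"));
    }
}
```

Summing favors the two-title document (sqrt(2) against 1), while the per-value maximum favors the single "http" title.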



-Hoss



