lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From G√ľnther Starnberger <...@sysfrog.org>
Subject Re: max_score(multi_valued_field) function?
Date Tue, 02 May 2006 22:28:58 GMT
hello,

> i can think of two possibilities you might be refering to when you say
> "noise" ... one is that the lengthNorm for docs with many variant
> titles causes matches in those titles to not score as well as
> documents with only one title -- this can be dealt with by overriding
> the lengthNorm function so that when used on your title field, it
> returns a constant value (or by index your title Field with
> setOmitNorms(true)) ... this will completley eliminate the length of
> all titles from scoring consideration, but it is an option.

yes - i guess this is more or less what i mean. an example are the two
documents:

1 - with the titles:
"http"
"hypertext transfer protocol"

2 - with the title:
"http tunnel"

when i use multi-valued fields and do a search on "http" the title
score on the second document is higher as there is a match and the
length is shorter. as the first title of the first document would be a
perfect match this one should get the higher score instead.

disabling the length normalization sounds good - while it may not help
to find the more relevant title at least it won't give a bad score to
good titles.

> As far as I can figure, your best bet really is to use a seperate
> field for each title -- but instead of combining the queries into a
> BooleanQuery, use a DisjunctionMaxQuery with a tiebreaker value of
> 0.0f ... the whole purpose of that query is to score documents based
> on the maximum score of a sub query, but you still have to make the
> sub queries your self, and there's no easy way to make a query that
> only searches the first "chunk" of terms from a field.

ok. this could also be a possible solution. i'll evaluate both methods
and use the "better" one.

thanks for the help.
/gst

Mime
View raw message