lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: sumOfSquaredWeights for lengthNorm
Date Tue, 07 Mar 2006 07:18:04 GMT

: > 1) the "boosts" associated with Fields and Documents at indexing time,
: > which are combined with the lengthNorm at index time to determine a single
: > "norm"  value for the doc/field pair.
: I don't think this is what I want because the lengthNorm is still using
: the # of terms.

You can override the lengthNorm function to return "1.0" for all fields
regardless of length, and then the norm will consist solely of the
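
A minimal sketch of that override, written against a stand-in class rather
than Lucene itself so that it compiles on its own. SketchSimilarity and
FlatLengthSimilarity are hypothetical names for illustration; the
lengthNorm(String fieldName, int numTerms) shape and the 1/sqrt(numTerms)
default are assumed to match the Similarity class in your Lucene version, so
check them against its javadoc:

```java
// Stand-in for the default behaviour (assumption: 1/sqrt(numTerms)).
class SketchSimilarity {
    float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Ignores field length entirely, so the length contributes nothing to the norm.
class FlatLengthSimilarity extends SketchSimilarity {
    @Override
    float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }
}
```

With the real library you would extend its Similarity implementation instead
and use the same Similarity when indexing.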

: Yes, I noticed this, but this is not what I want because it's using "idf
: of the terms being queried". What I want is for fieldWeight to use
: 1/sqrt(sumOfSquaredWeights), where sumOfSquaredWeights = sum of tf^2 over
: all terms in the field.

Ah, but when you are building your index, how can Lucene know what the tf
values for all of the terms in the field are? ... you still have more
documents to add that can affect the tf.

If you know what the frequencies are when you add the document, then you
can square that and use it as the field boost and you should have what you
: 3) I got another issue with the explanation, which seems to be a bug.
: Below, I've given a printout of the explanation.  There's something
: strange when I use my own Similarity: it prints out all query terms
: despite some of them not appearing in the doc (See for "formulation" the
: docFreq = 0  but it appears in the explanation).

It looks like your tf function is returning non-zero values when the input
is 0, which is going to give you really weird behavior -- including saying
that certain docs match a clause even when they don't.

Even if you want tf to return a constant value, you have to keep in
mind the "non-matching" case and return 0 when the input is 0.
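
For example, a constant tf with that guard could look like the sketch below.
ConstantTfSketch is a hypothetical standalone class; in a real Similarity
subclass the same body would go in the overridden tf method, whose exact
signature depends on your Lucene version:

```java
final class ConstantTfSketch {
    // Same score contribution for every matching term, regardless of how
    // often it occurs, but zero when the term does not occur at all.
    static float tf(float freq) {
        return freq > 0.0f ? 1.0f : 0.0f;
    }
}
```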

