lucene-java-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Help understanding fieldNorm
Date Mon, 05 Oct 2009 12:04:26 GMT
Could it be that the tokenization scheme for the URL field changed
between the times you added the documents, i.e. it yielded more tokens
when you got the low fieldNorm value? The number of documents should
not affect the fieldNorm. The value is based on the number of tokens in
the field, the field boost and the document boost:

document boost * field boost * (1/sqrt(terms in field))
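
As a minimal sketch of the formula above (not Lucene code; the function names `field_norm` and `implied_term_count` are made up for illustration), the following computes the norm from the two boosts and the token count, and also inverts it to estimate how many tokens a given norm would imply. It assumes both boosts are 1.0 unless stated otherwise, and it ignores the reduced precision with which Lucene stores norms, so the results are rough figures only.

```python
import math


def field_norm(doc_boost: float, field_boost: float, num_terms: int) -> float:
    """document boost * field boost * (1 / sqrt(terms in field))."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return doc_boost * field_boost * (1.0 / math.sqrt(num_terms))


def implied_term_count(norm: float, doc_boost: float = 1.0,
                       field_boost: float = 1.0) -> float:
    """Invert the formula: how many terms would produce this norm?

    Assumes the norm was stored exactly, which is an idealisation.
    """
    return (doc_boost * field_boost / norm) ** 2


if __name__ == "__main__":
    # Norm for a 10-token field with no boosts.
    print(field_norm(1.0, 1.0, 10))
    # Token counts implied by the two fieldNorm values quoted in this thread.
    print(implied_term_count(0.3125))
    print(implied_term_count(4.5776367e-5))
```

With boosts of 1.0, a fieldNorm of 0.3125 corresponds to roughly 10 tokens, while 4.5776367E-5 would need several hundred million tokens. Under these assumptions, that suggests the low value comes from something other than field length alone, for example a boost far below 1.0.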

Or perhaps you were using different similarity classes where   
lengthNorm(String,int) differed?
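
To illustrate why differing lengthNorm implementations matter, here is a hedged sketch with two hypothetical length-norm functions that mirror the (field name, token count) shape of lengthNorm(String,int). Neither is taken from Lucene. The first follows the 1/sqrt(terms) formula above, and the second is an invented variant that ignores field length.

```python
import math


def default_length_norm(field_name: str, num_terms: int) -> float:
    """Length norm following the 1/sqrt(terms in field) formula."""
    return 1.0 / math.sqrt(num_terms)


def flat_length_norm(field_name: str, num_terms: int) -> float:
    """Hypothetical variant that does not penalise long fields at all."""
    return 1.0


if __name__ == "__main__":
    # The same field indexed under the two variants gets different norms.
    for n in (7, 10, 100):
        print(n, default_length_norm("url", n), flat_length_norm("url", n))
```

If one copy of a document was indexed with one variant and the rewritten copy with the other, the two copies would report different fieldNorm values even though the field content is identical.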

If none of the above, I'm clueless.


        karl

5 okt 2009 kl. 12.45 skrev Ole-Martin Mørk:

> I did not change the URL. The length of the title was increased by 1,
> from 41 to 42 characters.
> --
> Ole-Martin Mørk
>
>
> On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin <karl.wettin@gmail.com>  
> wrote:
>
>> Sorry, I meant title.
>>
>> 5 okt 2009 kl. 11.57 skrev Simon Willnauer:
>>
>>
>>> Ole-Martin, did you mention that you did not change the URL value
>>> but the title?
>>>
>>> simon
>>>
>>> On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin <karl.wettin@gmail.com>
>>> wrote:
>>>
>>>> Hi Ole-Martin,
>>>>
>>>> how many characters were in the URL before and after the update?
>>>>
>>>>
>>>>  karl
>>>>
>>>> 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk:
>>>>
>>>>
>>>>> Hi. I am trying to understand Lucene's scoring algorithm. We're
>>>>> getting some strange results. First we search for a given page by
>>>>> its URL. We get this result:
>>>>>
>>>>> 0.0014793393 = fieldWeight(url:"our super secret url" in 22), product of:
>>>>> 1.0 = tf(phraseFreq=1.0)
>>>>> 32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
>>>>> 4.5776367E-5 = fieldNorm(field=url, doc=22)
>>>>>
>>>>> After that, we use SolrJ to read the document and write it back.
>>>>> The only change is to the title of the document (the number 2 is
>>>>> appended).
>>>>>
>>>>> We search again and the fieldNorm has changed significantly:
>>>>>
>>>>> 9.874598 = fieldWeight(url:"our super secret url" in 0), product of:
>>>>> 1.0 = tf(phraseFreq=1.0)
>>>>> 31.598713 = idf(url: www=7328 host=322 com=7328 article=2457 something=3 something=45 704290075=2)
>>>>> 0.3125 = fieldNorm(field=url, doc=0)
>>>>>
>>>>> Why does the value of fieldNorm change so much?
>>>>>
>>>>> Looking forward to your answers.
>>>>>
>>>>> --
>>>>> Ole-Martin Mørk
>>>>> http://twitter.com/olemartin
>>>>> http://flickr.com/olemartin
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>
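
The two explain outputs quoted in the original question each state that fieldWeight is the product of tf, idf and fieldNorm. As a small check of that arithmetic, the sketch below multiplies the quoted factors. The helper name `field_weight` is made up for illustration and is not a Lucene API.

```python
def field_weight(tf: float, idf: float, field_norm: float) -> float:
    """fieldWeight as the product of its three explain factors."""
    return tf * idf * field_norm


# Factors copied from the explain output before and after the update.
before = field_weight(1.0, 32.31666, 4.5776367e-5)
after = field_weight(1.0, 31.598713, 0.3125)

if __name__ == "__main__":
    print(before)          # compare with the reported 0.0014793393
    print(after)           # compare with the reported 9.874598
    print(after / before)  # how much the score grew
```

Since tf is unchanged and idf barely moves, nearly all of the difference between the two scores comes from the fieldNorm factor.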

