lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ole-Martin Mørk <olemar...@gmail.com>
Subject Re: Help understanding fieldNorm
Date Mon, 05 Oct 2009 09:49:47 GMT
Did another update:
9.707364 = fieldWeight(url:"our super secret url" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  31.063566 = idf(url: www=7329 host=323 com=7329
article=2458 something=4 something=46 704290075=3)
  0.3125 = fieldNorm(field=url, doc=0)

FieldNorm value is not changed this time. It might be that the index was
really small the first time the document was added. Could that affect the
fieldNorm value?

--
Ole-Martin Mørk


On Mon, Oct 5, 2009 at 11:39 AM, Simon Willnauer <
simon.willnauer@googlemail.com> wrote:

>
>
> On Mon, Oct 5, 2009 at 11:31 AM, Ole-Martin Mørk <olemartin@gmail.com>wrote:
>
>> I don't think I changed any boost values, at least not on purpose. I think
>> the reason for the changed document id is that, to my knowledge, an update
>> is a delete and an add.
>
> That is correct it could be caused by an update I just wonder why it got
> the id 0 - it looked weird though.
> The field norm are calculated for doc boost, field boost and lengthNorm so
> if you did not change any boost values compare to the last update / insert
> this looks very strange though.
>
> can you repeat this more than once and see if the fieldnorm change all the
> time?
>
> simon
>
>>
>> The code for my solrj update:
>>
>> public void updateDocument(SolrDocument document) {
>>         SolrServer server = new CommonsHttpSolrServer(SOLR_URL);
>>         SolrInputDocument input = new SolrInputDocument();
>>         Collection<String> fields = document.getFieldNames();
>>         for (String field : fields) {
>>             input.setField(field, document.getFieldValue(field));
>>         }
>>         input.removeField("id"); //is regenerated from the url value
>>         input.removeField("score");
>>         server.add(input);
>>     }
>> --
>> Ole-Martin Mørk
>>
>>
>>
>> On Mon, Oct 5, 2009 at 11:15 AM, Simon Willnauer <
>> simon.willnauer@googlemail.com> wrote:
>>
>>> Did you change any boost values for URL field or document while
>>> reindexing
>>> the document by any chance? Or do you look at different documents - one
>>> is
>>> internal id 0 and other is internal id 22 - this could be the updated one
>>> just curious if that might be the cause?!
>>>
>>> simon
>>>
>>> On Mon, Oct 5, 2009 at 10:21 AM, Ole-Martin Mørk <olemartin@gmail.com
>>> >wrote:
>>>
>>> > Hi. I am trying to understand Lucene's scoring algorithm. We're
>>> > getting some strange results. First we search for a given page by it's
>>> > url. We get this result:
>>> >
>>> > 0.0014793393 = fieldWeight(url:"our super secret url" in 22), product
>>> of:
>>> >  1.0 = tf(phraseFreq=1.0)
>>> >  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456
>>> > something=2 something=44 704290075=1)
>>> >  4.5776367E-5 = fieldNorm(field=url, doc=22)
>>> >
>>> > When this is done, we use solrJ to read and write the document. The
>>> > only change is the title of the document (appends the number 2)
>>> >
>>> > We search again and the fieldNorm is changed significantly:
>>> >
>>> > 9.874598 = fieldWeight(url:"our super secret url" in 0), product of:
>>> >  1.0 = tf(phraseFreq=1.0)
>>> >  31.598713 = idf(url: www=7328 host=322 com=7328 article=2457
>>> > something=3 somthing=45 704290075=2)
>>> >  0.3125 = fieldNorm(field=url, doc=0)
>>> >
>>> > Why does the value of fieldNorm change so much?
>>> >
>>> > Looking forward to your answers.
>>> >
>>> > --
>>> > Ole-Martin Mørk
>>> > http://twitter.com/olemartin
>>> > http://flickr.com/olemartin
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message