lucene-solr-user mailing list archives

From Allistair Crossley <...@roxxor.co.uk>
Subject Re: Same index is ranking differently on 2 machines
Date Wed, 09 Mar 2011 21:38:53 GMT
Thanks. Good to know, but even so my problem remains - the end score should not be different
and is causing a dramatically different ranking of a document (3 versus 7 is dramatic for
my client). This must be down to the scoring debug differences - it's the only difference
I can find :(

On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:

> queryNorm is just a normalizing factor and is the same value across
> all the results for a query, to just make the scores comparable.
> So even if it varies between environments, you should not worry about it.
> 
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
> -
> Definition - queryNorm(q) is just a normalizing factor used to make
> scores between queries comparable. This factor does not affect
> document ranking (since all ranked documents are multiplied by the
> same factor), but rather just attempts to make scores from different
> queries (or even different indexes) comparable
> 
> Regards,
> Jayendra
> 
> On Wed, Mar 9, 2011 at 4:22 PM, Allistair Crossley <ali@roxxor.co.uk> wrote:
>> Hi,
>> 
>> I am seeing an issue I do not understand and hope that someone can shed some light
on this. The issue is that for a particular search we are seeing a particular result rank
in position 3 on one machine and position 8 on the production machine. The position 3 is our
desired and roughly expected ranking.
>> 
>> I have a local machine with solr and a version deployed on a production server. My
local machine's solr and the production version are both checked out from our project's SVN
trunk. They are identical files except for the data files (not in SVN) and database connection
settings.
>> 
>> The index is populated exclusively via data import handler queries to a database.
>> 
>> I have exported the production database as-is to my local development machine so
that my local machine and production have access to exactly the same data.
>> 
>> I execute a total full-import on both.
>> 
>> Still, I see a different position for this document that should surely rank in the
same location, all else being equal.
>> 
>> I ran debugQuery diff to see how the scores were being computed. See appendix at
foot of this email.
>> 
>> As far as I can tell every single query normalisation block of the debug is marginally
different, e.g.
>> 
>> -        0.021368012 = queryNorm (local)
>> +        0.009944122 = queryNorm (production)
>> 
>> This leads to a final score of 2.286596 locally versus 1.0651637 in production, which
is enough to skew the results from correct to incorrect (in terms of what we expect to see).
>> 
>> - 2.286596 (local)
>> + 1.0651637 (production)
>> 
>> I cannot explain this difference. The database is the same. The configuration is
the same. I have fully imported from scratch on both servers. What am I missing?
>> 
>> Thank you for your time
>> 
>> Allistair
>> 
>> ----- snip
>> 
>> APPENDIX - debugQuery=on DIFF
>> 
>> --- untitled
>> +++ (clipboard)
>> @@ -1,51 +1,49 @@
>> -<str name="L12411p">
>> +<str name="L12411">
>> 
>> -2.286596 = (MATCH) sum of:
>> -  1.6891675 = (MATCH) sum of:
>> -    1.3198489 = (MATCH) max plus 0.01 times others of:
>> -      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
>> -        0.011795795 = queryWeight(text:dubai^0.1), product of:
>> -          0.1 = boost
>> +1.0651637 = (MATCH) sum of:
>> +  0.7871359 = (MATCH) sum of:
>> +    0.6151879 = (MATCH) max plus 0.01 times others of:
>> +      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
>> +        0.05489459 = queryWeight(text:dubai), product of:
>>           5.520305 = idf(docFreq=65, maxDocs=6063)
>> -          0.021368012 = queryNorm
>> +          0.009944122 = queryNorm
>>         1.9517226 = (MATCH) fieldWeight(text:dubai in 1551), product of:
>>           1.4142135 = tf(termFreq(text:dubai)=2)
>>           5.520305 = idf(docFreq=65, maxDocs=6063)
>>           0.25 = fieldNorm(field=text, doc=1551)
>> -      1.3196187 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
>> -        0.32609802 = queryWeight(profile:dubai^2.0), product of:
>> +      0.6141165 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
>> +        0.15175761 = queryWeight(profile:dubai^2.0), product of:
>>           2.0 = boost
>>           7.6305184 = idf(docFreq=7, maxDocs=6063)
>> -          0.021368012 = queryNorm
>> +          0.009944122 = queryNorm
>>         4.0466933 = (MATCH) fieldWeight(profile:dubai in 1551), product of:
>>           1.4142135 = tf(termFreq(profile:dubai)=2)
>>           7.6305184 = idf(docFreq=7, maxDocs=6063)
>>           0.375 = fieldNorm(field=profile, doc=1551)
>> -    0.36931866 = (MATCH) max plus 0.01 times others of:
>> -      0.0018293816 = (MATCH) weight(text:product^0.1 in 1551), product of:
>> -        0.003954251 = queryWeight(text:product^0.1), product of:
>> -          0.1 = boost
>> +    0.17194802 = (MATCH) max plus 0.01 times others of:
>> +      0.00851347 = (MATCH) weight(text:product in 1551), product of:
>> +        0.018402064 = queryWeight(text:product), product of:
>>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
>> -          0.021368012 = queryNorm
>> +          0.009944122 = queryNorm
>>         0.4626367 = (MATCH) fieldWeight(text:product in 1551), product of:
>>           1.0 = tf(termFreq(text:product)=1)
>>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
>>           0.25 = fieldNorm(field=text, doc=1551)
>> -      0.36930037 = (MATCH) weight(profile:product^2.0 in 1551), product of:
>> -        0.1725098 = queryWeight(profile:product^2.0), product of:
>> +      0.17186289 = (MATCH) weight(profile:product^2.0 in 1551), product of:
>> +        0.08028162 = queryWeight(profile:product^2.0), product of:
>>           2.0 = boost
>>           4.036637 = idf(docFreq=290, maxDocs=6063)
>> -          0.021368012 = queryNorm
>> +          0.009944122 = queryNorm
>>         2.14075 = (MATCH) fieldWeight(profile:product in 1551), product of:
>>           1.4142135 = tf(termFreq(profile:product)=2)
>>           4.036637 = idf(docFreq=290, maxDocs=6063)
>>           0.375 = fieldNorm(field=profile, doc=1551)
>> -  0.59742856 = (MATCH) max plus 0.01 times others of:
>> -    0.59742856 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
>> -      0.12465195 = queryWeight(profile:"dubai product"~10^0.5), product of:
>> +  0.27802786 = (MATCH) max plus 0.01 times others of:
>> +    0.27802786 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
>> +      0.05800981 = queryWeight(profile:"dubai product"~10^0.5), product of:
>>         0.5 = boost
>>         11.667155 = idf(profile: dubai=7 product=290)
>> -        0.021368012 = queryNorm
>> +        0.009944122 = queryNorm
>>       4.7927732 = fieldWeight(profile:"dubai product" in 1551), product of:
>>         1.0954452 = tf(phraseFreq=1.2)
>>         11.667155 = idf(profile: dubai=7 product=290)
>> 
>> 
>> 
>> 
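
The arithmetic in the appendix can be checked by hand. The following is a minimal sketch, not part of the original message, that recomputes the text:dubai clause for both machines. It uses only the relations visible in the debug output itself (queryWeight = boost * idf * queryNorm, and weight = queryWeight * fieldWeight). The function name clause_weight is hypothetical and exists only for illustration. One detail in the diff is worth noting while reading it: the local query shows text:dubai^0.1 with a 0.1 boost line, while production shows text:dubai with no boost line, so the two machines appear to be building the query with different boosts on the text field, which is a difference beyond queryNorm.

```python
def clause_weight(idf, query_norm, field_weight, boost=1.0):
    """Recompute one term clause the way the debug output lays it out.

    queryWeight = boost * idf * queryNorm
    weight      = queryWeight * fieldWeight
    """
    query_weight = boost * idf * query_norm
    return query_weight, query_weight * field_weight


# Values copied from the debugQuery diff for text:dubai in doc 1551.
IDF = 5.520305
TF = 1.4142135
FIELD_NORM = 0.25
field_weight = TF * IDF * FIELD_NORM  # reported as 1.9517226 on both machines

# Local: the clause is text:dubai^0.1, so the 0.1 boost applies.
local_qw, local_w = clause_weight(IDF, 0.021368012, field_weight, boost=0.1)

# Production: the clause is text:dubai with no boost line (assumed boost of 1.0).
prod_qw, prod_w = clause_weight(IDF, 0.009944122, field_weight)

print(f"fieldWeight (both):  {field_weight:.7f}")
print(f"local:      queryWeight={local_qw:.9f}  weight={local_w:.9f}")
print(f"production: queryWeight={prod_qw:.9f}  weight={prod_w:.9f}")
```

The recomputed values agree with the debug output to within float rounding. If the boosts really do differ, queryNorm would be expected to differ as well, since it is derived from the boosted query weights, so comparing the parsed query in the debugQuery output of both machines would be a reasonable next step.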

