lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Boosting query - debuging
Date Wed, 13 May 2009 13:23:45 GMT

On May 13, 2009, at 3:04 AM, liat oren wrote:

> Thanks a lot, Grant. Yes, this is the case; it is longer than TTD.
> Can you also explain why, for finlin, the doc is 35433 and, for TTD, it is
> 20?
> Are these the numbers of documents that contain any of the elements that
> exist in each word?

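My understanding is that 35433 is just Lucene's internal document number
for the document you are "explaining" (likewise 20 for TTD). The fieldNorm
value next to it (0.625 vs. 1.0) is the combination of the length of that
document plus any boosts that you applied, and it would also factor in any
custom similarity.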
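
As a rough check of how the factors in the quoted explain output combine,
here is a minimal sketch in plain Java. It makes no Lucene calls; the class
and method names are made up for illustration. It assumes tf is the square
root of phraseFreq, which matches the 0.70710677 shown, and simply
multiplies the factors:

```java
// Hypothetical helper, not part of Lucene: it only multiplies the
// factors printed in the explain() output quoted in this thread.
class ExplainArithmetic {

    // fieldWeight = btq * idf * fieldNorm, where btq = tf * scorePayload
    static double fieldWeight(double phraseFreq, double payloadScore,
                              double idf, double fieldNorm) {
        double tf = Math.sqrt(phraseFreq); // assumption: tf = sqrt(freq)
        double btq = tf * payloadScore;    // the "btq, product of" line
        return btq * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double idf = 7.3036084; // idf(worlds: 6621468=110), same for both docs

        // finlin: scorePayload = 6.0, fieldNorm = 0.625
        System.out.println("finlin: " + fieldWeight(0.5, 6.0, idf, 0.625));

        // TTD: scorePayload = 3.0, fieldNorm = 1.0
        System.out.println("TTD:    " + fieldWeight(0.5, 3.0, idf, 1.0));
    }
}
```

The two results agree with the reported scores to within rounding, so the
only per-document differences are scorePayload (6.0 vs. 3.0) and fieldNorm
(0.625 vs. 1.0).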

So, how many tokens are in each of those documents?
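
The token count matters because, with the default Similarity in 2.4,
lengthNorm is 1/sqrt(numTerms), and the norm is that value multiplied by
the document boost and the field boost. A sketch in plain Java, assuming no
custom Similarity (the class and method names are hypothetical, not Lucene
API):

```java
// Hypothetical sketch of the norm formula from the Similarity javadoc,
// assuming the default lengthNorm of 1/sqrt(numTerms).
class NormSketch {

    // Raw (unencoded) norm for one field of one document.
    static double rawNorm(int numTerms, double docBoost, double fieldBoost) {
        double lengthNorm = 1.0 / Math.sqrt(numTerms);
        return docBoost * fieldBoost * lengthNorm;
    }

    public static void main(String[] args) {
        // More tokens in the field means a smaller norm.
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " token(s): " + rawNorm(n, 1.0, 1.0));
        }
    }
}
```

Note that the norm stored in the index is encoded into a single byte, so
the fieldNorm that explain prints is a coarsely rounded version of this raw
value and will not always match it exactly.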

>
> So if word TTD contains only 6621468, then 20 is the number of documents
> (words) that contain 6621468?
> I don't think this is the case, as I checked and the index doesn't have
> 35433 documents that contain 6621468 or 5265266.
>
>
> 2009/5/11 Grant Ingersoll <gsingers@apache.org>
>
>>
>> On May 10, 2009, at 5:59 AM, liat oren wrote:
>>
>>>
>>> The output is the following:
>>> *finlin, score: 19.366615*
>>> 19.366615 = (MATCH) fieldWeight(worlds:6621468^3.0 in 35433), product of:
>>> 4.2426405 = (MATCH) btq, product of:
>>>  0.70710677 = tf(phraseFreq=0.5)
>>>  6.0 = scorePayload(...)
>>> 7.3036084 = idf(worlds: 6621468=110)
>>> 0.625 = fieldNorm(field=worlds, doc=35433)
>>>
>>> *TTD, score: 15.493294*
>>> 15.493293 = (MATCH) fieldWeight(worlds:6621468^3.0 in 20), product of:
>>> 2.1213202 = (MATCH) btq, product of:
>>>  0.70710677 = tf(phraseFreq=0.5)
>>>  3.0 = scorePayload(...)
>>> 7.3036084 = idf(worlds: 6621468=110)
>>> 1.0 = fieldNorm(field=worlds, doc=20)
>>>
>>> Can anyone explain the highlighted parts of the score to me?
>>> I read all the explanations in the API docs and a lot of threads about
>>> scoring, but didn't really understand these factors.
>>> Why, for finlin, is the doc 35433 while, for TTD, it's 20?
>>>
>>>
>>
>> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
>>
>> fieldNorm = norm (not sure why the docs aren't consistent). The norm takes
>> into account document length and boosts (
>> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html#formula_norm
>> )
>>
>> The gist of what you are seeing, I believe, is that finlin is a lot longer
>> than TTD.  Is that the case?
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
>> using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/