lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From liat oren <oren.l...@gmail.com>
Subject Re: Boosting query - debuging
Date Thu, 14 May 2009 08:32:25 GMT
No, As I wrote above
For finlin, 6621468 * 6, 5265266 * 12 (I use payload for this)
and TTD - 6621468 * 3 (I use payload for this)
I search for 6621468 * 3 and it and finlin gets a higher score




2009/5/13 Grant Ingersoll <gsingers@apache.org>

>
> On May 13, 2009, at 3:04 AM, liat oren wrote:
>
> Thanks a lot, Grant. Yes, this is the case, it is longer than TTD.
>> Can you also explain me Why in finlin, we have the doc 35433 and in TTD,
>> its
>> 20?
>> Are these the number of dcuments that contain any of the elements exist in
>> eaxh word.
>>
>
> My understanding is that 35,433 is the combination of the length of the
> document (the one you are "explaining") plus any boosts that you applied and
> would also factor in any custom similarity.
>
> So, how many tokens are in each of those documents?
>
>
>
>> So if word TTD contains only 6621468, then 20 is the number of documents
>> (words) that contain 6621468?
>> I don't think this is the case as I checked and the index doesn;t have
>> 35433
>> documents that contain 6621468 or 5265266
>>
>>
>> 2009/5/11 Grant Ingersoll <gsingers@apache.org>
>>
>>
>>> On May 10, 2009, at 5:59 AM, liat oren wrote:
>>>
>>>
>>>> The output is the following:
>>>> *finlin, score: 19.366615*
>>>> 19.366615 = (MATCH) fieldWeight(worlds:6621468^3.0 in 35433), product
>>>> of:
>>>> 4.2426405 = (MATCH) btq, product of:
>>>>  0.70710677 = tf(phraseFreq=0.5)
>>>>  6.0 = scorePayload(...)
>>>> 7.3036084 = idf(worlds: 6621468=110)
>>>> 0.625 = fieldNorm(field=worlds, doc=35433)
>>>>
>>>> *TTD, score: 15.493294*
>>>> 15.493293 = (MATCH) fieldWeight(worlds:6621468^3.0 in 20), product of:
>>>> 2.1213202 = (MATCH) btq, product of:
>>>>  0.70710677 = tf(phraseFreq=0.5)
>>>>  3.0 = scorePayload(...)
>>>> 7.3036084 = idf(worlds: 6621468=110)
>>>> 1.0 = fieldNorm(field=worlds, doc=20)
>>>>
>>>> Can anyone explain me the highlighted parts of the score?
>>>> I read all the explanations in the api and read a lot of threads about
>>>> the
>>>> scoring, but didn't really understand these factors.
>>>> Why in finlin, we have the doc 35433 and in TTD, its 20?
>>>>
>>>>
>>>>
>>>
>>> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
>>>
>>> fieldNorm = norm (not sure why the docs aren't consistent)  The norm
>>> takes
>>> into account document length and boosts (
>>>
>>> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html#formula_norm
>>> )
>>>
>>> The gist of what you are seeing , I believe, is that finlin is a lot
>>> longer
>>> than TTD.  Is that the case?
>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message