lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3875) Document boost does not work correctly when using multi-valued fields
Date Mon, 24 Sep 2012 23:18:07 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462243#comment-13462243 ]

Hoss Man commented on SOLR-3875:
--------------------------------

Committed revision 1389648. - 4.0

                
> Document boost does not work correctly when using multi-valued fields
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3875
>                 URL: https://issues.apache.org/jira/browse/SOLR-3875
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, update
>    Affects Versions: 4.0-BETA
>            Reporter: Toke Eskildsen
>            Assignee: Hoss Man
>            Priority: Critical
>             Fix For: 4.0, 4.1, 5.0
>
>         Attachments: SOLR-3875.patch
>
>
> In Solr 4 BETA & trunk, document boosts skew the ranking of documents with multi-valued
fields tremendously. A document boost of 5 combined with 15 values in a multi-valued
field results in scores above 1,000,000,000, while a boost of 0.5 results in scores below
0.001. The error is not present in Solr 3.6.
> Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder committed
20110827 (@1162347) by Mike McCandless, as part of work done on LUCENE-2308. The problem is
that Lucene multiplies the boosts of multiple instances of the same field when updating the
index.
> The old DocumentBuilder, used in Solr 3.6, handled this by calculating the combined boost for
the field (docBoost*fieldBoost) and assigning it to the first instance of the field, then
setting the boost to 1.0f and assigning that to subsequent instances of the field. This effectively
assigned docBoost*fieldBoost to the field, regardless of the number of instances.
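
For readers following along, the 3.6-style behaviour described in the paragraph above can be sketched roughly as follows. This is only an illustration written from the description in the issue, not the actual DocumentBuilder source; the class name `OldBoostSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the pre-4.0 boost handling; not the actual Solr code. */
final class OldBoostSketch {

    /** Per-instance boosts for one multi-valued field holding valueCount values. */
    static List<Float> instanceBoosts(float docBoost, float fieldBoost, int valueCount) {
        List<Float> boosts = new ArrayList<>();
        float boost = docBoost * fieldBoost; // combined boost goes to the first instance only
        for (int i = 0; i < valueCount; i++) {
            boosts.add(boost);
            boost = 1.0f; // every later instance gets a neutral boost
        }
        return boosts;
    }

    /** Models Lucene multiplying the boosts of all instances of the same field. */
    static double collectiveBoost(List<Float> boosts) {
        double product = 1.0;
        for (float b : boosts) {
            product *= b;
        }
        return product;
    }

    public static void main(String[] args) {
        // docBoost = 5, fieldBoost = 1, 15 values: the product stays at docBoost*fieldBoost.
        System.out.println(collectiveBoost(instanceBoosts(5f, 1f, 15)));
    }
}
```

Because only the first instance carries a non-neutral boost, the product over all instances does not depend on how many values the field has.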
> The updated DocumentBuilder (see https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup),
used in Solr 4 BETA & trunk, also assigns docBoost*fieldBoost to the first instance
of the field. Then it sets fieldBoost = docBoost and continues to assign docBoost*fieldBoost
to subsequent instances. Using the example mentioned above, the generated IndexableFields
will get assigned boosts of 5, 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances
of the same field will have a collective boost of 5*25^14.
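
To make the compounding concrete, here is a minimal sketch of the behaviour as the issue describes it. It models only the boost arithmetic and is not taken from the linked DocumentBuilder revision; the class name `BuggyBoostSketch` and its method are hypothetical.

```java
/** Hypothetical model of the 4.0-BETA boost handling as described in this issue; not the Solr source. */
final class BuggyBoostSketch {

    /** Product of the per-instance boosts that end up applied to one multi-valued field. */
    static double collectiveBoost(float docBoost, float fieldBoost, int valueCount) {
        double product = 1.0;
        for (int i = 0; i < valueCount; i++) {
            product *= docBoost * fieldBoost; // each instance is assigned docBoost*fieldBoost
            fieldBoost = docBoost;            // the reported problem: reset to docBoost, not 1.0f
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(collectiveBoost(5f, 1f, 15));   // 5 * 25^14, roughly 1.86e20
        System.out.println(collectiveBoost(0.5f, 1f, 15)); // 0.5 * 0.25^14 = 2^-29, roughly 1.9e-9
    }
}
```

With a single value the result is still docBoost*fieldBoost, which would explain why only multi-valued fields are affected.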
> This can be demonstrated with the Solr tutorial example by indexing the sample documents
and adding the document 
> {code:xml}
> <add>
> <doc boost="5">
>   <field name="id">Insane score Example. Score = 10E9 </field>
>   <field name="name">Document boost broken for multivalued fields</field>
>   <field name="manu">Thomas Egense and Toke Eskildsen</field>
>   <field name="manu_id_s">Test</field>
>   <field name="cat">bug</field>
>   <field name="features">insane_boost</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>
>   <field name="features">something else</field>  
> </doc>
> </add>
> {code}
> The _manu_ and _features_ fields get copied to _text_, and a search for _thomas_ matches
the _text_ field with the query explanation
> {code:xml}
> <str name="Insane score Example. Score = 10E10 ">
> 2.44373361E10 = (MATCH) weight(text:thomas in 0) [DefaultSimilarity], result of:
>   2.44373361E10 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     3.2512918 = idf(docFreq=3, maxDocs=38)
>     7.5161928E9 = fieldNorm(doc=0)
> </str>
> {code}
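
The numbers in the explanation above can be rechecked by hand. The sketch below assumes the classic TF-IDF formulas used by Lucene's DefaultSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm from the output as given; the class name `ExplainArithmetic` is hypothetical and not part of Lucene.

```java
/** Hypothetical helper that redoes the arithmetic of the explanation; not a Lucene class. */
final class ExplainArithmetic {

    /** Classic TF-IDF term frequency factor: sqrt(freq). */
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    /** Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)). */
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    /** fieldWeight is the product of tf, idf and fieldNorm. */
    static double fieldWeight(double freq, int docFreq, int maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        double fieldNorm = 7.5161928E9; // copied from the explanation, not recomputed here
        System.out.println(idf(3, 38));                         // compare with idf(docFreq=3, maxDocs=38)
        System.out.println(fieldWeight(1.0, 3, 38, fieldNorm)); // compare with the fieldWeight line
    }
}
```

The tf and idf factors are ordinary, so the inflated score comes entirely from fieldNorm. That value equals 1.75 * 2^32, which looks like the largest value Lucene's single-byte norm encoding can represent; if so, it would explain why the reported norm is far below 5*25^14, but that reading is an inference and is not stated in the issue.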
> Thomas and I are too pressed for time to attempt a proper patch at the moment, but we
guess that a reversion to the old algorithm of assigning the combined boost to the first instance
and 1.0f to all subsequent instances would work?
