lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Why does index boosting a field to 2.0f on a document have such a dramatic effect
Date Thu, 04 Apr 2013 12:07:52 GMT
At index time I boost the alias field of a small set of documents,
setting the boost to 2.0f, which I thought would be equivalent to doubling
the score this doc would get over another doc, everything else being equal.
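The code uses IndexableField, so this is Lucene 4.x. Under its classic DefaultSimilarity, an index-time boost is folded into the field's norm (boost × 1/√numTerms), and that norm multiplies each matching term's fieldWeight (tf × idf × fieldNorm). The sketch below reproduces that arithmetic for a single alias value, so it shows where the "doubling" intuition comes from. It assumes one field instance, ignores the lossy one-byte encoding Lucene applies to stored norms, and `ClassicScoreSketch` and its methods are hypothetical names, not Lucene API.

```java
// Hypothetical helper mirroring the classic (Lucene 4.x DefaultSimilarity)
// formulas for one term matching one single-valued field; not Lucene API.
final class ClassicScoreSketch {

    // Index-time norm: boost * 1/sqrt(numTerms) (byte encoding ignored)
    static float norm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // tf(freq) = sqrt(freq)
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight = tf * idf * fieldNorm
    static float fieldWeight(float freq, long docFreq, long numDocs, float norm) {
        return tf(freq) * idf(docFreq, numDocs) * norm;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;
        long docFreq = 500L;
        // Same two-term alias, once unboosted and once boosted by 2.0
        float plain = fieldWeight(1f, docFreq, numDocs, norm(1.0f, 2));
        float boosted = fieldWeight(1f, docFreq, numDocs, norm(2.0f, 2));
        System.out.println(boosted / plain); // 2.0
    }
}
```

With a single field instance the boost scales that field's contribution by exactly 2, which matches the expectation stated above.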

import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexableField;

// MbDocument and ArtistIndexField come from my own indexing code (not shown)
public class ArtistBoostDoc {

    //Double the score of this doc if it comes up in search
    private static final float ARTIST_DOC_BOOST = 2.0f;

    private static final Set<String> artistGuIdSet = new HashSet<String>();

    static {
        artistGuIdSet.add("24f1766e-9635-4d58-a4d4-9413f9f98a4c"); //Bach
        artistGuIdSet.add("1f9df192-a621-4f54-8850-2c5373b7eac9"); //Beethoven
        artistGuIdSet.add("b972f589-fb0e-474e-b64a-803b0364fa75"); //Mozart
        artistGuIdSet.add("ad79836d-9849-44df-8789-180bbc823f3c"); //Vivaldi
        artistGuIdSet.add("27870d47-bb98-42d1-bf2b-c7e972e6befc"); //Handel
        artistGuIdSet.add("8255db36-4902-4cf6-8612-0f2b4288bc9a"); //Johann Strauss II
        artistGuIdSet.add("eefd7c1e-abcf-4ccc-ba60-0fd435c9061f"); //Richard Wagner
        artistGuIdSet.add("4e60a56a-514a-4a19-a3cc-49927c96b3cb"); //Sir Edward Elgar
        artistGuIdSet.add("c130b0fb-5dce-449d-9f40-1437f889f7fe"); //Joseph Haydn
        artistGuIdSet.add("f91e3a88-24ee-4563-8963-fab73d2765ed"); //Franz Schubert
        artistGuIdSet.add("c70d12a2-24fe-4f83-a6e6-57d84f8efb51"); //Johannes Brahms
        artistGuIdSet.add("f1bedf1f-4445-4651-9c35-f4a3f3860a13"); //Giuseppe Verdi
    }

    public static void boost(String artistGuid, MbDocument doc) {
        boost(artistGuid, doc.getLuceneDocument());
    }

    public static void boost(String artistGuid, Document doc) {
        if (artistGuIdSet.contains(artistGuid)) {
            // Set the boost on every field instance named alias
            for (IndexableField indexableField : doc.getFields()) {
                if (indexableField.name().equals(ArtistIndexField.ALIAS.getName())) {
                    Field field = (Field) indexableField;
                    field.setBoost(ARTIST_DOC_BOOST);
                }
            }
        }
    }
}
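The loop above calls setBoost on every Field instance named alias, and an artist can have several aliases. In Lucene 4.x, FieldInvertState.getBoost() is documented as the cumulative product of the boosts of all field instances sharing the same field name, so per-instance boosts multiply rather than apply once. The sketch below models only that product with plain floats; `CompoundBoostSketch` is a hypothetical name, and the alias counts are illustrative, not taken from the MusicBrainz index.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of how Lucene 4.x accumulates boosts for a multi-valued
// field: one multiplication per field instance that shares the field name.
final class CompoundBoostSketch {

    // Product of every per-instance boost (1.0 when there are no instances)
    static float cumulativeBoost(List<Float> instanceBoosts) {
        float boost = 1.0f;
        for (float b : instanceBoosts) {
            boost *= b;
        }
        return boost;
    }

    public static void main(String[] args) {
        int[] aliasCounts = {1, 3, 10, 32};
        for (int n : aliasCounts) {
            // Each of the n alias instances was given a boost of 2.0f
            float boost = cumulativeBoost(Collections.nCopies(n, 2.0f));
            System.out.println(n + " aliases -> field boost " + boost);
        }
        // 1 aliases -> 2.0, 3 aliases -> 8.0, 10 aliases -> 1024.0
    }
}
```

Under this model a document with n boosted alias values gets a factor of 2^n in its alias norm instead of 2.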

But then when I run this query:

http://search.musicbrainz.org/?type=artist&query=Jean&explain=true

You can see that the first doc (which was boosted at index time) has a
fieldNorm of 7.5161928E9 (note the E), compared to 1.0 for the next result.
Basically, whenever one of these boosted docs is matched on its alias
field it will always be the first result, and once results have been
normalized it will have a score of 100 and all other results a score of
zero.
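As a quick arithmetic check on the number explain reports: parsed as a float, 7.5161928E9 is exactly 1.75 × 2^32, so relative to the 1.0 of the next result it is a factor of roughly 2^32.8, far beyond the factor of 2 a single boost contributes. The snippet only decomposes the number; `NormMagnitude` is a hypothetical name, and it assumes nothing about how many alias values the boosted documents have.

```java
// Decomposes the fieldNorm reported by explain for the first (boosted) hit.
final class NormMagnitude {

    // Fraction part m in value = m * 2^exponent, with 1 <= m < 2
    static float mantissa(float value) {
        return value / (float) Math.pow(2, Math.getExponent(value));
    }

    public static void main(String[] args) {
        float observed = 7.5161928E9f;  // fieldNorm of the boosted doc
        float unboosted = 1.0f;         // fieldNorm of the next result
        int exponent = Math.getExponent(observed);  // 32
        float m = mantissa(observed);               // 1.75
        double log2Ratio = Math.log(observed / unboosted) / Math.log(2);
        System.out.println("2^" + exponent + " * " + m + ", log2 ratio " + log2Ratio);
    }
}
```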

Why is boosting the field by just 2.0 having such a dramatic effect?

Paul