lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Boost Document
Date Sun, 19 Nov 2006 20:34:07 GMT

: with scoring.  Each time you index a document, it get a doc id greater than
: any already in the index, and they get reassigned if you delete docs and
: optimize.... They *may* be used when scoring to break ties but that doesn't
: do you any good.......

they aren't really used to break ties -- that implies there is code
somewhere that deliberately wants the order to be deterministic and when
it sees two identicle scores, then does a sort on docid.  In reality, the
fact that docs come back in docid order is just a side effect of the fact
that the Scorers iterate over documents in order and that the sorting code
generates a "stable sort"

Looking at why the scores are teh same...

: > ----- DOC 222-----home-
: >   40960.0 = fieldNorm(field=WORD, doc=0)

: > ----- DOC 111-----home-
: > 40960.0 = fieldWeight(WORD:home in 1), product of:
: >   40960.0 = fieldNorm(field=WORD, doc=1)

: > > :             doc1.setBoost(3163);

: > > :             doc2.setBoost(3150);

...norms (which is where field and doc boosts go) get encoded as a single
byte, so they loose a lot of precision, unfortunately the methods used to
translate from float->byte->float aren't subclassable (see
Similarity.encodeNorm) if you write a bit of test code to call those
mehtods on various values and see what you'll get you'll notice that
bigger numbers are more affected then smaller numbers, so you may wnat to
just use smaller boosts (ie: divide by 100) and see if that helps.

alternately: stop using boosts to approach this problem, add these numbers
as a new field, and use the FunctionQuery class from Solr to achieve your
goal (search the list archives for more detailed Discussions of


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message