lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <>
Subject Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?
Date Tue, 12 Jan 2010 13:27:30 GMT
I'd *strongly* recommend getting a copy of Luke, opening your index
with it and playing around. The "explain" tab will show you a *lot*
about how scoring works......


On Tue, Jan 12, 2010 at 8:16 AM, Paul Taylor <> wrote:

> Benjamin Heilbrunn wrote:
>> This is because matches in short fields (few terms) als typically more
>> pregnant, than matches in long fields (much terms).
>> Imagine the case with two fields named "title" and "content"
>> representing the title and the content of books.
>> If you match three search terms in a five terms title this is a better
>> hit than if you match those three search terms in the content of the
>> book.
>> The length normalization factor is calculated by your Similarity
>> implementation in the method
>> public float lengthNorm(String fieldName, int numTokens)
>> Does that help you?
> Yes, thanks it does I was just getting it, is it base purely on matching a
> field with less terms rather than the percentage of terms in a field
> matched.
> i.e If you match three search terms in a five terms field would this be
> better then if you match those four search terms in a six term field.
> do you know the answer to my second post.
> i.e what does default lengthNorm return for a single term field, (compared
> to if have no NO NORM whereby assume value 1.0)
> Paul
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message