lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?
Date Tue, 12 Jan 2010 13:16:57 GMT
Benjamin Heilbrunn wrote:
> This is because matches in short fields (few terms) als typically more
> pregnant, than matches in long fields (much terms).
>
> Imagine the case with two fields named "title" and "content"
> representing the title and the content of books.
> If you match three search terms in a five terms title this is a better
> hit than if you match those three search terms in the content of the
> book.
>
> The length normalization factor is calculated by your Similarity
> implementation in the method
> public float lengthNorm(String fieldName, int numTokens)
>
> Does that help you?
>
>   

Yes, thanks it does I was just getting it, is it base purely on matching 
a field with less terms rather than the percentage of terms in a field 
matched.
i.e If you match three search terms in a five terms field would this be 
better then if you match those four search terms in a six term field.


do you know the answer to my second post.
 i.e what does default lengthNorm return for a single term field, 
(compared to if have no NO NORM whereby assume value 1.0)

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message