lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject Re: Generating Query for Multiple Clauses in a Single Field
Date Thu, 30 Jul 2009 10:43:12 GMT

> yah, before this i used default lucene...but i dont know
> what end up wrong...some results with only single word matching when to
> the top of the results. 

Hmm. Interesting. It seems that length normalization causing this. Very short documents with
only single word matching getting high score due to length normalization. The documents containing
all of the query terms are probably very long and getting lower score. Lucene punishes long
documents, and favors short documents.

Can you verify/confirm my guess looking at the document lengths of the result set? Also
describes the score computation for document and query.

There is an excellent publication [1] [2] (in section 4.1 and 4.2) about lucene score modification.
SweetSpotSimilarity [3] with the appropriate parameters (steepness, min, and max) can solve
your problem.

Alternatively if your requirement is very important (you don't care about long documents taking
over) then you can try to extend the DefaultSimilarity so that it will ignore the document
length. Just return 1.

public float lengthNorm(String fieldName, int numTerms) {
    return 1.0f;

> This i assumed is due to the score of the result being to
> high. Tat's why i am trying to add additional boost

I don't think there exists such a boosting mechanism.




To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message