lucene-dev mailing list archives

From "cheyenne.lin" <>
Subject Smoothing language model by Lucene
Date Thu, 02 Feb 2012 08:40:52 GMT
I have an old implementation, Lucene-lm by ilps, which is a good start.
However, that implementation doesn't include a smoothing algorithm, and I found
it particularly hard to rewrite the core scoring mechanism to enable smoothing.

(Background: in a language model, a smoothing strategy adds a small constant
weight to documents in which a query term has zero frequency. Of course it
doesn't change anything for a single keyword, but consider a multiple-keyword
query: when a document is strongly relevant to a few distinguishing keywords
but lacks the others, smoothing may be important.)
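
To make the background concrete, here is a minimal sketch of a smoothed query likelihood. It assumes Jelinek-Mercer interpolation, which is only one possible smoothing scheme and is not named in this message. The class name, method names, and the in-memory maps are hypothetical and are not part of Lucene or Lucene-lm.

```java
import java.util.Map;

// Hypothetical toy model; not Lucene or Lucene-lm code.
final class SmoothingSketch {

    // Jelinek-Mercer (assumed scheme):
    // p(t|d) = (1 - lambda) * tf / docLen + lambda * p(t|collection)
    static double smoothedProb(int tf, int docLen, double collectionProb, double lambda) {
        return (1.0 - lambda) * tf / docLen + lambda * collectionProb;
    }

    // Sum of log p(t|d) over all query terms, including terms absent from the document.
    static double logLikelihood(String[] query,
                                Map<String, Integer> docTf,
                                int docLen,
                                Map<String, Double> collectionProb,
                                double lambda) {
        double sum = 0.0;
        for (String term : query) {
            int tf = docTf.getOrDefault(term, 0);              // 0 when the term is missing
            double pc = collectionProb.getOrDefault(term, 0.0); // background probability
            sum += Math.log(smoothedProb(tf, docLen, pc, lambda));
        }
        return sum;
    }
}
```

With lambda set to 0 (no smoothing), a document that lacks even one query term gets a log-likelihood of negative infinity, however well it matches the other terms. With any lambda above 0 and a non-zero collection probability, the score stays finite.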

In the Lucene framework, for a multiple-keyword query (say, the simplest
unigram, non-positional query), the following procedure happens, as I
understand it:
1) QueryParser parses the query string into BooleanQuery.clauses (weights).
2) The corresponding scorer of the BooleanQuery merges the document scores
from each clause.
3) The problem is that each clause's TermDocs only contains the inverted index
entries for that clause's term, which makes a smoothing strategy impossible,
because a document is not scored against every query term.
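
The three steps above can be sketched with a toy in-memory index. This is not Lucene code: the class, the method, and the nested-map stand-in for postings are all hypothetical. It only illustrates that a clause-by-clause merge visits a document just for the terms it actually contains.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for clause-by-clause scoring; not the Lucene API.
final class ClauseMergeSketch {

    // postings: term -> (docId -> term frequency), a toy inverted index.
    // Returns docId -> number of query clauses that contributed a score.
    static Map<Integer, Integer> clausesScoredPerDoc(String[] queryTerms,
                                                     Map<String, Map<Integer, Integer>> postings) {
        Map<Integer, Integer> touched = new HashMap<>();
        for (String term : queryTerms) {
            Map<Integer, Integer> termPostings = postings.get(term);
            if (termPostings == null) {
                continue; // the term does not occur in the collection
            }
            // Only documents listed in this term's postings are visited.
            for (Integer docId : termPostings.keySet()) {
                touched.merge(docId, 1, Integer::sum);
            }
        }
        return touched;
    }
}
```

For a two-term query, a document that contains only one of the terms ends up with a count of 1: the clause for the missing term never sees it, so there is no point in the merge where a smoothed contribution for that term could be added.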

What can I do about that? What class should I concentrate on?
