lucene-dev mailing list archives

From "cheyenne.lin" <cheyenne....@gmail.com>
Subject Smoothing language model by Lucene
Date Thu, 02 Feb 2012 08:40:52 GMT
I have an old implementation, Lucene-lm by ilps, which is a good start.
However, that implementation doesn't include a smoothing algorithm, and I have
found it particularly hard to rewrite the core scoring mechanism to enable
smoothing.

(Background: in a language model, a smoothing strategy adds a small constant
weight for documents in which a query term has zero frequency. Of course it
doesn't change anything for a single keyword, but consider a multiple-keyword
query: when one document is strongly relevant to a few distinguishing keywords
but lacks the others, smoothing may be important.)
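
To make the background concrete, here is a minimal sketch of a smoothed query likelihood. It assumes Jelinek-Mercer interpolation, which is only one common choice (no specific method is named above), and every name in it (SmoothingSketch, smoothedProb, queryLogLikelihood, lambda) is made up for illustration and is not part of Lucene or Lucene-lm.

```java
import java.util.List;
import java.util.Map;

class SmoothingSketch {
    // Assumed smoothing: Jelinek-Mercer interpolation,
    // p(t|d) = (1 - lambda) * tf(t,d) / |d| + lambda * p(t|C)
    static double smoothedProb(int tf, int docLen, double collectionProb, double lambda) {
        double maxLikelihood = docLen == 0 ? 0.0 : (double) tf / docLen;
        return (1.0 - lambda) * maxLikelihood + lambda * collectionProb;
    }

    // Query log-likelihood: sum over ALL query terms, including those with tf = 0.
    static double queryLogLikelihood(List<String> query, Map<String, Integer> docTf,
                                     int docLen, Map<String, Double> collectionProb,
                                     double lambda) {
        double score = 0.0;
        for (String term : query) {
            int tf = docTf.getOrDefault(term, 0);
            double pc = collectionProb.getOrDefault(term, 0.0);
            score += Math.log(smoothedProb(tf, docLen, pc, lambda));
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> query = List.of("lucene", "smoothing");
        // Hypothetical document: contains "lucene" 5 times, never "smoothing".
        Map<String, Integer> docTf = Map.of("lucene", 5);
        Map<String, Double> collectionProb = Map.of("lucene", 0.001, "smoothing", 0.0005);

        // Without smoothing the missing term drives the score to -Infinity.
        System.out.println(queryLogLikelihood(query, docTf, 100, collectionProb, 0.0));
        // With smoothing the document keeps a finite score.
        System.out.println(queryLogLikelihood(query, docTf, 100, collectionProb, 0.1));
    }
}
```

The point of the sketch is that the sum runs over every query term, so the document must be visited even for terms it does not contain.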

In the Lucene framework, for a multiple-keyword query (say, the simplest
unigram, non-positional query), the following procedure happens, as I
understand it:

1) QueryParser parses the query string into BooleanQuery clauses (weights).
2) The corresponding scorer of the BooleanQuery merges the document scores
for each clause.
3) The problem is that each clause's termdocs only contains the inverted
index entries for that clause's term, which makes a smoothing strategy
impossible, because a document is not scored by every query term.
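
The three steps above can be illustrated with a toy model. This is a sketch under stated assumptions, not Lucene's actual scorer: the index is a plain in-memory map, and the names PostingsMergeSketch and termsScoringEachDoc are hypothetical. It only records which query terms get a chance to contribute to each document when scoring is driven by per-term postings.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PostingsMergeSketch {
    // Toy inverted index (an assumption for illustration): term -> (docId -> tf).
    // For each document reached through any posting list, collect the query
    // terms whose postings actually contain that document.
    static Map<Integer, Set<String>> termsScoringEachDoc(
            List<String> query, Map<String, Map<Integer, Integer>> index) {
        Map<Integer, Set<String>> seen = new TreeMap<>();
        for (String term : query) {
            Map<Integer, Integer> postings = index.getOrDefault(term, Map.of());
            for (Integer docId : postings.keySet()) {
                // A clause only ever visits documents listed in its own postings.
                seen.computeIfAbsent(docId, k -> new TreeSet<>()).add(term);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = Map.of(
                "lucene", Map.of(1, 3, 2, 5),
                "smoothing", Map.of(1, 1));
        Map<Integer, Set<String>> seen =
                termsScoringEachDoc(List.of("lucene", "smoothing"), index);
        // Document 2 is never visited by the "smoothing" clause, so that clause
        // has no opportunity to add a smoothed weight for it.
        seen.forEach((doc, terms) -> System.out.println(doc + " scored by " + terms));
    }
}
```

In this toy setup document 2 is reached only through the "lucene" postings, which is the situation described in step 3.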

What can I do about that? Which class should I concentrate on?

--
View this message in context: http://lucene.472066.n3.nabble.com/Smoothing-language-model-by-Lucene-tp3709311p3709311.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
