lucene-java-user mailing list archives

From "Chong, Herb" <>
Subject RE: Probabilistic Model in Lucene - possible?
Date Wed, 03 Dec 2003 22:00:53 GMT
i think i am missing the original question, but by most accepted definitions, the tf/idf model
in Lucene is a probabilistic model. its normalizations are unusual, though, and they don't allow
comparing rank values across queries.

it isn't terribly hard to build a normalized probabilistic model that allows comparing document
scores across queries and assigning a meaning to the score. i've done it. however, that means
abandoning idf and keeping the actual term frequencies and the document size for each document. once
you normalize this way, you can intermingle document scores from different queries and different
corpora and make statements about the absolute value of the score. it also leads directly
into the discussion we had earlier about interterm correlations and how to handle them properly,
since the full interterm probabilistic model includes the traditional tf/idf model as a special
case. introducing Boolean conditions and boosts makes the model much more complicated.
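
to make the idea concrete, here is a minimal sketch of one way such a score could look. it is not the model described above, whose details are not given in this message; it is an illustration that uses only raw term frequencies and document size, with no idf. the function names, the additive smoothing constant alpha, and the vocab_size parameter are all assumptions made up for the example. the score is the geometric mean of smoothed per-term probabilities, so it stays between 0 and 1 and does not shrink just because a query has more terms.

```python
import math
from collections import Counter


def term_probability(tf, doc_len, alpha=0.5, vocab_size=10_000):
    """Smoothed estimate of P(term | document) from raw tf and document size.

    alpha and vocab_size are illustrative assumptions (additive smoothing),
    not values taken from Lucene or from the message above.
    """
    return (tf + alpha) / (doc_len + alpha * vocab_size)


def normalized_score(query_terms, doc_tokens, alpha=0.5, vocab_size=10_000):
    """Geometric mean of per-term probabilities for the query terms.

    Uses no idf: only the term frequencies in this document and its length.
    Averaging the log probabilities over the query length keeps scores from
    queries of different sizes on the same scale.
    """
    if not query_terms:
        raise ValueError("query must contain at least one term")
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    log_sum = sum(
        math.log(term_probability(counts[t], doc_len, alpha, vocab_size))
        for t in query_terms
    )
    return math.exp(log_sum / len(query_terms))


doc = "lucene scores documents with term frequency and lucene norms".split()
print(normalized_score(["lucene"], doc))
print(normalized_score(["lucene", "frequency"], doc))
print(normalized_score(["missing"], doc))
```

because each score depends only on the document's own counts and length, two scores from different queries (or different corpora, given the same alpha and vocab_size) can at least be placed on the same scale. whether the resulting number is a useful confidence value is a separate question that this sketch does not answer.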


-----Original Message-----
From: Karsten Konrad []
Sent: Wednesday, December 03, 2003 4:51 PM
To: Lucene Users List
Subject: AW: Probabilistic Model in Lucene - possible?

I would highly appreciate it if the experts here (especially Karsten or
Chong) look at my idea and tell me if this would be possible.

Sorry, I have no idea about how to use a probabilistic approach with 
Lucene, but if anyone does so, I would like to know, too. 

I am currently puzzled by a related question: I would like to know
if there are any approaches to get a confidence value for relevance 
rather than a ranking. I.e., it would be nice to have a ranking 
weight whose value has some kind of semantics such that we could 
compare results from different queries. Can probabilistic approaches 
do anything like this? 
