lucene-java-user mailing list archives

From ambiese...@gmx.de
Subject Re: AW: Probabilistic Model in Lucene - possible?
Date Thu, 04 Dec 2003 11:45:12 GMT
Hello Karsten and Herb,

I would like to join Karsten in the club of slightly confused people
regarding what the probabilistic model and the vector space model are... :-)
So far I have considered TF/IDF and cosine similarity to be something that
belongs to the VSM. The standard similarity formula for the probabilistic
model also looks different from TF/IDF...

Clarification would be appreciated. 

Cheers,
Ralf


> 
> Hi Herb,
> 
> thank you for your insights.
> 
> >>
> but by most accepted definitions, the tf/idf model in Lucene is a 
> probabilistic model. 
> >>
> 
> Can you send some pointers to help me understand that? Are all
> TF/IDF-variants probabilistic models? If so, what makes any model a
> non-probabilistic one?
> If you claim that TF/IDF is probabilistic, then the plain cosine (an
> extreme form of TF/IDF, with IDF for all terms being considered constant)
> of VSM would also be a probabilistic model.
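
The point about plain cosine being TF/IDF with a constant IDF can be checked numerically. The following is a minimal sketch, not Lucene's scoring code; the function name `cosine` and the `idf` dictionary are assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms, idf=None):
    """Cosine similarity of two term-frequency vectors.

    idf is an optional dict term -> weight (hypothetical, for illustration);
    terms missing from it, or all terms when idf is None, get weight 1.0.
    """
    idf = idf or {}
    q = {t: tf * idf.get(t, 1.0) for t, tf in Counter(query_terms).items()}
    d = {t: tf * idf.get(t, 1.0) for t, tf in Counter(doc_terms).items()}
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * \
           math.sqrt(sum(w * w for w in d.values()))
    return dot / norm if norm else 0.0

query = ["lucene", "model"]
doc = ["lucene", "lucene", "ranking", "model"]

plain = cosine(query, doc)
# The same constant weight on every term cancels out in the normalization.
constant = cosine(query, doc, {"lucene": 2.5, "model": 2.5, "ranking": 2.5})
# Term-specific weights change the result.
varying = cosine(query, doc, {"lucene": 0.1, "model": 3.0, "ranking": 1.0})
```

Here `plain` and `constant` agree up to floating-point rounding, while `varying` differs, which is all the "extreme form" remark claims.
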
> 
> >>
> it's got strange normalizations though that don't allow comparisons of 
> rank values across queries.
> >>
> 
> Lucene's internal ranking sometimes returns values above 1.0; these are
> then normalized to 1.0, and the other rankings are adjusted accordingly.
> While I have nothing to say against this - it's a hack, but useful - it
> makes comparing the rank values across queries really difficult. It's like
> using a different scale whenever you measure something different, and
> then not telling anyone about it.
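
The effect Karsten describes can be illustrated with a small sketch. It models only the behaviour stated above (rescale a result list when its top raw score exceeds 1.0) and is not taken from Lucene's source; the function name `normalize_scores` and the sample scores are made up.

```python
def normalize_scores(raw_scores):
    """Rescale a result list so the top score is at most 1.0.

    Lists whose best raw score is already <= 1.0 are left untouched,
    so the scale factor differs from query to query.
    """
    top = max(raw_scores, default=0.0)
    scale = 1.0 / top if top > 1.0 else 1.0
    return [s * scale for s in raw_scores]

query_a = normalize_scores([4.0, 2.0, 1.0])   # rescaled by 1/4
query_b = normalize_scores([0.8, 0.4])        # not rescaled
```

A displayed 0.5 for the first query stands for a raw score of 2.0, while a displayed 0.4 for the second stands for a raw 0.4, so the two numbers are on different scales and cannot be compared directly.
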
> 
> >>
> it isn't terribly hard to make a normalized probabilistic model that 
> allows comparing of document scores across queries and assign a meaning 
> to the score. i've done it.
> >>
> 
> Stop bragging, send us your Similarity implementation :)
> 
> Regards,
> 
> Karsten
> 
> 
> -----Original Message-----
> From: Chong, Herb [mailto:HChong3@bloomberg.com] 
> Sent: Wednesday, 3 December 2003 23:01
> To: Lucene Users List
> Subject: RE: Probabilistic Model in Lucene - possible?
> 
> 
> i think i am missing the original question, but by most accepted 
> definitions, the tf/idf model in Lucene is a probabilistic model. it's 
> got strange normalizations though that don't allow comparisons of rank 
> values across queries.
> 
> it isn't terribly hard to make a normalized probabilistic model that 
> allows comparing of document scores across queries and assign a meaning 
> to the score. i've done it. however, that means abandoning idf and 
> keeping actual term frequencies for each document and document size. 
> once you normalize this way, you can intermingle document scores from 
> different queries and different corpora and make statements about the 
> absolute value of the score. it also leads directly into the discussion 
> we had earlier about interterm correlations and how to handle them 
> properly since the full interterm probabilistic model has as a special 
> case the traditional tf/idf model. interjecting Boolean conditions and 
> boost makes the model much more complicated.
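
Herb does not give his formula in this thread, so the following is only one possible reading of "abandoning idf and keeping actual term frequencies for each document and document size", not his implementation. The function name `length_normalized_score` and the choice of averaging over query terms are assumptions.

```python
from collections import Counter

def length_normalized_score(query_terms, doc_terms):
    """Mean relative frequency of the query terms in one document.

    Uses only the document's own term counts and length (no idf, no
    corpus statistics), so the result always lies in [0, 1].
    """
    if not query_terms or not doc_terms:
        return 0.0
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    return sum(tf[t] / doc_len for t in query_terms) / len(query_terms)

score = length_normalized_score(["a", "c"], ["a", "b", "a", "c"])
```

Because nothing in the score depends on the query's best hit or on the corpus, values computed for different queries or different collections sit on the same scale, which is the property asked for above.
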
> 
> Herb....
> 
> -----Original Message-----
> From: Karsten Konrad [mailto:Karsten.Konrad@xtramind.com]
> Sent: Wednesday, December 03, 2003 4:51 PM
> To: Lucene Users List
> Subject: AW: Probabilistic Model in Lucene - possible?
> 
> >>
> I would highly appreciate it if the experts here (especially Karsten or
> Chong) look at my idea and tell me if this would be possible.
> >>
> 
> Sorry, I have no idea about how to use a probabilistic approach with 
> Lucene, but if anyone does so, I would like to know, too. 
> 
> I am currently puzzled by a related question: I would like to know if 
> there are any approaches to get a confidence value for relevance 
> rather than a ranking. I.e., it would be nice to have a ranking 
> weight whose value has some kind of semantics such that we could 
> compare results from different queries. Can probabilistic approaches 
> do anything like this? 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
