lucene-java-user mailing list archives

From ambiese...@gmx.de
Subject Re: AW: AW: Real Boolean Model in Lucene?
Date Mon, 01 Dec 2003 15:32:49 GMT
Hello Karsten,

That is fine with me. An implementation cannot be matched 100% to a
theory, as the ISO OSI model has shown perfectly. :-) That's OK with me,
and I want to thank you again for the clarification I gained from this
conversation.

Cheers

> 
> Hello Ralf,
> 
> >>
> According to your description, Lucene basically maps the boolean query
> into the vector space and measures the cosine similarity to the
> documents in the vector space. If I understood you correctly, you mean
> that if a document is found by Lucene based on a boolean query, it is
> relevant (boolean true). If it is not returned, it was boolean false.
> The score sits on top of that and can be used for ranking. If I wanted
> to use a true Boolean model, I would therefore just need to ignore the
> score of the documents in Hits. Did I understand correctly?
> >>
> 
> Yes, I think that this is indeed pretty close to some theoretical
> foundation: the Boolean Model explains which documents match a query,
> while some appropriate (Lucene is good!) similarity function in vector
> space yields the ranking.
> 
> Now hell would be the place for me where I would have to prove that
> Lucene's ranking is exactly equivalent to some transformation into
> vector space followed by using the *cosine* for the ranking. It can't
> really be, as Lucene sometimes produces raw scores > 1.0, and only some
> ruthless normalisation keeps them within 0.0 to 1.0. In other words,
> there are still some rough corners in Lucene where a good theorist
> could find some work.
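
A small numeric check of the cosine argument above, as a sketch only: for
vectors with non-negative weights (which term weights usually are), the
cosine lies between 0 and 1, so a raw score above 1.0 cannot be a plain
cosine. The class and method names are hypothetical and are not part of
Lucene.

```java
// Toy cosine similarity; CosineSketch is a hypothetical name, not a Lucene class.
final class CosineSketch {

    // Cosine of the angle between two weight vectors of equal length.
    // Returns 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {1, 1, 0};      // 0/1 query vector over three terms
        double[] doc = {3, 1, 5};        // arbitrary non-negative term weights
        System.out.println(cosine(query, doc));   // some value between 0 and 1
        System.out.println(cosine(query, query)); // approximately 1
    }
}
```

By the Cauchy-Schwarz inequality the result can never exceed 1 (up to
floating-point rounding), whatever non-negative weights are plugged in.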
> 
> Could we leave this topic aside until some suicid.. err, I mean
> enthusiastic fellow tries to work out a really good theory?
> 
> Regards,
> 
> Karsten
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Ralf B [mailto:ambiesense@gmx.de]
> Sent: Monday, 1 December 2003 14:28
> To: Lucene Users List
> Subject: Re: AW: Real Boolean Model in Lucene?
> 
> 
> Hi Karsten,
> 
> I want to thank you for your well-informed answer as well as your
> answer from the 14th of November, where you agreed with me that Lucene
> is basically a VSM implementation. Sometimes it is difficult to make
> the link between the clear theory and its implementation.
> 
> According to your description, Lucene basically maps the boolean query
> into the vector space and measures the cosine similarity to the
> documents in the vector space. If I understood you correctly, you mean
> that if a document is found by Lucene based on a boolean query, it is
> relevant (boolean true). If it is not returned, it was boolean false.
> The score sits on top of that and can be used for ranking. If I wanted
> to use a true Boolean model, I would therefore just need to ignore the
> score of the documents in Hits. Did I understand correctly?
> 
> I agree that nobody really wants to do that. My question was intended
> to find out more about the theory implemented within Lucene.
> 
> Cheers,
> Ralph
> 
> 
> > 
> > Hi,
> > 
> > >>
> > My Question: Does Lucene use TF/IDF for getting this? (which would
> > mean it does not use the boolean model for the boolean query...)
> > >>
> > 
> > Lucene indeed uses TF/IDF with length normalization for fields and
> > documents.
> > 
> > However, Lucene is "downward compatible" with the Boolean Model, where
> > documents are represented as 0/1 vectors in vector space. Ranking just
> > adds weights to the elements of the result set, so the underlying
> > interpretation of a query result can still be that of a
> > propositional/Boolean model. If a document appears in the result, its
> > tokens evaluate the query (which actually is a propositional formula
> > formed over words and phrases) to true. The representation of
> > documents is more complex in Lucene than required for the Boolean
> > Model, and as a result, Lucene can efficiently handle phrases and
> > proximity searches, but these seem to be compatible extensions: if
> > you can do it in the Boolean Model, you can do it in Lucene :)
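
The propositional reading described above can be sketched in a few lines
of plain Java. This is a toy illustration under stated assumptions, not
Lucene code: the index is an in-memory map from a document id to its
token set, a query is a predicate over that token set, and every name
(BooleanModelSketch, term, search, sampleIndex) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Toy Boolean retrieval: a document either satisfies the query or it does not.
// None of these names are Lucene classes.
final class BooleanModelSketch {

    // Atomic proposition: "the document contains token t".
    static Predicate<Set<String>> term(String t) {
        return tokens -> tokens.contains(t);
    }

    // Returns the ids of all documents whose tokens make the query true,
    // in index order. No score is attached to any hit.
    static List<String> search(Map<String, Set<String>> index,
                               Predicate<Set<String>> query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            if (query.test(entry.getValue())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Three made-up documents, used only for illustration.
    static Map<String, Set<String>> sampleIndex() {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("d1", new HashSet<>(Arrays.asList("dog", "horse", "barn")));
        index.put("d2", new HashSet<>(Arrays.asList("dog", "cat")));
        index.put("d3", new HashSet<>(Arrays.asList("horse", "dog")));
        return index;
    }

    public static void main(String[] args) {
        // The formula "dog AND horse" built from two atomic propositions.
        Predicate<Set<String>> query = term("dog").and(term("horse"));
        System.out.println(search(sampleIndex(), query)); // prints [d1, d3]
    }
}
```

Every hit is simply "true" here; a ranking function could be layered on
top of the returned set without changing which documents are in it.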
> > 
> > One place where Lucene is not 100% compatible with a basic Boolean
> > Model is that full negation is a bit tricky: you cannot simply ask
> > for all documents that do not contain a certain term unless you also
> > have some term that appears in all documents. Not a big deal, really.
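
Why the "term that appears in all documents" helps can be seen on a toy
inverted index: a pure NOT has no posting list to start from, but a
marker term added to every document at indexing time provides one to
subtract from. The sketch below is plain Java under these assumptions,
not Lucene's API; the marker token "__all__" and all class and method
names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index illustrating "ALL AND NOT term"; not Lucene code.
final class NegationSketch {

    // Hypothetical marker token that is indexed for every document.
    static final String ALL = "__all__";

    // Builds term -> sorted set of document ids; a document's id is its list position.
    static Map<String, TreeSet<Integer>> build(List<List<String>> docs) {
        Map<String, TreeSet<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            postings.computeIfAbsent(ALL, k -> new TreeSet<>()).add(id);
            for (String token : docs.get(id)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Evaluates "ALL AND NOT term": start from the marker's posting list
    // and remove every document that contains the unwanted term.
    static TreeSet<Integer> allExcept(Map<String, TreeSet<Integer>> postings,
                                      String term) {
        TreeSet<Integer> result =
            new TreeSet<>(postings.getOrDefault(ALL, new TreeSet<>()));
        result.removeAll(postings.getOrDefault(term, new TreeSet<>()));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("dog", "horse"),   // id 0
            Arrays.asList("dog", "cat"),     // id 1
            Arrays.asList("horse"));         // id 2
        System.out.println(allExcept(build(docs), "cat")); // prints [0, 2]
    }
}
```

The cost of this workaround is one extra posting per document for the
marker term.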
> > 
> > If TF/IDF weighting is a problem for you, you can supply your own
> > implementation of the Similarity interface, which allows you to remove
> > all references to length normalization and document frequencies.
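
The effect of neutralising the weighting factors can be sketched without
Lucene. The Weights interface below is a hypothetical stand-in and is
NOT Lucene's Similarity class; the TF/IDF formulas in it are common
textbook variants chosen for illustration, and Lucene's actual scoring
has further factors that are not modeled here. With every factor set to
a constant, all documents matching a conjunctive query get the same
score, which is the Boolean-model behaviour asked about in this thread.

```java
// Toy pluggable weighting; FlatWeightsSketch and Weights are hypothetical names.
final class FlatWeightsSketch {

    // Stand-in for a pluggable weighting scheme (not Lucene's Similarity).
    interface Weights {
        double tf(int freq);                   // term frequency factor
        double idf(int docFreq, int numDocs);  // inverse document frequency factor
        double lengthNorm(int numTokens);      // document length normalization
    }

    // A textbook-style TF/IDF variant, assumed here for illustration only.
    static final Weights TF_IDF = new Weights() {
        public double tf(int freq) { return Math.sqrt(freq); }
        public double idf(int docFreq, int numDocs) {
            return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
        }
        public double lengthNorm(int numTokens) { return 1.0 / Math.sqrt(numTokens); }
    };

    // Every factor is constant: a matching term contributes exactly 1.
    static final Weights FLAT = new Weights() {
        public double tf(int freq) { return freq > 0 ? 1.0 : 0.0; }
        public double idf(int docFreq, int numDocs) { return 1.0; }
        public double lengthNorm(int numTokens) { return 1.0; }
    };

    // Sum of tf * idf over the query terms present in the document,
    // scaled by the document's length normalization.
    static double score(Weights w, int[] termFreqs, int[] docFreqs,
                        int numDocs, int docLength) {
        double sum = 0.0;
        for (int i = 0; i < termFreqs.length; i++) {
            if (termFreqs[i] > 0) {
                sum += w.tf(termFreqs[i]) * w.idf(docFreqs[i], numDocs);
            }
        }
        return sum * w.lengthNorm(docLength);
    }

    public static void main(String[] args) {
        int[] docFreqs = {2, 5};   // made-up document frequencies of "dog", "horse"
        int[] docA = {3, 1};       // term frequencies in a 20-token document
        int[] docB = {1, 7};       // term frequencies in a 5-token document
        System.out.println(score(TF_IDF, docA, docFreqs, 10, 20));
        System.out.println(score(TF_IDF, docB, docFreqs, 10, 5));
        System.out.println(score(FLAT, docA, docFreqs, 10, 20)); // prints 2.0
        System.out.println(score(FLAT, docB, docFreqs, 10, 5));  // prints 2.0
    }
}
```

Dividing the flat score by the number of query terms would give the
score=1 asked for in the original question.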
> > 
> > Regards,
> > 
> > Kind regards from Saarbrücken
> > 
> > --
> > 
> > Dr.-Ing. Karsten Konrad
> > Head of Artificial Intelligence Lab
> > 
> > XtraMind Technologies GmbH
> > Stuhlsatzenhausweg 3
> > D-66123 Saarbrücken
> > Phone: +49 (681) 3025113
> > Fax: +49 (681) 3025109
> > konrad@xtramind.com
> > www.xtramind.com
> > 
> > 
> > 
> > -----Original Message-----
> > From: ambiesense@gmx.de [mailto:ambiesense@gmx.de]
> > Sent: Monday, 1 December 2003 13:11
> > To: lucene-user@jakarta.apache.org
> > Subject: Real Boolean Model in Lucene?
> > 
> > 
> > Hi,
> > 
> > Is it possible to use a real Boolean model in Lucene for searching?
> > When one uses the QueryParser with a boolean query (e.g. "dog AND
> > horse"), one gets a list of documents from the Hits object. However,
> > these documents have a ranking (score).
> > 
> > My Question: Does Lucene use TF/IDF for getting this? (which would
> > mean it does not use the boolean model for the boolean query...)
> > 
> > How can one use a Boolean model search where all results have
> > score=1? Example?
> > 
> > Cheers,
> > Ralph
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > 
> 
> 


