lucene-general mailing list archives

From "Madhu Satyanarayana Panitini" <Madhu.Panit...@pass-consulting.com>
Subject RE: VSM in Lucene, again
Date Tue, 06 Sep 2005 07:30:16 GMT
Hi Fredrik,

I asked this question before, and Erik Hatcher gave me the link below:

     http://www.lucenebook.com/blog/errata/scoring_formula_omission.html

It shows a formula which was not completely implemented.

Regards
Madhu

-----Original Message-----
From: Fredrik Andersson [mailto:fidde.andersson@gmail.com] 
Sent: Monday, September 05, 2005 1:35 PM
To: general@lucene.apache.org
Subject: Re: VSM in Lucene, again

Hi Otis,

Yes, I have looked through that class thoroughly, but all I see is an
IDF-map lookup with boost functionality. The only thing that allows a
query to return a document that does not contain the query terms is the
sloppyFreq function. It's more of a semantic trick based on edit
distance, so it has nothing to do with the vector angles in a regular
vector space model. The document terms still have to be semantically
similar to the ones in the query, which is not the case when matching by
vector angles in a VSM (though you often boost documents containing
words from the query, naturally).
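
To make the distinction concrete, here is a minimal sketch of the kind of
angle-based matching meant above: a query and a document are sparse
term-weight vectors, and a document counts as "close" when the angle
between the two is below a chosen threshold. This is an illustration under
assumptions, not Lucene code. The class CosineSketch, its methods, and the
maxAngleRadians parameter are all hypothetical names, and the term weights
are assumed to have been computed elsewhere (for example as tf-idf values).

```java
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class CosineSketch {

    /** Dot product over the terms that both sparse vectors contain. */
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        return Math.sqrt(dot(v, v));
    }

    /** Cosine of the angle between query and document; 0 if either is empty. */
    static double cosine(Map<String, Double> query, Map<String, Double> doc) {
        double denom = norm(query) * norm(doc);
        return denom == 0.0 ? 0.0 : dot(query, doc) / denom;
    }

    /**
     * True when the angle between the vectors is at most maxAngleRadians
     * (assumed to lie in [0, pi], where cosine is monotonically decreasing).
     */
    static boolean isClose(Map<String, Double> query, Map<String, Double> doc,
                           double maxAngleRadians) {
        return cosine(query, doc) >= Math.cos(maxAngleRadians);
    }
}
```

The maxAngleRadians argument plays the role of the "close" angle from my
first mail. A caller would build one map per document and one for the
query, then keep the documents for which isClose returns true.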

Fredrik

On 9/5/05, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> 
> Hi Fredrik,
> 
> Are you looking for org.apache.lucene.search.DefaultSimilarity ?
> 
> Otis
> 
> --- Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> 
> > Hi folks.
> >
> > I read a transcript from last month's digest of this list, in a post
> > by Rajesh Munavalli, that Lucene uses a VSM retrieval method. In my
> > previous work with VSM, it has included matching a query vector
> > against the documents in the term-document space. I have dissected
> > and customized a lot of classes in the Lucene indexing and searching
> > classes, but I have yet to discover where the actual dot product of
> > the query vector and the document vectors is performed, if Lucene
> > uses this method for information retrieval. Using this method
> > involves a certain angle which you consider as "close", which is a
> > parameter that Lucene would benefit from exposing in its API. This I
> > have not seen any traces of, either. To keep a long story short, a
> > lot of the stuff that I usually associate with VSM and LSI
> > information retrieval is missing or cleverly hidden.
> >
> > If someone could shed some light on this issue, I would be very
> > thankful. It's probably just that we have different notions of the
> > VSM model, but I'd like to get this straightened out.
> >
> > Greetings,
> > Fredrik
> >
> 
>

