lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From adasal <>
Subject Re: Lucene + LSI
Date Tue, 13 Dec 2005 10:53:42 GMT
There seem to be quite a few alternatives around. I would be interested in
comments on the following:-
The work at NITLE <>
using Contextual
Network Search (CNS) a graph-based alternative to LSI.
This work *[PDF]* An Introduction to *Random*
found on Sahlgren's sics site.
The former is advanced and feature rich. It would be complex to integrate
into Lucene and would require more or less starting from scratch. Sahlgren
doesn't seem to have an open source implementation and if there were one it
may have to be rewritten. Nevertheless the key here is that it deals with
the problem of matrix reduction intrinsically in the model and so is less
computationally intensive, I think. It also deals with the problem of adding
and subtracting from the matrix, as I understand it and without a
computational penality.
What do others think? Are Kanerva (the professor who developed these
analytical techniques) and e.g. Sahlgren's methods available for
implementation into an open source project?

On 13/12/05, Dave Kor <> wrote:
> On 12/13/05, Ian Soboroff <> wrote:
> > Paul Libbrecht <> writes:
> >
> > > We're also thinking about implementing something similar to LSI within
> > > ActiveMath which is lucene-powered where both formulae and text
> > > searching would benefit of the latent-semantic-similarity. I've been
> > > refrained of doing "exactly this" at least since LSI is patented. This
> > > might also be a reason why there's no implementation in Lucene's
> > > sandbox.
> > >
> > > Have you looked at other vector-based approaches which are not exactly
> LSI ?
> > > Have you looked at InfoMap NLP ?
> >
> > Look for Thomas Hofmann's "probabilistic LSI", and other recent work
> > which cites it.
> You might also be interested in "Latent Dirichlet Allocation (LDA)" by
> David Blei. In short, it is a more advanced version of "probabilistic
> LSI". I am currently writing some code to dump Lucene documents into a
> file format used by Blei's LDA implementation written in C.
> Regards,
> Dave.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message