lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From adasal <adam.salt...@gmail.com>
Subject Re: Regarding Lucene and LSI
Date Sat, 08 Oct 2005 12:00:41 GMT
Very interesting.
Actually, reading Meet Lucene Part 2 by Otis Gospondnetic and Eric Hatcher
there is mention of Egothor and MG4J. Egothor. What is not mentioned is
Carrot" which, I think, originally has been used with Egothor but has also
been submitted to Lucene CVS. There is a helpful discussion with links that
can be found in lucene-user Re: Term Weights and Clustering from Feb 2005.
MG4J also looks very interesting. There are good resources about it on
sebastiano Vigna's web site, vigna.dsi.unimi.it <http://vigna.dsi.unimi.it>.
The mention the use of Bloom filters, an aticle about which has been written
by Maciej Ceglowski who is (largely) responsible for the LSI/ContextGraph
implementation at NITLE. Article Using Bloom Filters on
Perl.com<http://Perl.com>.
Bloom filters look a bit like the random index from Sahlgren I have
mentioned.
Much to do!
Adam


 On 10/7/05, Lorenzo Viscanti <lorenzo.viscanti@gmail.com> wrote:
>
> I use my own LSI implementation based on Lucene for text clustering.
> I've done some tests, but I do believe that integrating LSI onto the
> lucene
> search subsystem (i.e. creating something like LSISimilarity) is not an
> easy
> task
>
> I start analyzing the documents using Lucene, and then extract tfidf
> values
> (with lucene again), in order to build a documents/terms matrix. Then I
> use
> an implementation of LSI/SVD to analyze it.
> At this point I think that reassigning the scores back to Lucene documents
> is very difficult; but I'm trying to grab the modified scores from the
> matrix on my LSISImilarity.
> Instead clustering search results this way is not too difficult, I just
> apply the algorithm (mostly HAC-like) to the modified matrix.
> To search using LSI you must choose a small subset of the collection and
> then apply LSI/SVD to it, then extend the matrix by 'folding in' new
> documents. But how to choose the initial subset? Maybe just searching the
> index and then using the first n documents retrieved.
> Any idea?
> Lorenzo
>
> On 10/7/05, Paul Libbrecht <paul@activemath.org> wrote:
> >
> >
> > I've met other persons with such needs and we would also be interested.
> >
> > Unfortunately, this seems not to be available.
> > A clear issue might be that LSI, in its original form at least, is
> > covered by an US patent. But maybe someone finds another form which is
> > not.
> >
> > paul
> >
> >
> > Le 5 oct. 05, à 14:59, <rrshwrk@gmail.com> a écrit :
> > > I am looking for LSI implementation i lucene. Is it available. I
> > > couldnt find it in the website. I searched in the archives but no
> > > help. could some one tell me if it is available or not.
> > >
> > > Could you tell me where can i see to find if there are any Language
> > > processing tools for Indexing and retrieval stuff available
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message