lucene-dev mailing list archives

From Tarjei Lægreid <>
Subject Re: Lucene and Latent Semantic Indexing
Date Tue, 22 Nov 2005 12:26:50 GMT
Hi Andy,

I am also very interested in such approaches. I have tried a hack to
simulate the effects of LSI in a Lucene index. As you suggested, I
extracted the term frequencies from the index, constructed a
term/document matrix, and performed SVD on it. I then multiplied the
resulting values by a constant factor to simulate term frequencies in the
LSI space (that is, I created a new field "lsi" in the documents and added
the words with their corresponding frequencies). However, this is a pretty
nasty hack, and I would appreciate it if anyone knows a good way of applying
LSI to Lucene.
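
A minimal sketch of the hack described above, written in Python with NumPy
rather than against the Lucene API. The function names, the `scale`
constant, the clipping of negative weights to zero, and the idea of
repeating each term to reach its frequency are all assumptions made for
illustration; the message does not say how these details were handled.

```python
import numpy as np


def lsi_pseudo_frequencies(tf, k, scale=10.0):
    """Return integer pseudo term frequencies from a rank-k SVD approximation.

    tf    : 2-D array, rows = terms, columns = documents (raw term frequencies)
    k     : number of singular values to keep (the LSI dimensionality)
    scale : hypothetical constant that turns float weights into integer counts
    """
    tf = np.asarray(tf, dtype=float)

    # Thin SVD: tf = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(tf, full_matrices=False)

    # Keep only the k largest singular values to get the rank-k approximation.
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Assumption: negative weights cannot be stored as term counts, so they are
    # clipped to zero before scaling and rounding. This loses information.
    return np.rint(np.clip(approx, 0.0, None) * scale).astype(int)


def lsi_field_text(counts, vocab, doc):
    """Build the text of a hypothetical "lsi" field for one document by
    repeating each term as many times as its pseudo frequency."""
    return " ".join(
        " ".join([term] * int(n))
        for term, n in zip(vocab, counts[:, doc])
        if n > 0
    )


if __name__ == "__main__":
    vocab = ["car", "auto", "engine", "flower"]
    tf = [
        [2, 1, 0],  # car
        [0, 1, 0],  # auto
        [1, 1, 0],  # engine
        [0, 0, 3],  # flower
    ]
    counts = lsi_pseudo_frequencies(tf, k=2, scale=10.0)
    for d in range(counts.shape[1]):
        print(d, lsi_field_text(counts, vocab, d))
```

The string returned by `lsi_field_text` would then be indexed as the extra
field, so that the ordinary integer term frequency approximates the scaled
LSI weight. Rounding and clipping make this lossy, which is part of why it
remains a hack.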

Are there any plans to include LSI as a Lucene feature in the future?


On 11/15/05, Andy Liu <> wrote:
> I'm currently experimenting with latent semantic indexing techniques and
> Lucene. I need to extract term frequencies from a Lucene index and construct
> a document/term matrix, then subsequently perform some mathematical
> algorithms on this matrix which produces float and potentially negative term
> frequency values. Extracting the tf's from the Lucene index is easy. The
> hard part is importing the modified tf's back into the index, since in
> Lucene, tf's are stored as integer values.
> Anybody that knows the Lucene codebase well have any tips? Has anybody even
> tried performing LSI on a Lucene index?
> Andy
