lucene-dev mailing list archives

From Tarjei Lægreid <tarj...@gmail.com>
Subject Re: Lucene and Latent Semantic Indexing
Date Tue, 22 Nov 2005 12:26:50 GMT
Hi Andy,

I am also very interested in such approaches. I have tried a hack to
simulate the effects of LSI in a Lucene index. As you suggested, I
extracted the term frequencies from the index, constructed a
term/document matrix, and performed SVD on the matrix. I then multiplied the
resulting values by a constant factor to simulate term frequencies in the
LSI space (that is, I created a new field "lsi" in the documents and added
the words with their corresponding frequencies). However, this is a pretty
nasty hack, and I would appreciate it if anyone knows a good way of applying
LSI to Lucene.
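
The steps described above could be sketched roughly as follows. This is
an illustration in plain Python, not the code used for the experiment.
The toy matrix, the rank k = 2, the scale factor of 10, the helper names,
and the choice to clip negative values to zero are all assumptions made
for the example. A real setup would read the counts from the index and
use a proper SVD library instead of the simple power iteration shown here.

```python
import math

def leading_triplet(a, iters=200):
    """Approximate the largest singular triplet (sigma, u, v) of `a`
    by power iteration on a^T a. Hypothetical helper for illustration."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / (j + 1) for j in range(cols)]  # arbitrary non-uniform start
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            return 0.0, [0.0] * rows, [0.0] * cols
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * rows, [0.0] * cols
    return sigma, [x / sigma for x in u], v

def low_rank(a, k):
    """Rank-k approximation of `a`: sum of the k largest singular
    components, found one at a time by deflation."""
    rows, cols = len(a), len(a[0])
    residual = [list(row) for row in a]
    approx = [[0.0] * cols for _ in range(rows)]
    for _ in range(k):
        sigma, u, v = leading_triplet(residual)
        for i in range(rows):
            for j in range(cols):
                c = sigma * u[i] * v[j]
                approx[i][j] += c
                residual[i][j] -= c
    return approx

def to_pseudo_tf(weights, scale):
    """Scale float weights to integers. Clipping negatives to 0 is an
    assumption; the message does not say how they were handled."""
    return [[max(0, round(w * scale)) for w in row] for row in weights]

# Toy term/document matrix: rows are terms, columns are documents.
terms = ["car", "auto", "engine", "flower"]
tf = [
    [2, 0, 0],  # car
    [0, 2, 0],  # auto
    [1, 1, 0],  # engine
    [0, 0, 3],  # flower
]

approx = low_rank(tf, k=2)
pseudo_tf = to_pseudo_tf(approx, scale=10)

# Each document's "lsi" field content: a term repeated pseudo_tf times.
lsi_fields = [
    " ".join(" ".join([t] * pseudo_tf[i][d]) for i, t in enumerate(terms)
             if pseudo_tf[i][d] > 0)
    for d in range(len(tf[0]))
]
```

In this toy example the first document contains "car" but not "auto",
yet its "lsi" field ends up with a nonzero count for "auto" because both
terms co-occur with "engine". The strings in lsi_fields could then be
added to each document as the extra field.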

Are there any plans to include LSI as a Lucene feature in the future?


Regards,
Tarjei

On 11/15/05, Andy Liu <andyliu1227@gmail.com> wrote:
>
> I'm currently experimenting with latent semantic indexing techniques and
> Lucene. I need to extract term frequencies from a Lucene index and construct
> a document/term matrix, then subsequently perform some mathematical
> algorithms on this matrix which produce float and potentially negative term
> frequency values. Extracting the tf's from the Lucene index is easy. The
> hard part is importing the modified tf's back into the index, since in
> Lucene, tf's are stored as integer values.
>
> Anybody that knows the Lucene codebase well have any tips? Has anybody even
> tried performing LSI on a Lucene index?
>
> Andy
>
>
