lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soeren Pekrul <>
Subject Re: Lucene & LSA
Date Thu, 14 Dec 2006 19:16:56 GMT
mariolone wrote:
> They are successful to extract the matrix. 
> But with collections of large documents is not one too much expensive
> solution? 

I have a quite small collection with 14,960 documents and 29,828 unique 
terms. If I remember right it took a few minutes on a normal laptop 
computer to iterate the terms and documents. I stored the matrix in mySQL:

CREATE TABLE term_document_matrix (
	term VARCHAR( 32 ) NOT NULL ,
	document INT NOT NULL ,
	PRIMARY KEY (term, document)

You can see it is not a real matrix just a normal table in the 
relational model. I stored the weights greater than 0 only, so I have 
much less entries than 14,960 x 29,828 = 446,226,880 (in my case 159,407).

> it is possible to extract the matrix from the indexing file? 

I don’t know any API to extract the matrix from the index file directly.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message