lucene-java-user mailing list archives

From Yue Sun <y...@blueprint.org>
Subject out of memory while indexing one single file
Date Tue, 08 Jun 2004 15:09:43 GMT
Hi,

First, I am not sure whether this is the right place for my question, 
since I am using CLucene (the C++ port of Lucene) to build indexes. I 
hope someone here can help.

I am indexing on a Solaris machine with 1 GB of memory. I use a RAM 
writer and an FS writer, and write to the FS index once in a while. 
Right now I am testing with single input files. For files smaller than 
50 MB the program works well. For bigger files it uses up the full 1 GB 
of memory and crashes, no matter how I set parameters such as the merge 
factor and how often I write to disk.

My input files are in ASN.1 format. Each file contains nested entries, 
and each entry has a varying number of fields. I index every outermost 
entry as a Lucene document and each data field as a Lucene field. So, 
unlike most setups, the number of fields indexed by my program is quite 
large: some files have more than 1000 different field names. The maximum 
number of file descriptors is not a problem. In the files that fail, 
some Lucene documents have more than 40,000 field pairs (duplicate field 
names with different values). I think this is why so much memory is 
consumed. One of the failing cases is a 66 MB input file, which crashes 
after processing about 3800 documents.
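
The flow described above can be sketched roughly as follows. This is 
only an illustration in plain Python, not CLucene or Lucene code: 
flatten_entry, BufferedIndexer and flush_every are hypothetical names, 
nested dicts stand in for ASN.1 entries, and two Python lists stand in 
for the RAM index and the FS index.

```python
from typing import Any, Dict, List, Tuple

FieldPair = Tuple[str, str]  # (field name, field value)


def flatten_entry(entry: Dict[str, Any], prefix: str = "") -> List[FieldPair]:
    """Turn one nested entry into flat (field name, value) pairs.

    Repeated names are kept as separate pairs, so a single document
    can carry many pairs for the same field name.
    """
    pairs: List[FieldPair] = []
    for name, value in entry.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            pairs.extend(flatten_entry(value, path))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    pairs.extend(flatten_entry(item, path))
                else:
                    pairs.append((path, str(item)))
        else:
            pairs.append((path, str(value)))
    return pairs


class BufferedIndexer:
    """Buffers documents in memory and moves them to 'disk' in batches."""

    def __init__(self, flush_every: int) -> None:
        self.flush_every = flush_every                 # documents per batch
        self.ram_docs: List[List[FieldPair]] = []      # stand-in for the RAM index
        self.disk_docs: List[List[FieldPair]] = []     # stand-in for the FS index

    def add_document(self, entry: Dict[str, Any]) -> None:
        # One outermost entry becomes one document.
        self.ram_docs.append(flatten_entry(entry))
        if len(self.ram_docs) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Move everything buffered in memory to the on-disk stand-in.
        self.disk_docs.extend(self.ram_docs)
        self.ram_docs.clear()

    def buffered_pairs(self) -> int:
        # Total field pairs currently held in memory.
        return sum(len(doc) for doc in self.ram_docs)
```

In this sketch the buffer is limited by the number of documents, not by 
the number of field pairs, so what is held in memory between flushes 
grows with the number of pairs per document. Whether CLucene behaves the 
same way internally is an assumption, not something the sketch shows.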

Is there any way to make the program use less memory? Any suggestions 
would be appreciated!

Regards,
Yue Sun



