Date: Tue, 08 Jun 2004 11:09:43 -0400
From: Yue Sun
Subject: out of memory while indexing one single file
To: lucene-user@jakarta.apache.org
Message-id: <40C5D6B7.7050908@blueprint.org>

Hi,

First, I am not sure whether I should post my question here, since I am using CLucene (the C++ port of Lucene) to build my indexes, but I hope someone here can help me. I am indexing on a Solaris machine with 1 GB of memory.
I use a RAM writer and an FS writer, and write the RAM index into the FS index every once in a while. I am currently testing the indexing of single input files. For files under 50 MB the program works well, but on larger files it runs out of the 1 GB of memory and crashes, no matter how I set parameters such as the merge factor and the flush-to-disk frequency.

My input files are in ASN.1 format, each containing nested entries, and each entry a varying number of fields. I index every outermost entry as a Lucene document and each data field as a Lucene field. What makes my case different from others is that the number of fields indexed is quite large: some files have more than 1,000 different field names. There is no problem with the maximum number of file descriptors.

In the failing cases, some Lucene documents have more than 40,000 field pairs (duplicate field names with different values), which I think is the reason memory is consumed so heavily. One failing input file is 66 MB, and the program crashes after processing about 3,800 documents.

Is there any way to improve the program so that it uses less memory? Any suggestion would be appreciated!

Regards,
Yue Sun

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
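[Editor's note] The RAM-buffer-plus-periodic-flush scheme the poster describes can be sketched as below. This is a minimal, library-free illustration: `BatchingIndexer`, `flushToDisk`, and the `Document`/`Field` structs are hypothetical stand-ins, not CLucene API (in CLucene/Lucene the flush step would correspond to merging a `RAMDirectory`-backed index into the `FSDirectory`-backed one). It only shows why peak memory tracks the batch size, not the input file size, when flushing works as intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for Lucene documents: a document is the
// outermost ASN.1 entry, a field is one name/value data pair.
struct Field { std::string name, value; };
struct Document { std::vector<Field> fields; };

// Buffers documents in memory and flushes to the on-disk index every
// `flushEvery` documents, bounding peak memory by the batch size.
class BatchingIndexer {
public:
    explicit BatchingIndexer(std::size_t flushEvery) : flushEvery_(flushEvery) {}

    void addDocument(Document doc) {
        ram_.push_back(std::move(doc));
        peak_ = std::max(peak_, ram_.size());
        if (ram_.size() >= flushEvery_) flushToDisk();
    }

    void close() { flushToDisk(); }

    std::size_t docsOnDisk() const { return onDisk_; }
    std::size_t peakBuffered() const { return peak_; }

private:
    void flushToDisk() {
        // Stand-in for merging the RAM segment into the FS index
        // and then emptying the RAM buffer.
        onDisk_ += ram_.size();
        ram_.clear();
    }

    std::size_t flushEvery_;
    std::vector<Document> ram_;   // the "RAM writer" side
    std::size_t onDisk_ = 0;      // the "FS writer" side
    std::size_t peak_ = 0;        // peak number of buffered documents
};
```

One observation this sketch makes visible: flushing every N *documents* only bounds memory if documents are of comparable size. When a single document carries 40,000+ field pairs, triggering the flush on the number of buffered fields (or bytes) instead of buffered documents would bound memory even for such outsized entries.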