Date: Tue, 08 Jun 2004 11:09:43 -0400
From: Yue Sun
Subject: out of memory while indexing one single file
To: lucene-user@jakarta.apache.org
Message-id: <40C5D6B7.7050908@blueprint.org>

Hi,

First, I am not sure whether I should post my question here, since I am using CLucene (the C++ port of Lucene) to build my indexes, but I hope someone here can help me. I am indexing on a Solaris machine with 1 GB of memory.
I use a RAM writer and an FS writer, and write the RAM index into the FS index every once in a while. I am currently testing the indexing of single input files. For files under 50 MB the program works well, but on larger files it runs out of the 1 GB of memory and crashes, no matter how I set parameters such as the merge factor and the flush-to-disk frequency.

My input files are in ASN.1 format, each containing nested entries, and each entry a varying number of fields. I index every outermost entry as a Lucene document and each data field as a Lucene field. What makes my case different from others is that the number of fields indexed is quite large: some files have more than 1,000 different field names. There is no problem with the maximum number of file descriptors.

In the failing cases, some Lucene documents have more than 40,000 field pairs (duplicate field names with different values), which I think is the reason memory is consumed so heavily. One failing input file is 66 MB, and the program crashes after processing about 3,800 documents.

Is there any way to improve the program so that it uses less memory? Any suggestion would be appreciated!

Regards,
Yue Sun

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
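[Editor's note] The RAM-buffer-plus-periodic-flush scheme the poster describes can be sketched as below. This is a minimal, library-free illustration: `BatchingIndexer`, `flushToDisk`, and the `Document`/`Field` structs are hypothetical stand-ins, not CLucene API (in CLucene/Lucene the flush step would correspond to merging a `RAMDirectory`-backed index into the `FSDirectory`-backed one). It only shows why peak memory tracks the batch size, not the input file size, when flushing works as intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for Lucene documents: a document is the
// outermost ASN.1 entry, a field is one name/value data pair.
struct Field { std::string name, value; };
struct Document { std::vector<Field> fields; };

// Buffers documents in memory and flushes to the on-disk index every
// `flushEvery` documents, bounding peak memory by the batch size.
class BatchingIndexer {
public:
    explicit BatchingIndexer(std::size_t flushEvery) : flushEvery_(flushEvery) {}

    void addDocument(Document doc) {
        ram_.push_back(std::move(doc));
        peak_ = std::max(peak_, ram_.size());
        if (ram_.size() >= flushEvery_) flushToDisk();
    }

    void close() { flushToDisk(); }

    std::size_t docsOnDisk() const { return onDisk_; }
    std::size_t peakBuffered() const { return peak_; }

private:
    void flushToDisk() {
        // Stand-in for merging the RAM segment into the FS index
        // and then emptying the RAM buffer.
        onDisk_ += ram_.size();
        ram_.clear();
    }

    std::size_t flushEvery_;
    std::vector<Document> ram_;   // the "RAM writer" side
    std::size_t onDisk_ = 0;      // the "FS writer" side
    std::size_t peak_ = 0;        // peak number of buffered documents
};
```

One observation this sketch makes visible: flushing every N *documents* only bounds memory if documents are of comparable size. When a single document carries 40,000+ field pairs, triggering the flush on the number of buffered fields (or bytes) instead of buffered documents would bound memory even for such outsized entries.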