From: Suba Suresh
Date: Thu, 13 Jul 2006 10:30:14 -0400
To: java-user@lucene.apache.org
Subject: Re: Out of memory error
Message-ID: <44B658F6.3040809@wolfram.com>
In-Reply-To: <20060713142244.42B9D187BF@mail.seseit.com>

Thanks. I am using the getText(PDDocument) method of the PDFTextStripper. I will try the other suggestion.

suba suresh.

Rob Staveley (Tom) wrote:
> If you are using
> http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
> you are going to get a large String and may need a 1G heap.
>
> If, however, you are using
> http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
> to go via a temporary file, you will not need so much RAM, but you need to use
> http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
> to construct your Lucene field (rather than
> http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
>
> -----Original Message-----
> From: Suba Suresh [mailto:subas@wolfram.com]
> Sent: 13 July 2006 14:55
> To: java-user@lucene.apache.org
> Subject: Out of memory error
>
> I am indexing different document formats with Lucene 1.9. One of the PDF
> files I am indexing is 300 MB. Whenever the index writer hits that file, it
> stops indexing with an "Out of Memory" exception. I am using the PDFBox
> library to index. I have set the following merge factors in my code.
>
> writer.setMergeFactor(1000);
> writer.setMaxMergeDocs(9999999);
> writer.setMaxBufferedDocs(1000);
> writer.setMaxFieldLength(Integer.MAX_VALUE);
>
> I would like any help and suggestions.
>
> thanks,
> suba suresh.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
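
For anyone finding this thread later, here is a rough sketch of the "go via a temporary file" idea from Rob's reply. It is not taken from the thread and uses only the JDK so that it runs on its own. The TextSource interface, the class name and the "contents" field name in the comments are made up for illustration. The comments mark where PDFTextStripper.writeText(PDDocument, Writer) and Field(String, Reader) would plug in, assuming the signatures shown in the javadoc links above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpoolToTempFile {

    /** Hypothetical callback: anything that can stream its text into a Writer. */
    public interface TextSource {
        void writeTo(Writer out) throws IOException;
    }

    /** Streams the text to a temporary file so it never exists as one large String. */
    public static Path spool(TextSource source) throws IOException {
        Path tmp = Files.createTempFile("extracted-", ".txt");
        try (Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            // With PDFBox, this is where stripper.writeText(pdDocument, out) would go.
            source.writeTo(out);
        }
        return tmp;
    }

    /** Reads the file back through a Reader with a small fixed-size buffer. */
    public static long countChars(Path file) throws IOException {
        long total = 0;
        char[] buf = new char[8192];
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            // With Lucene, this Reader would instead be passed to
            // new Field("contents", in) and consumed by writer.addDocument(doc).
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large extraction: 100,000 short lines written incrementally.
        Path tmp = spool(out -> {
            for (int i = 0; i < 100_000; i++) {
                out.write("page text\n");
            }
        });
        try {
            System.out.println(countChars(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the real indexing code the Reader has to stay open until addDocument has returned, and the temporary file should be deleted only after that. A field built from a Reader is indexed but not stored, so the extracted text cannot be read back out of the index afterwards.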