lucene-java-user mailing list archives

From Suba Suresh <su...@wolfram.com>
Subject Re: Out of memory error
Date Thu, 13 Jul 2006 15:38:08 GMT
Definitely. Thanks for both the suggestions. Yes, it is 300MB (that was a typo).

suba suresh.

Rob Staveley (Tom) wrote:
> Let us know how you get on. There are a lot of people fighting very similar
> battles on this list. 
> 
> -----Original Message-----
> From: Suba Suresh [mailto:subas@wolfram.com] 
> Sent: 13 July 2006 15:30
> To: java-user@lucene.apache.org
> Subject: Re: Out of memory error
> 
> Thanks.
> 
> I am using the getText(PDDocument) method of the PDFTextStripper. I will try
> the other suggestion.
> 
> suba suresh.
> 
> Rob Staveley (Tom) wrote:
> 
>>If you are using
>>http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
>>you are going to get a large String and may need a 1G heap.
>>
>>If, however, you are using
>>http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
>>to go via a temporary file, you will not need so much RAM, but you need to use
>>http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
>>to construct your Lucene field (rather than
>>http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
>>
>>-----Original Message-----
>>From: Suba Suresh [mailto:subas@wolfram.com]
>>Sent: 13 July 2006 14:55
>>To: java-user@lucene.apache.org
>>Subject: Out of memory error
>>
>>I am indexing different document formats with lucene 1.9. One of the
>>pdf files I am indexing is 300MG. Whenever the index writer hits that
>>file it stops the indexing with an "Out of Memory" exception. I am using
>>the pdf box library to index. I have set the following merge factors in my
>>code.
> 
>>writer.setMergeFactor(1000);
>>writer.setMaxMergeDocs(9999999);
>>writer.setMaxBufferedDocs(1000);
>>writer.setMaxFieldLength(Integer.MAX_VALUE);
>>
>>I would like any help and suggestions.
>>
>>thanks,
>>suba suresh.
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-user-help@lucene.apache.org
> 


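The temporary-file route suggested in the thread can be sketched with the JDK alone. This is only an illustration of the pattern, not code from any poster: TextSource, spoolToTempFile, countTokens and indexViaTempFile are hypothetical names, TextSource stands in for the role of PDFTextStripper.writeText(PDDocument, Writer), and countTokens stands in for whatever consumes the Reader handed to Field(String, Reader). The point is that the extracted text goes to disk and is read back through a small fixed-size buffer, so it never has to exist as one large String on the heap.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingExtractSketch {

    /** Hypothetical stand-in for an extractor that writes its text to a Writer
     *  (the role played by PDFTextStripper.writeText in the thread). */
    interface TextSource {
        void writeTo(Writer out) throws IOException;
    }

    /** Spools the extracted text to a temporary file instead of building a String. */
    static Path spoolToTempFile(TextSource source) {
        try {
            Path tmp = Files.createTempFile("extracted-", ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                source.writeTo(out);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in consumer: counts whitespace-separated tokens using an 8 KB buffer.
     *  In the thread's setup, the Reader would instead go to Field(String, Reader). */
    static long countTokens(Reader in) {
        try {
            char[] buf = new char[8192];
            long tokens = 0;
            boolean inToken = false;
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    boolean ws = Character.isWhitespace(buf[i]);
                    if (!ws && !inToken) {
                        tokens++; // first character of a new token
                    }
                    inToken = !ws;
                }
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Extract to disk, stream back through a Reader, then delete the temp file. */
    static long indexViaTempFile(TextSource source) {
        Path tmp = spoolToTempFile(source);
        try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
            return countTokens(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary file
            }
        }
    }

    public static void main(String[] args) {
        // Simulated extraction: 1000 short "pages" written one at a time.
        long tokens = indexViaTempFile(out -> {
            for (int page = 0; page < 1000; page++) {
                out.write("page " + page + " text\n");
            }
        });
        System.out.println(tokens);
    }
}
```

With a real extractor and indexer, the Reader would have to stay open until the document has been added to the index, since the field's contents are read at that point rather than when the field is constructed.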

