lucene-java-user mailing list archives

From Suba Suresh <su...@wolfram.com>
Subject Re: Out of memory error
Date Wed, 26 Jul 2006 17:14:54 GMT
Sorry for my late response. It took us some time to run it again. We
increased the heap to 1G as you suggested, and it works: the indexer
no longer crashes. (We are running into a different problem with a
PowerPoint file, but that is for another email.)

The code change using
PDFTextStripper.writeText(org.pdfbox.pdmodel.PDDocument, java.io.Writer)
did not work for us.


Thanks for all the help.

suba suresh.
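For reference, the temporary-file approach suggested in the quoted reply below can be sketched with the JDK alone: stream the extracted text into a Writer backed by a temp file, then hand a Reader over that file to the indexer instead of building one large String. This is a minimal sketch under stated assumptions, not a tested fix for the problem above. `SpoolToTempFile`, `writeText(Iterable<String>, Writer)`, `spool` and `countTokens` are hypothetical stand-ins, not PDFBox or Lucene APIs. In real indexing code, the extraction step would be `PDFTextStripper.writeText(PDDocument, Writer)`, and the returned Reader would be passed to `new Field(String, Reader)`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * JDK-only sketch of "go via a temporary file".
 * ASSUMPTION: writeText is a hypothetical stand-in for PDFBox's
 * PDFTextStripper.writeText(PDDocument, Writer), and countTokens stands in
 * for Lucene consuming the Reader given to Field(String, Reader).
 */
final class SpoolToTempFile {

    // Hypothetical extractor: writes each page's text straight to the Writer
    // instead of concatenating everything into one large String.
    static void writeText(Iterable<String> pages, Writer out) throws IOException {
        for (String page : pages) {
            out.write(page);
            out.write('\n');
        }
    }

    // Spools the extracted text to a temp file and returns a Reader over it.
    // In indexing code this Reader would go to new Field(name, reader).
    static Reader spool(Iterable<String> pages) {
        try {
            Path tmp = Files.createTempFile("extracted-text", ".txt");
            tmp.toFile().deleteOnExit(); // best-effort cleanup of the spool file
            try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                writeText(pages, w);
            }
            return Files.newBufferedReader(tmp, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the analyzer: counts whitespace-separated tokens while
    // reading one character at a time, then closes the Reader.
    static long countTokens(Reader in) {
        long tokens = 0;
        boolean inToken = false;
        try (Reader r = in) {
            int c;
            while ((c = r.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inToken = false;
                } else if (!inToken) {
                    inToken = true;
                    tokens++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

For example, `countTokens(spool(List.of("page one text", "page two")))` returns 5. The point of the design is that only a small buffer of text is held in memory at a time. Note that with `setMaxFieldLength(Integer.MAX_VALUE)` every token of the field is still indexed, so this avoids the single huge String but does not by itself bound the indexer's own memory use.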

Rob Staveley (Tom) wrote:
> Let us know how you get on. There are a lot of people fighting very similar
> battles on this list. 
> 
> -----Original Message-----
> From: Suba Suresh [mailto:subas@wolfram.com] 
> Sent: 13 July 2006 15:30
> To: java-user@lucene.apache.org
> Subject: Re: Out of memory error
> 
> Thanks.
> 
> I am using the getText(PDDocument) method of the PDFTextStripper. I will try
> the other suggestion.
> 
> suba suresh.
> 
> Rob Staveley (Tom) wrote:
> 
>>If you are using
>>http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
>>you are going to get a large String and may need a 1G heap.
>>
>>If, however, you are using
>>http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
>>to go via a temporary file, you will not need so much RAM, but you need to use
>>http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
>>to construct your Lucene field (rather than
>>http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
>>
>>-----Original Message-----
>>From: Suba Suresh [mailto:subas@wolfram.com]
>>Sent: 13 July 2006 14:55
>>To: java-user@lucene.apache.org
>>Subject: Out of memory error
>>
>>I am indexing different document formats with Lucene 1.9. One of the
>>PDF files I am indexing is 300 MB. Whenever the index writer hits that
>>file, indexing stops with an "Out of Memory" exception. I am using
>>the PDFBox library to extract the text. I have set the following
>>IndexWriter settings in my code:
>>
>>writer.setMergeFactor(1000);
>>writer.setMaxMergeDocs(9999999);
>>writer.setMaxBufferedDocs(1000);
>>writer.setMaxFieldLength(Integer.MAX_VALUE);
>>
>>I would appreciate any help or suggestions.
>>
>>thanks,
>>suba suresh.
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-user-help@lucene.apache.org



