pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Getting Out of Memory Error when trying to parse and extract text of 8 MB PDF Document
Date Sat, 02 Feb 2013 06:28:03 GMT
Hi,

Did you try the non-sequential parser, PDDocument.loadNonSeq()?

Maruan Sahyoun

On 02.02.2013 at 07:09, VIGNESH S <vigneshklncit@gmail.com> wrote:

> Hi Andreas,
> 
> Do you have any suggestions?
> 
> On Thu, Jan 31, 2013 at 6:52 PM, Andreas Lehmkühler <andreas@lehmi.de> wrote:
>> Hi,
>> 
>> On 28.01.13 at 15:45, VIGNESH S wrote:
>> 
>>> Hi,
>>> 
>>> I tried extracting text from an 8 MB PDF document. It takes more than
>>> 64 MB of heap and fails with an OutOfMemoryError when tested on Android phones.
>>> 
>>> As I understand it, PDFBox loads all objects into the object pool
>>> up front, so heap usage grows with the number of objects in the
>>> PDF document. This looks like the DOM way of doing things.
>>> 
>>> Is there an alternative, memory-efficient, SAX-like way of extracting text in PDFBox?
>> 
>> Try the new nonsequential parser using loadNonSeq() instead of load().
>> 
>> BR
>> Andreas Lehmkühler
> 
> 
> 
> -- 
> Thanks and Regards
> Vignesh Srinivasan
> 9739135640
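For reference, here is a minimal sketch of the approach suggested in this thread. It assumes PDFBox 1.8.x (the version line current when this thread was written) is on the classpath; the class name NonSeqTextExtract and the helper extractText are hypothetical. It loads the file with PDDocument.loadNonSeq(File, RandomAccess) instead of load(), and it streams the text to a Writer with PDFTextStripper.writeText so the whole document's text is not also accumulated in one String. This is not a guaranteed fix for a 64 MB heap limit, because memory use still depends on the document's content. Note also that PDFBox 2.x removed loadNonSeq: there, load() uses the non-sequential parser, and PDFTextStripper lives in org.apache.pdfbox.text.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.apache.pdfbox.io.RandomAccess;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

/**
 * Sketch only, assuming the PDFBox 1.8.x API: parse with the non-sequential
 * parser and stream extracted text to a Writer.
 * The class and method names are hypothetical.
 */
public class NonSeqTextExtract {

    public static void extractText(File pdf, Writer out) throws IOException {
        // A null scratch file keeps the parser's buffers in memory. A
        // RandomAccess scratch file could be passed here instead.
        // The cast avoids any overload ambiguity for the null argument.
        PDDocument document = PDDocument.loadNonSeq(pdf, (RandomAccess) null);
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            // writeText writes each page's text to the Writer as it goes
            // instead of returning one large String like getText().
            stripper.writeText(document, out);
        } finally {
            document.close();
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NonSeqTextExtract <file.pdf>");
            return;
        }
        Writer out = new OutputStreamWriter(System.out, "UTF-8");
        extractText(new File(args[0]), out);
    }
}
```

Passing a FileWriter instead of System.out would keep the extracted text out of the heap entirely. Whether the parse itself then fits on a given device would need to be measured.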
