pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Getting Out of Memory Error when trying to parse and extract text of 8 MB PDF Document
Date Thu, 31 Jan 2013 13:22:56 GMT

On 28.01.13 15:45, VIGNESH S wrote:
> Hi,
> I tried extracting text from an 8 MB PDF document. It takes more than
> 64 MB of heap and ran out of memory when tested on Android phones.
> As I understand it, PDFBox loads all objects into the object pool
> up front, so the heap grows with the number of objects in the
> PDF document, which looks like a DOM-style way of doing things.
> Is there a memory-efficient, SAX-style alternative for extracting text in PDFBox?
Try the new non-sequential parser: call PDDocument.loadNonSeq() instead of PDDocument.load().
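
For illustration, here is a minimal sketch of that switch. It assumes the PDFBox 1.8.x API (PDDocument.loadNonSeq(File, RandomAccess) and org.apache.pdfbox.util.PDFTextStripper) on the classpath. The class name and the two command-line arguments are made up for the example. Text is written straight to a Writer rather than collected into one large String, which is a separate choice from the parser switch. Whether this is enough to fit a 64 MB heap on Android has not been measured here.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Hypothetical example class; assumes PDFBox 1.8.x.
public class NonSeqTextExtract {
    public static void main(String[] args) throws IOException {
        File pdf = new File(args[0]);  // input PDF, path supplied by the caller
        File out = new File(args[1]);  // plain-text output file

        // Non-sequential parser instead of PDDocument.load(pdf).
        // A null scratch file means temporary data is kept in main memory.
        PDDocument doc = PDDocument.loadNonSeq(pdf, null);
        try {
            Writer writer = new BufferedWriter(new FileWriter(out));
            try {
                PDFTextStripper stripper = new PDFTextStripper();
                // Write the extracted text to the writer instead of
                // building the whole result as a single String.
                stripper.writeText(doc, writer);
            } finally {
                writer.close();
            }
        } finally {
            doc.close();  // release the resources held by the document
        }
    }
}
```

The second argument of loadNonSeq() also accepts a RandomAccess scratch file in place of null, in case temporary data should go to disk rather than the heap.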

Andreas Lehmkühler
