pdfbox-users mailing list archives

From: Tilman Hausherr <THaush...@t-online.de>
Subject: Re: PDF pages to PNG files getting out of heap space
Date: Sun, 13 Sep 2015 16:11:46 GMT
On 13.09.2015 at 16:04, Troy Smith wrote:
> Hi,
>
> The code I sent is a quick example from a larger code base. In the larger
> code base we do a lot of image manipulation. For example, the image is
> loaded into memory and the pixels are analyzed. This code handles 70 pages
> without increasing the heap size.
>
> So, I can increase the heap size, but I was wondering if there is something
> not working well with PDFBox, given that we hit the heap space limit on the
> first page.

It is by design: the huge image has to be loaded into memory. See the
screenshot below. Page 1 is harmless, but page 2 contains a 12301 x 8296
RGB image, so at 3 bytes per pixel it takes at least 306,147,288 bytes, and
likely more. That amount is then needed several times over, due to internal
copying by PDFBox and by Java itself.
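The byte count above is width x height x 3, one byte each for red, green and blue. A minimal Java sketch of that arithmetic follows. The 4-bytes-per-pixel line assumes the pixels end up in an int-packed raster such as `BufferedImage.TYPE_INT_RGB`, and `copies` is a hypothetical illustration of the repeated copying, not a measured PDFBox figure.

```java
// Rough heap estimate for a fully decoded image (sketch; assumes uncompressed pixels in memory).
public class ImageMemoryEstimate {

    // Bytes for one width x height raster; long arithmetic avoids int overflow on large images.
    static long rasterBytes(long width, long height, int bytesPerPixel) {
        return width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        long width = 12301, height = 8296;          // size of the image on page 2
        long rgb24 = rasterBytes(width, height, 3);  // 3 bytes per pixel: R, G, B
        long rgb32 = rasterBytes(width, height, 4);  // assumed int-packed raster, 4 bytes per pixel
        int copies = 3;                              // hypothetical number of simultaneous copies
        double gib = 1024.0 * 1024 * 1024;

        System.out.println("3 bytes/pixel: " + rgb24 + " bytes");
        System.out.println("4 bytes/pixel: " + rgb32 + " bytes");
        System.out.printf("%d copies at 4 bytes/pixel: %.2f GiB%n", copies, copies * rgb32 / gib);
        // Roughly the limit set with -Xmx when the JVM was started.
        System.out.printf("max heap for this JVM: %.2f GiB%n", Runtime.getRuntime().maxMemory() / gib);
    }
}
```

The first two lines print 306147288 and 408196384 bytes. Comparing a few such copies against the last line shows why a default heap fails on page 2 while a 2 GB or larger heap can succeed.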

[Screenshot not preserved in the archive]

I also got it to work with 2.2 GB. Then I tried PDFToImage at a resolution
of 300, and that worked with 2.2 GB too.
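Rendering at 300 dpi adds a second large allocation: the output image. PDF user space measures 1/72 inch, and PDFBox's `renderImageWithDPI` converts the DPI into a scale of dpi / 72, so each page side grows by 300/72. The sketch below estimates that raster and searches for the highest DPI that fits a memory budget. It assumes 4 bytes per pixel for an RGB render and a hypothetical US Letter page (612 x 792 points), since the thread does not give page 2's dimensions; `dpiWithinBudget` and the 16 MiB budget are illustrative helpers, not PDFBox API.

```java
// Sketch: estimate the output raster for one page rendered at a given DPI, and find the
// highest DPI that keeps that raster under a memory budget. Not part of the PDFBox API.
public class RenderBudget {

    static final double POINTS_PER_INCH = 72.0; // PDF user space unit = 1/72 inch

    // Pixels along one side of length `points` at `dpi` (rounded; PDFBox may differ by one pixel).
    static long pixels(double points, double dpi) {
        return Math.round(points / POINTS_PER_INCH * dpi);
    }

    // Bytes for the rendered page, assuming 4 bytes per pixel (int-packed RGB).
    static long rasterBytes(double widthPt, double heightPt, double dpi) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }

    // Highest whole DPI <= requestedDpi whose raster fits in budgetBytes; falls back to 1.
    static int dpiWithinBudget(double widthPt, double heightPt, int requestedDpi, long budgetBytes) {
        for (int dpi = requestedDpi; dpi > 1; dpi--) {
            if (rasterBytes(widthPt, heightPt, dpi) <= budgetBytes) {
                return dpi;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        double widthPt = 612, heightPt = 792;  // hypothetical US Letter page
        long budget = 16L * 1024 * 1024;       // hypothetical 16 MiB budget for the output image

        System.out.println("300 dpi raster: " + rasterBytes(widthPt, heightPt, 300) + " bytes");
        System.out.println("highest dpi within budget: "
                + dpiWithinBudget(widthPt, heightPt, 300, budget));
    }
}
```

For the Letter-size example, 300 dpi needs 33,660,000 bytes and the 16 MiB budget allows 211 dpi. In real code the page size would come from the page's media box, and the chosen value would be passed as the dpi argument of `renderImageWithDPI`. This only bounds the output image; the decoded 12301 x 8296 source image still has to fit in the heap.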

Tilman

>
> Thanks.
>
> Troy
> On Sep 13, 2015 1:29 AM, "Tilman Hausherr" <THausherr@t-online.de> wrote:
>
>> On 13.09.2015 at 04:54, Troy Smith wrote:
>>
>>> Hi,
>>> I'm trying to create an image for each page of a PDF. I'm getting an
>>> out-of-heap-space error, usually when creating the image of the second
>>> page. The code I'm using is below. I may be doing something braindead,
>>> but I'm not seeing it. Any advice or thoughts?
>>>
>> I'm able to display it with the -Xmx4g option in PDFDebugger. Your code
>> may require even more, because you're also creating a 300 dpi image.
>>
>> Page 2 is huge. It contains an image that is almost 5 MB compressed.
>>
>> Tilman
>>
>>
>>> A link to a PDF that produces this problem:
>>>
>>> https://www.dropbox.com/s/osw235wyvqp0kxi/test.pdf?dl=0
>>>
>>> I'm using pdfbox-app-2.0.0-20150911.224202-1643.jar.
>>>
>>> Best regards,
>>> Troy
>>>
>>> import java.awt.image.BufferedImage;
>>> import java.io.File;
>>>
>>> import javax.imageio.ImageIO;
>>>
>>> import org.apache.pdfbox.pdmodel.PDDocument;
>>> import org.apache.pdfbox.rendering.ImageType;
>>> import org.apache.pdfbox.rendering.PDFRenderer;
>>>
>>> public class ImageTest {
>>>     public static void main(String[] args) throws Exception {
>>>         String pdfFile = args[0];
>>>         // Output prefix: the input path minus its 4-character ".pdf" extension.
>>>         String imageDocPrefix = pdfFile.substring(0, pdfFile.length() - 4);
>>>         File pdfF = new File(pdfFile);
>>>         // try-with-resources closes the document even if rendering fails.
>>>         try (PDDocument doc = PDDocument.load(pdfF)) {
>>>             PDFRenderer pdfRenderer = new PDFRenderer(doc);
>>>             for (int page = 0; page < doc.getNumberOfPages(); ++page) {
>>>                 System.out.println("Testing..." + page);
>>>                 // Render the page at 300 dpi into an RGB BufferedImage held in memory.
>>>                 BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
>>>                 String name = imageDocPrefix + "_" + String.format("%04d", page) + ".png";
>>>                 ImageIO.write(bim, "PNG", new File(name));
>>>             }
>>>         } catch (Exception e) {
>>>             // OutOfMemoryError is an Error, not an Exception, so it is not caught here.
>>>             System.err.println("error writing images from pdf:" + pdfFile + ":" + e);
>>>         }
>>>     }
>>> }
>>>

