pdfbox-users mailing list archives

From Alexander Klenner <alexander.garvin.klen...@scai.fraunhofer.de>
Subject errors with PDPage.convertToImage()
Date Mon, 08 Apr 2013 06:52:06 GMT
Hi all,

I frequently come across PDFs for which convertToImage() generates blank or partially
blank images. One of those PDFs is attached to this mail.

My processing code:

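import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.*;                    // PDDocument, PDPage
import org.apache.pdfbox.pdmodel.common.PDRectangle;

PDFParser parser = new PDFParser(new FileInputStream(f));
parser.parse();                  // must run before getDocument(), which otherwise throws
COSDocument cosDoc = parser.getDocument();
PDDocument pdDoc = new PDDocument(cosDoc);
Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
PDPage page = it.next();         // first page only
PDRectangle cropBox = page.findCropBox();
Dimension dimension = cropBox.createDimension();
// ImageParser.PARAM_DPI: application-defined resolution in DPI (not a PDFBox constant)
BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB, ImageParser.PARAM_DPI);
pdDoc.close();                   // release the parsed document once rendering is done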

I am using pdfbox-app-1.8.0.jar.
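A side note on the `dimension` computed from the crop box: as far as I can tell from the 1.8 sources (treat this as an assumption, not documented behaviour), convertToImage() scales the crop box from PDF user space, where one unit is 1/72 inch, by resolution / 72 and rounds to whole pixels. The hypothetical helper `expectedPixels` below reproduces only that arithmetic in plain Java, ignoring page rotation, so that the size of a rendered image can at least be sanity-checked against the page:

```java
public class ExpectedSize {

    // PDF default user space: 1 unit = 1/72 inch
    static final float POINTS_PER_INCH = 72f;

    /**
     * Hypothetical helper (not a PDFBox API): pixel size that a crop box of
     * widthPt x heightPt points should render to at the given DPI.
     * Assumes simple scale-and-round and ignores page rotation.
     */
    static int[] expectedPixels(float widthPt, float heightPt, int dpi) {
        float scale = dpi / POINTS_PER_INCH;
        return new int[] { Math.round(widthPt * scale), Math.round(heightPt * scale) };
    }

    public static void main(String[] args) {
        // An A4 crop box is 595 x 842 points
        int[] px = expectedPixels(595f, 842f, 150);
        System.out.println(px[0] + " x " + px[1]); // 1240 x 1754
    }
}
```

Comparing `img.getWidth()` and `img.getHeight()` with these values would catch a wrong page size, but a correctly sized yet blank image passes this check, so it only covers part of question 2.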

So I have two questions: 

1. Is there another way to render the page as an image, one I am not aware of, that produces
the correct result?
2. If not, is it possible to detect, either before or after the conversion, that a page was
not rendered correctly?

At the moment I have no way of knowing when I am dealing with a corrupted image.
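For question 2, one crude post-hoc check is to measure how much of the rendered image is near-white. The sketch below is a hypothetical heuristic in plain Java, not a PDFBox API: it assumes a failed render comes out white (invert the test if yours come out black), and the threshold values are arbitrary placeholders to tune against known-good pages. Legitimately sparse pages, or pages with a large empty bottom margin, will also be flagged, so a hit is a reason for manual review rather than proof of corruption:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class BlankCheck {

    /** True if all three colour channels of a pixel are at or above threshold (0-255). */
    static boolean isNearWhite(int rgb, int threshold) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r >= threshold && g >= threshold && b >= threshold;
    }

    /** Fraction of all pixels that are near-white; values close to 1.0 suggest a blank page. */
    static double nearWhiteFraction(BufferedImage img, int threshold) {
        long white = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isNearWhite(img.getRGB(x, y), threshold)) {
                    white++;
                }
            }
        }
        return (double) white / ((long) img.getWidth() * img.getHeight());
    }

    /**
     * Number of consecutive rows, counted upwards from the bottom edge, that are
     * entirely near-white. A large value may indicate a render that stopped part-way.
     */
    static int blankRowsFromBottom(BufferedImage img, int threshold) {
        int rows = 0;
        for (int y = img.getHeight() - 1; y >= 0; y--) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (!isNearWhite(img.getRGB(x, y), threshold)) {
                    return rows;
                }
            }
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Synthetic 100x100 "page": white background with one small black mark near the top
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 100, 100);
        g.setColor(Color.BLACK);
        g.fillRect(10, 10, 20, 5);  // covers columns 10-29, rows 10-14
        g.dispose();

        System.out.println(nearWhiteFraction(img, 250));   // 0.99
        System.out.println(blankRowsFromBottom(img, 250)); // 85
    }
}
```

Applied to the `img` returned by convertToImage(), a check such as `nearWhiteFraction(img, 250) > 0.999` would flag an almost entirely white page; both cut-offs here are placeholders, not tested values.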

Thanks a lot for any hints,


Dr. Alexander G. Klenner
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven, D-53754 Sankt Augustin
Tel.: +49 - 2241 - 14 - 2736
E-mail: alexander.garvin.klenner@scai.fraunhofer.de
Internet: http://www.scai.fraunhofer.de
