pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: errors with PDPage.convertToImage()
Date Mon, 08 Apr 2013 08:22:02 GMT
Hi Alexander,

you can ignore the info messages if the result you get is in line with your expectations. The
messages mean that, although PDFBox supports a fair amount of the PDF specification, not all
operators defined in it are currently supported. PDFBox handles that situation and continues
processing the rest of the PDF. As long as that doesn't affect the results you expect, you're fine.
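
If the messages are only noise, one option is to raise the log level for the class that emits them. The sketch below is not from the original reply. It assumes that logging ends up in java.util.logging, which the format of the quoted output suggests; with a different logging backend on the classpath it would have no effect. The class name PdfBoxLogQuietener is hypothetical, and the logger name is copied from the quoted log lines.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PdfBoxLogQuietener {
    // Logger name as it appears in the quoted log output.
    static final String STREAM_ENGINE_LOGGER = "org.apache.pdfbox.util.PDFStreamEngine";

    // java.util.logging only keeps weak references to loggers, so hold a
    // strong one here or the level setting may be lost after a GC.
    private static final Logger ENGINE_LOGGER = Logger.getLogger(STREAM_ENGINE_LOGGER);

    // Assumption: the INFO lines are routed through java.util.logging.
    // After this call, INFO records from this logger are dropped while
    // WARNING and SEVERE records are still reported.
    static Logger silenceUnsupportedOperatorInfo() {
        ENGINE_LOGGER.setLevel(Level.WARNING);
        return ENGINE_LOGGER;
    }
}
```

Call PdfBoxLogQuietener.silenceUnsupportedOperatorInfo() once before rendering. Only do this after checking that the rendered pages look right, since the messages are the only hint that an operator was not processed.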

BR
Maruan Sahyoun

On 08.04.2013 at 10:17, Alexander Klenner <alexander.garvin.klenner@scai.fraunhofer.de> wrote:

> Hi Andreas,
> 
> sorry, I was busy uploading the PDFs and writing the mail and didn't see your mail, but I
> figured PDFToImage might be the correct choice here ;).
> 
> I do not get any exceptions but some info logs, which are:
> 
> Apr 8, 2013 10:16:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BX
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BDC
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BMC
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: i
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: DP
> Apr 8, 2013 10:16:51 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EMC
> Apr 8, 2013 10:16:52 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EX
> 
> 
> Those I get for every page in this document. 
> 
> Cheers,
> 
> Alex
> 
> --
> Dr. Alexander G. Klenner
> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> Schloss Birlinghoven, D-53754 Sankt Augustin
> Tel.: +49 - 2241 - 14 - 2736
> E-mail: alexander.garvin.klenner@scai.fraunhofer.de
> Internet: http://www.scai.fraunhofer.de
> 
> 
> ----- Original Message -----
> From: "Andreas Lehmkühler" <andreas@lehmi.de>
> To: users@pdfbox.apache.org
> Sent: Monday, April 8, 2013 9:58:25 AM
> Subject: Re: errors with PDPage.convertToImage()
> 
> Hi,
> 
> Maruan Sahyoun <sahyoun@fileaffairs.de> wrote on 8 April 2013 at 09:20:
>> Hi,
>> 
>> unfortunately the attachment didn't make it through.
> Due to some security restrictions.
> 
>> Could you try the PDF in question using the command line app ExtractImage with
>> the -nonSeq  parameter or use the following code
> I guess there is a misunderstanding. Please use PDFToImage to create one image
> for each page [1]. Provide us with any possible exception or log.
> 
>> PDDocument pdDoc = PDDocument.loadNonSeq(…)
>> 
>> The NonSequentialParser gives better results if the document has incremental
>> updates.
>> In addition it's not necessary to create a new PDDocument from the cosDoc as
>> parser.getDocument already passes a PDDocument ….
> +1, that's an old pattern and should not be used any more.
> 
>> BR from your neighborhood
> I'm not that far away either ;-)
> 
>> Maruan Sahyoun
>> 
>> On 08.04.2013 at 08:52, Alexander Klenner
>> <alexander.garvin.klenner@scai.fraunhofer.de> wrote:
>> 
>>> Hi all,
>>> 
>>> I frequently come across PDFs where the convertToImage() method is
>>> generating blank or partly blank images. One of those PDFs is attached to
>>> this mail.
>>> 
>>> My code for processing:
>>> 
>>> PDFParser parser;
>>> parser = new PDFParser(new FileInputStream(f));
>>> parser.parse();
>>> cosDoc = parser.getDocument();
>>> 
>>> pdDoc = new PDDocument(cosDoc);
>>> ..
>>> Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
>>> PDPage page = it.next();
>>> ...
>>> PDRectangle cropBox = page.findCropBox();
>>> Dimension dimension = cropBox.createDimension();
>>> ...
>>> BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB,
>>> ImageParser.PARAM_DPI);
>>> 
>>> 
>>> I am using pdfbox-app-1.8.0.jar.
>>> 
>>> So I have two questions:
>>> 
>>> 1. Is there a different way to extract the page as an image that I am not
>>> aware of to get the correct image?
>>> 2. Or is it possible to detect that this page was not extracted correctly,
>>> before or after the extraction?
>>> 
>>> At the moment I have no way of knowing when I am dealing with a corrupted image.
>>> 
>>> Thanks a lot for any hints,
>>> 
>>> Alex
>>> 
>>> --
>>> Dr. Alexander G. Klenner
>>> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
>>> Schloss Birlinghoven, D-53754 Sankt Augustin
>>> Tel.: +49 - 2241 - 14 - 2736
>>> E-mail: alexander.garvin.klenner@scai.fraunhofer.de
>>> Internet: http://www.scai.fraunhofer.de
>>> 
> 
> BR
> Andreas Lehmkühler
> 
> [1] http://pdfbox.apache.org/commandlineutilities/PDFToImage.html
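
The second question in the original message (detecting a page that did not render correctly) is not answered in this thread. As a rough illustration only, the sketch below checks after rendering how much of a BufferedImage, such as the one returned by convertToImage(), consists of a single background colour. The class BlankPageHeuristic, its method names and the threshold parameter are hypothetical, and the white background is an assumption. It cannot tell a failed rendering from a page that is genuinely empty, and it will not reliably catch partly blank pages.

```java
import java.awt.image.BufferedImage;

final class BlankPageHeuristic {
    // Fraction (0.0 to 1.0) of pixels whose RGB value equals backgroundRgb.
    // The alpha channel is ignored.
    static double backgroundFraction(BufferedImage img, int backgroundRgb) {
        int width = img.getWidth();
        int height = img.getHeight();
        int target = backgroundRgb & 0xFFFFFF;
        long matches = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if ((img.getRGB(x, y) & 0xFFFFFF) == target) {
                    matches++;
                }
            }
        }
        return (double) matches / ((long) width * height);
    }

    // Assumption: pages are rendered on a white background. Returns true
    // when at least `threshold` of all pixels are pure white.
    static boolean looksBlank(BufferedImage img, double threshold) {
        return backgroundFraction(img, 0xFFFFFF) >= threshold;
    }
}
```

A caller might flag pages where looksBlank(img, 0.999) is true for manual review rather than treating them as errors outright, since the right threshold depends on the documents involved.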

