pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Error using icafe4j / (pdfbox/) in SAP PI 7.4
Date Sat, 28 Oct 2017 13:05:11 GMT
See latest update:
https://github.com/dragon66/icafe/issues/63

I'd recommend using a different product. Alternatively, do a
read-after-write check with a pixel comparison to verify that all pages
have been written and have the correct content.
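
A minimal sketch of such a pixel comparison, assuming the pages that were handed to the TIFF writer and the pages read back from the written file are both available as BufferedImage lists. The class and method names are hypothetical, and how the pages are read back depends on the imaging library in use:

```java
import java.awt.image.BufferedImage;
import java.util.List;

final class ReadAfterWriteCheck {

    /** True if both images have the same size and identical RGB values. */
    static boolean samePixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return false;
        }
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    /**
     * Returns the index of the first missing, extra, or differing page,
     * or -1 if both lists hold the same pages in the same order.
     */
    static int firstMismatch(List<BufferedImage> written,
                             List<BufferedImage> readBack) {
        for (int i = 0; i < written.size(); i++) {
            if (i >= readBack.size()
                    || !samePixels(written.get(i), readBack.get(i))) {
                return i;
            }
        }
        return readBack.size() > written.size() ? written.size() : -1;
    }
}
```

A result other than -1 means the archive copy should not be trusted for that page.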

Re your own problem, there should be something in your logfile before 
the exception happened (i.e. another exception).
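
One thing that may be worth ruling out, purely as an assumption and not a confirmed cause: the loop in the quoted createMultipageTIFF() catches IOException and only prints the stack trace, so a page that fails to render would leave a null entry in the images array. A small guard like the following (hypothetical helper name) would surface that before the TIFF writer is called:

```java
import java.awt.image.BufferedImage;

final class RenderedPageCheck {

    /** Throws if no pages exist or if any page slot is still null. */
    static void requireAllPagesRendered(BufferedImage[] images) {
        if (images.length == 0) {
            throw new IllegalStateException("No pages were rendered");
        }
        for (int i = 0; i < images.length; i++) {
            if (images[i] == null) {
                throw new IllegalStateException("Page " + (i + 1) + " of "
                        + images.length + " was not rendered; check the log"
                        + " for an earlier exception");
            }
        }
    }
}
```

Calling requireAllPagesRendered(images) right after the render loop turns a silent gap into an explicit error message.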

Tilman

On 25.10.2017 at 17:25, Tilman Hausherr wrote:
> Hi,
>
> Please ignore my post/mail from this morning. I've found why the 
> exception you mention could happen and have opened an issue here:
> https://github.com/dragon66/icafe/issues/63
>
> Re the PDF content:
> The first question is whether your PDF files are properly converted to 
> an image. To verify this, save each image into a png file with 
> ImageIO.write(). Either it is good or not. I expect that it is. If 
> not, please upload the PDF files to a sharehoster.
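
A minimal sketch of that check, assuming the rendered pages are already held in a BufferedImage array as in the quoted code. The class name, file naming scheme, and target directory are made up for illustration:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class PageDump {

    /** Writes page-1.png, page-2.png, ... into dir; returns the count. */
    static int dumpAsPng(BufferedImage[] images, File dir) throws IOException {
        int written = 0;
        for (int i = 0; i < images.length; i++) {
            File out = new File(dir, "page-" + (i + 1) + ".png");
            // ImageIO.write returns false if no suitable writer is registered.
            if (!ImageIO.write(images[i], "png", out)) {
                throw new IOException("No PNG writer available for " + out);
            }
            written++;
        }
        return written;
    }
}
```

Opening the resulting files in any image viewer shows whether the rendering step or the TIFF step is at fault.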
>
> You shouldn't edit PDF files directly unless you know what you're 
> doing. What you did with this file can cause weird effects with the 
> fonts.
>
> This imaging tool should be able to work on your image files 
> regardless of the content.
>
> Re: converting into multipage TIFF files, see this "minority" answer 
> by me:
> https://stackoverflow.com/a/31974376/535646
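
For orientation, a sketch of writing a multipage TIFF with plain ImageIO and counting the pages afterwards. It assumes a TIFF ImageIO plugin is registered (one ships with Java 9 and later; older JREs need an add-on plugin), that the pages are bilevel when "CCITT T.6" is requested, and it uses try-with-resources, so it will not compile on Java 1.6 as written. The class name is hypothetical and this is not necessarily the code from the linked answer:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

final class MultipageTiff {

    /** Writes all pages into one TIFF, e.g. compressionType "CCITT T.6". */
    static byte[] write(List<BufferedImage> pages, String compressionType)
            throws IOException {
        Iterator<ImageWriter> writers =
                ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageIO writer registered");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionType(compressionType);
            writer.prepareWriteSequence(null);
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), param);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
        return baos.toByteArray();
    }

    /** Reads the TIFF back and returns the number of pages it contains. */
    static int countPages(byte[] tiff) throws IOException {
        Iterator<ImageReader> readers =
                ImageIO.getImageReadersByFormatName("tiff");
        if (!readers.hasNext()) {
            throw new IOException("No TIFF ImageIO reader registered");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis = ImageIO.createImageInputStream(
                new ByteArrayInputStream(tiff))) {
            reader.setInput(iis);
            return reader.getNumImages(true);
        } finally {
            reader.dispose();
        }
    }
}
```

Comparing countPages(...) with the page count of the PDDocument is a cheap first read-after-write check.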
>
> Re: converting images to bitonal, try also this library which I use at 
> work for a purpose similar to yours:
> http://www.jhlabs.com/ip/filters/index.html
> I start with color and then use a combination of filters to get a b/w 
> image.
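
As a rough illustration of the color-to-bitonal step, here is a sketch that swaps in a plain fixed luminance threshold written against java.awt only. It does not use the JH Labs filters, and the class name and cutoff value are assumptions for the example:

```java
import java.awt.image.BufferedImage;

final class Bitonal {

    /** Pixels with luminance >= cutoff (0..255) become white, else black. */
    static BufferedImage threshold(BufferedImage src, int cutoff) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the ITU-R BT.601 luma weights.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma >= cutoff ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A fixed cutoff is the simplest choice; scans with uneven background usually need more than that, which is where a combination of filters comes in.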
>
> Tilman
>
> PS: Current PDFBox version is 2.0.7.
>
>
> On 25.10.2017 at 11:45, Platthaus, Thomas wrote:
>> Just to give you a short update.
>>
>> Maybe we have found the reason ...
>>
>> In the original pdf there are font entries like this:
>>
>> /Subtype /TrueType
>> /BaseFont /ArialUnicodeMS-Regular
>> or
>> /Subtype /TrueType
>> /BaseFont /GMOZQV+ArialUnicodeMS-Regular
>>
>> As the error occurs on the Unix system, we patched the pdf and changed 
>> the above entries to something like this:
>>
>> /Subtype /Type1
>> /BaseFont /Helvetica
>>
>> As a result, the pdf to tiff conversion works, but the resulting tiff 
>> is ugly in some parts ...
>>
>> So it seems that the fonts are read/used somehow during conversion.
>>
>>
>> 2017-10-25 9:54 GMT+02:00 Platthaus, Thomas 
>> <thomas.platthaus@rhein-ruhr-informatik.de>:
>>
>>     Thanks for the quick response.
>>
>>     I am sure that the marked line is the correct one as the trace
>>     shows the line where IndexOutOfBounds happens:
>>     <Trace> at com.icafe4j.image.tiff.TIFFTweaker.writeMultipageTIFF
>>     Line 3154</Trace>
>>
>>     And I checked again that there are 3 pages in the document. By the
>>     way, the same functionality is called for other pdf files with
>>     only one page in document and it works fine.
>>
>>     Maybe the icafe developers have another idea about what is
>>     causing the issue?
>>
>>     2017-10-25 9:33 GMT+02:00 Tilman Hausherr <THausherr@t-online.de>:
>>
>>         Looks like a bug in icafe. I suspect the exception is one line
>>         below the red line, list.get(0) won't work if your PDF has
>>         only 1 page because "list" would be empty.
>>
>>         (cc to you two in case you didn't subscribe properly; please
>>         reply to the list only)
>>
>>         Tilman
>>
>>         On 25.10.2017 at 08:02, Platthaus, Thomas wrote:
>>>         Dear dragon66,
>>>         Dear pdfbox-Team,
>>>
>>>         we have a problem (txt file with exception attached) using
>>>         icafe4j lib under SAP PI 7.4 (running on Unix-System).
>>>
>>>         First, I would like to briefly describe our project:
>>>         We have many PI-interfaces getting invoices from different
>>>         customers all over the world. Most of the invoices are
>>>         sent as xml structure. When we have received the xml the
>>>         first step is to convert it to an internal standard format.
>>>         This message in standard format is the basic data for
>>>         building a pdf invoice using pdfbox (currently v2.0.4).
>>>         For some of the interfaces the next step is to create a
>>>         multi-page tiff file for archiving the invoice, here we use the
>>>         lib from icafe4j (v1.1). It runs fine and we do not have any
>>>         problems with the interfaces that are already productive.
>>>
>>>         We have some interfaces where we get a pdf file as incoming
>>>         invoice (e.g. via mail, sftp). We are converting these
>>>         pdf files to multi-page tiff, too. A couple of these are
>>>         running in production environment for many weeks now.
>>>
>>>         But one of the interfaces (still in development) now throws
>>>         an exception while converting the incoming pdf to tiff.
>>>         The conversion works when tested in the local environment
>>>         (Eclipse, Java 1.6, Windows 7) but fails in SAP PI.
>>>
>>>         At the beginning of the project we saw similar behaviour
>>>         with the standard ImageIO classes: everything was fine in
>>>         local tests, but in SAP PI we got exceptions. There was a
>>>         font problem, and ImageIO.getImageReadersByFormatName and
>>>         ImageIO.scanForPlugins didn't work, although we tried many
>>>         different implementations. We therefore decided to use
>>>         icafe4j, which solved the problem -
>>>         until we got another error now :-(
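
When ImageIO behaves differently between a local JVM and a server, a quick diagnostic is to log which format names each environment actually has registered. A small sketch using only standard javax.imageio calls (the class name is hypothetical):

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

final class ImageIoPlugins {

    /** Lower-cased, sorted names of all registered writer formats. */
    static Set<String> writerFormats() {
        ImageIO.scanForPlugins(); // pick up plugins on the current classpath
        return normalize(ImageIO.getWriterFormatNames());
    }

    /** Lower-cased, sorted names of all registered reader formats. */
    static Set<String> readerFormats() {
        ImageIO.scanForPlugins();
        return normalize(ImageIO.getReaderFormatNames());
    }

    private static Set<String> normalize(String[] names) {
        Set<String> result = new TreeSet<String>();
        for (String name : names) {
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }
}
```

If "tiff" is missing from the set on the server but present locally, the plugin is not visible to the class loader that ImageIO uses there.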
>>>
>>>         So back to our current failure. In PI we call the java class
>>>         BuildAndSaveTIFFByContainer (excerpts from code):
>>>         ....
>>>         // Load PDF file into a PDDocument
>>>         // (org.apache.pdfbox.pdmodel.PDDocument).
>>>         pDoc = PDDocument.load(bPdf);
>>>         oTrace.addDebugMessage(
>>>             "BuildAndSaveTIFFByContainer: Write Stream to PDDocument.");
>>>         pageCounter = pDoc.getNumberOfPages();
>>>         this.setDynamicConfiguration(
>>>             "http://covestro.com/COV/X01-AP-INVOICE-BROKER",
>>>             "PageCounter", "" + pageCounter);
>>>         // Create TIFF from PDF
>>>         CreateMultiTIFFFromPDF tiffFromPdf = new CreateMultiTIFFFromPDF();
>>>         bTiff = tiffFromPdf.createMultipageTIFF(pDoc);
>>>         // Close document
>>>         pDoc.close();
>>>         ....
>>>
>>>         // Method createMultipageTIFF() from class CreateMultiTIFFFromPDF
>>>         public byte[] createMultipageTIFF(PDDocument pddPdf)
>>>                 throws IOException {
>>>             pdDocument = pddPdf;
>>>             byte[] retByteArray = null;
>>>             int dpi = 300;
>>>             PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
>>>             BufferedImage[] images =
>>>                 new BufferedImage[pdDocument.getNumberOfPages()];
>>>             TIFFOptions tiffOptions = new TIFFOptions();
>>>             ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>>             RandomAccessOutputStream rout =
>>>                 new FileCacheRandomAccessOutputStream(baos);
>>>             for (int pageIdx = 0;
>>>                     pageIdx < pdDocument.getNumberOfPages(); pageIdx++) {
>>>                 try {
>>>                     BufferedImage imageG = pdfRenderer.renderImageWithDPI(
>>>                         pageIdx, dpi, ImageType.GRAY);
>>>                     BufferedImage imageBB = new BufferedImage(
>>>                         imageG.getWidth(), imageG.getHeight(),
>>>                         BufferedImage.TYPE_BYTE_BINARY);
>>>                     Graphics2D g2d = imageBB.createGraphics();
>>>                     g2d.drawRenderedImage(imageG, null);
>>>                     g2d.dispose();
>>>                     images[pageIdx] = imageBB;
>>>                 } catch (IOException e) {
>>>                     e.printStackTrace();
>>>                 }
>>>             }
>>>             ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
>>>             ImageParam[] param = new ImageParam[1];
>>>             tiffOptions.setTiffCompression(Compression.CCITTFAX4);
>>>             tiffOptions.setXResolution(dpi);
>>>             tiffOptions.setYResolution(dpi);
>>>             builder.imageOptions(tiffOptions);
>>>             builder.colorType(ImageColorType.BILEVEL);
>>>             param[0] = builder.build();
>>>             // Marked line: the IndexOutOfBoundsException is thrown here
>>>             TIFFTweaker.writeMultipageTIFF(rout, param, images);
>>>             retByteArray = baos.toByteArray();
>>>             // Close Document
>>>             pdDocument.close();
>>>             rout.close();
>>>             baos.close();
>>>             return retByteArray;
>>>         }
>>>
>>>         The marked line (the call to TIFFTweaker.writeMultipageTIFF)
>>>         throws an IndexOutOfBoundsException (details in the attached
>>>         PDF_TO_TIFF_ERR.txt).
>>>
>>>         We checked and compared the content of all known pdf files
>>>         for our invoice interfaces (incoming or created by pdfbox)
>>>         and the only difference we see is the following:
>>>
>>>         [screenshot not preserved in the archive]
>>>
>>>         There are some barcodes in the pdf files (screenshot
>>>         below), and in the screenshot above you can see Filter [
>>>         /ASCII85Decode /FlateDecode ]. This is the first time we
>>>         have come across ASCII85Decode, but we are not sure whether
>>>         it is the reason for our problem. Unfortunately, debugging
>>>         in PI is very difficult ...
>>>
>>>         [screenshot not preserved in the archive]
>>>
>>>         Do you have any idea what could be the reason for
>>>         IndexOutOfBounds here? Or maybe you know something about
>>>         similar issues and their solution?
>>>
>>>         Screenshot of the TIFFTweaker class from icafe4j 1.1:
>>>
>>>         [screenshot not preserved in the archive]
>>>
>>>         I would welcome your response.
>>>         Thank you!
>>>
>>>         --
>>>         Best regards
>>>         Thomas Platthaus
>>>         EAI & SOA Senior Developer
>>>         ________________________________
>>>
>>>         Rhein-Ruhr-Informatik GmbH
>>>         Alexanderstraße 50
>>>         45472 Mülheim an der Ruhr
>>>
>>>         Tel +49 208 452358-0
>>>         Mob +49 173 2928075
>>>         Fax +49 208 452358-10
>>>         Email thomas.platthaus@rhein-ruhr-informatik.de
>>>         Web http://www.rhein-ruhr-informatik.de
>>>
>>>         Rhein-Ruhr-Informatik GmbH
>>>         Registered at Amtsgericht Essen: HRB 24657
>>>         Registered office: Essen
>>>         Managing director: Michael Heim
>>>
>>>
>>>         ---------------------------------------------------------------------
>>>         To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>         For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>
>
>

