pdfbox-users mailing list archives

From Tim Costermans <tim.costerm...@unifiedpost.com>
Subject RE: PDFBox 1.8.4 and pdf's generated by MS Word
Date Mon, 31 Mar 2014 15:00:44 GMT
Hi Maruan,

Thanks for pointing out that the attachments didn't get through.
Two PDF files and one patch file (containing a test case to reproduce the issue) are available here:
https://www.dropbox.com/sh/291b24dstixowgt/aQTZl5j_pP

Kind regards,
Tim

-----Original Message-----
From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
Sent: Monday, 31 March 2014 16:47
To: users@pdfbox.apache.org
Subject: Re: PDFBox 1.8.4 and pdf's generated by MS Word

Hi Tim,

the attachment didn't make it through - could you upload it to a public location?

BR

Maruan

On 31.03.2014 at 12:56, Tim Costermans <tim.costermans@unifiedpost.com> wrote:

> Hello,
>  
> I've written a test case to reproduce the issue. (see patch)
> 
> Could someone have a look at it and give me some pointers on how to solve this issue? I applied this patch on the 1.8.4 tag I checked out locally.
> The issue is that I don't know the pdf spec, so I don't know how to fix this issue in the PDFBOX source code.
>  
> Word2010.pdf is the input pdf. I open the document with PDFBOX and add a string to it, in this case 'Hello world!'.
> Afterwards I save the pdf.
>  
> If I look at the content of the pdf before and after I modified it (using Notepad++), I see this:
>  
> Word2010.pdf:
> Line 647: <</Size 18/Root 1 0 R/Info 7 0 R/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>] /Prev 81972/XRefStm 81702>>
>  
> modified_Word2010.pdf:
> Line 791: /XRefStm 81702
>  
> XRefStm is not updated although the original pdf had multiple revisions that were merged into a new pdf document.
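One way to check this without a text editor is to read the /XRefStm offset from the saved file and test whether an indirect object header ("N G obj") really starts at that byte offset. The sketch below is plain Java and only looks at raw bytes. The class XRefStmCheck and its methods are hypothetical helpers, not part of PDFBox, and the idea that the third-party parser fails because the offset no longer lands on an object header is an assumption drawn from the "Expected numeric object for object number" error, not something confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: inspects raw PDF bytes only.
class XRefStmCheck {

    // Matches "/XRefStm <digits>" inside a trailer dictionary.
    private static final Pattern XREF_STM = Pattern.compile("/XRefStm\\s+(\\d+)");
    // Matches an indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    // ISO-8859-1 maps each byte to exactly one char, so char index == byte offset.
    static String asText(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    // Returns every /XRefStm value found in the file, in file order.
    static List<Long> xrefStmOffsets(byte[] pdf) {
        List<Long> offsets = new ArrayList<>();
        Matcher m = XREF_STM.matcher(asText(pdf));
        while (m.find()) {
            offsets.add(Long.parseLong(m.group(1)));
        }
        return offsets;
    }

    // True if an object header ("N G obj") starts exactly at the given byte offset.
    static boolean pointsAtObject(byte[] pdf, long offset) {
        if (offset < 0 || offset >= pdf.length) {
            return false;
        }
        Matcher m = OBJ_HEADER.matcher(asText(pdf));
        m.region((int) offset, pdf.length);
        return m.lookingAt();
    }

    public static void main(String[] args) throws Exception {
        // Made-up miniature file: "%PDF-1.5\n" is 9 bytes, so "5 0 obj" starts at offset 9.
        String sample = "%PDF-1.5\n5 0 obj\n<</Type/XRef>>\nendobj\n"
                + "trailer\n<</Size 6/XRefStm 9>>\n";
        byte[] pdf = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : sample.getBytes(StandardCharsets.ISO_8859_1);
        for (long offset : xrefStmOffsets(pdf)) {
            System.out.println("/XRefStm " + offset + " -> "
                    + (pointsAtObject(pdf, offset) ? "object header found" : "no object header here"));
        }
    }
}
```

Running it with the path of modified_Word2010.pdf as the only argument would show whether offset 81702 still points at an object in the rewritten file; without arguments it runs on the made-up sample.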
>  
> A third party library we use depends on this XRefStm value and cannot open the pdf after it was modified (for the stack trace, see my previous message).
> Any help would be much appreciated.
>  
> Kind regards,
>  
> Tim Costermans
>  
> From: Tim Costermans
> Sent: Wednesday, 26 March 2014 14:31
> To: 'users@pdfbox.apache.org'
> Subject: PDFBox 1.8.4 and pdf's generated by MS Word
>  
> Hello,
>  
> It seems that pdf's generated by MS Word 2010 or 2013 are a recipe for trouble in combination with PDFBOX version 1.8.0 or 1.8.4.
> I upgraded to PDFBOX 1.8.4 and one issue remains:
> 
> Caused by: **thirdparty.pdf.exceptions.PDFParsingException: [offset=91308]Expected numeric object for object number
>                         at **thirdparty.pdf.exceptions.PDFParsingException.newInstance(PDFParsingException.java:58)
>                         at **thirdparty.pdf.io.PDFParser.throwEx(PDFParser.java:1215)
>                         at **thirdparty.pdf.io.PDFParser.readCompressedCrossRefTable(PDFParser.java:805)
>                         at **thirdparty.pdf.io.PDFParser.readCrossRefTable(PDFParser.java:1175)
>                         at **thirdparty.pdf.PDFDocument.open(PDFDocument.java:154)
>                         at **thirdparty.PDFDocument.open(PDFDocument.java:124)
>                         at com.*****.sign.pdf.PDFPresigner.presign(PDFPresigner.java:24)
>                         ... 26 more
> 
> How to reproduce:
> 1) Fire up MS Word 2010, type some text, and save as PDF.
> 2) Open this pdf file with Notepad++; you will notice the following at the bottom of the file:
> ...
> trailer
> <</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] >>
> startxref
> 82089
> %%EOF
> xref
> 0 0
> trailer
> <</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] /Prev 82089/XRefStm 81819>>
> startxref
> 82605
> %%EOF
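The tail quoted above holds two revisions: each "startxref ... %%EOF" pair closes one, and the /Prev entry of the later trailer names the startxref offset of the earlier one. As a rough illustration, the sketch below pulls those numbers out of the tail text and checks that every /Prev matches an earlier startxref. RevisionChain and its methods are hypothetical names, not PDFBox API, and the regular expressions assume the keywords appear as plain text in the order shown in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: works on the raw text of a PDF tail.
class RevisionChain {

    // Collects the first capture group of every match as a long.
    static List<Long> numbersAfter(String regex, String text) {
        List<Long> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(Long.parseLong(m.group(1)));
        }
        return found;
    }

    // One entry per revision: the offset written after each "startxref" keyword.
    static List<Long> startXrefOffsets(String text) {
        return numbersAfter("startxref\\s+(\\d+)\\s+%%EOF", text);
    }

    // The /Prev entries, each of which should name an earlier startxref offset.
    static List<Long> prevOffsets(String text) {
        return numbersAfter("/Prev\\s+(\\d+)", text);
    }

    public static void main(String[] args) {
        // Tail of the Word 2010 file as quoted in this message (dictionaries shortened).
        String tail = "trailer\n<</Size 18/Root 1 0 R/Info 7 0 R>>\nstartxref\n82089\n%%EOF\n"
                + "xref\n0 0\ntrailer\n"
                + "<</Size 18/Root 1 0 R/Info 7 0 R /Prev 82089/XRefStm 81819>>\n"
                + "startxref\n82605\n%%EOF\n";
        List<Long> starts = startXrefOffsets(tail);
        System.out.println("revisions: " + starts.size());
        for (Long prev : prevOffsets(tail)) {
            System.out.println("/Prev " + prev
                    + (starts.contains(prev) ? " matches a startxref" : " is dangling"));
        }
    }
}
```

The same check could be run on the tail of a file saved by PDFBox to see whether the /Prev chain is still intact after the revisions were merged.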
>  
> Our application is trying to add an image to this pdf using PDFBox. When calling PDDocument.save(), the "revisions" are merged and a new pdf is created.
> The newly created pdf is passed to a third party that tries to open it, but it fails because XRefStm is not correctly updated during save.
> Probably related to https://issues.apache.org/jira/browse/PDFBOX-1822
>  
> I also tried PDDocument.saveIncremental(), but then I get a NullPointerException caused by PDFXRefStream: List<Integer> indexEntry = getIndexEntry(); returns a list containing two null objects (first and last are still null when they are added to the list).
> How do I solve this issue?
> What's the real issue here?
> I'm not in control of the pdf's that the application can receive.
>  
> I also ran into the following bug but worked around it: https://issues.apache.org/jira/browse/PDFBOX-1838

