pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: PDFParser Conflict Resolution
Date Sat, 22 Feb 2014 16:23:09 GMT

The PDFParser works sequentially through the file from top to bottom and collects all objects.
Conflicts are resolved on the assumption that if an object with the same number appears later
in the file, the later one is the correct one.
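
To make the "later object wins" rule concrete, here is a minimal sketch in plain Java. It is not PDFBox code: the class SequentialScanSketch, the ScannedObject type and the resolveLastWins method are hypothetical names made up for illustration, and generation numbers are ignored for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SequentialScanSketch {

    /** Hypothetical stand-in for an indirect object found during a top-to-bottom scan. */
    static final class ScannedObject {
        final int objectNumber;
        final long offset;
        final String body;

        ScannedObject(int objectNumber, long offset, String body) {
            this.objectNumber = objectNumber;
            this.offset = offset;
            this.body = body;
        }
    }

    /**
     * Builds an object pool from objects listed in file order.
     * When two objects share a number, the one seen later replaces the earlier one.
     */
    static Map<Integer, ScannedObject> resolveLastWins(List<ScannedObject> inFileOrder) {
        Map<Integer, ScannedObject> pool = new LinkedHashMap<>();
        for (ScannedObject obj : inFileOrder) {
            pool.put(obj.objectNumber, obj); // a later definition overwrites an earlier one
        }
        return pool;
    }

    public static void main(String[] args) {
        List<ScannedObject> scanned = Arrays.asList(
                new ScannedObject(1, 15L, "catalog"),
                new ScannedObject(5, 120L, "form, first definition"),
                new ScannedObject(5, 900L, "form, later definition"));

        Map<Integer, ScannedObject> pool = resolveLastWins(scanned);
        System.out.println(pool.get(5).body); // prints "form, later definition"
    }
}
```

The point of the sketch is only that position in the file decides the winner, regardless of which copy the document itself considers current.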

NonSequentialPDFParser works through the file by following the xref information (table or stream).
This is in line with the PDF specification.
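
As a rough illustration of xref-driven lookup, the following plain Java sketch resolves an object number through an offset table instead of through scan order. It is not PDFBox code: XrefLookupSketch, resolveViaXref and the two maps are hypothetical, the xref is reduced to a simple number-to-offset map, and details such as generation numbers, object streams and incremental-update chains are left out.

```java
import java.util.HashMap;
import java.util.Map;

public class XrefLookupSketch {

    /**
     * Looks up an object by asking the xref for its byte offset, then reading
     * whatever is stored at that offset. Copies of the object at other offsets
     * are never consulted.
     *
     * @return the object body, or null if the xref has no entry for the number
     */
    static String resolveViaXref(Map<Integer, Long> xref,
                                 Map<Long, String> bodiesByOffset,
                                 int objectNumber) {
        Long offset = xref.get(objectNumber);
        if (offset == null) {
            return null; // not referenced by the xref at all
        }
        return bodiesByOffset.get(offset);
    }

    public static void main(String[] args) {
        // Simplified file contents: two copies of object 5 at different offsets.
        Map<Long, String> bodiesByOffset = new HashMap<>();
        bodiesByOffset.put(120L, "form, referenced by xref");
        bodiesByOffset.put(900L, "form, stray copy later in the file");

        // The xref points object 5 at the copy stored at offset 120.
        Map<Integer, Long> xref = new HashMap<>();
        xref.put(5, 120L);

        System.out.println(resolveViaXref(xref, bodiesByOffset, 5));
        // prints "form, referenced by xref"
    }
}
```

With the same two copies of object 5, a purely sequential scan would keep the one at offset 900, which is why the two parsers can disagree on a single file.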

So patching as you’ve done might resolve your issue but might also introduce issues with
other files. The best way would be to find out why NonSequentialPDFParser has issues parsing
your file. If you think it’s a bug, please open an issue in JIRA [https://issues.apache.org/jira/browse/PDFBOX]
and attach the PDF file together with some sample code.

Maruan Sahyoun

On 21.02.2014 at 23:47, Cary L. Schofield <cary.schofield@eesoh.com> wrote:

> I have a signed document that is getting parsed incorrectly.
> Using PDFParser, the document form is missing all fields and I can't get to the signature.
> Using NonSequentialPDFParser, I can get to the signature fields, but the signed data appears
> to have been corrupted.
> I was able to determine that the form was being replaced or corrupted during conflict
> resolution.
> I solved the problem by patching PDFParser.ConflictObj to ignore an object in the conflict
> list when the existing object (from the object pool) is a direct object.
> I know I should do the research, but was hoping someone would already know if the patch
> is reasonable or likely to cause more/other problems.
> Thanks
