pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: FYI: Workaround for incorrect XRef/XRefStm input
Date Wed, 23 Nov 2016 19:13:39 GMT
Please try the current version, 1.8.12; it may be fixed there (I see no
"streamOffset != prev" anywhere - maybe you mean something else?). If
not, check whether it is fixed in the version on svn,
https://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/NonSequentialPDFParser.java?view=markup
and if it is not fixed there either, please open an issue in JIRA, preferably with a diff.

Tilman


On 21.11.2016 at 23:44, Brzrk One wrote:
> Oops... I left out that this was with pdfbox 1.8.9.
>
> On Mon, Nov 21, 2016 at 5:12 PM, Brzrk One <brzrk1@gmail.com> wrote:
>
>> I have a PDF file (which I cannot share) with the trailer:
>>
>> trailer
>> <<
>> /Size 16922
>> /Root 1 0 R
>> /Info 9 0 R
>> /ID [<495BB8DD62106B9AB4E6E1C8B591C982> <91EB7F87537B4838AF45C0D28A988280>]
>> /XRefStm 5347791
>> startxref
>> 5135270
>>
>> But there is only a single xref table in this PDF file: there is no object
>> with /Type /XRef.
>> In this situation, NonSequentialPDFParser.parseXref() enters the
>> XREF_STM block. Since there is no object with /Type /XRef at offset
>> 5347791 (a position that lands in the middle of the xref table), it does
>> a brute-force search for an xref entry and returns offset 5135270, which
>> is the location of the one and only xref table in the file.
>>
>> I added this check to the XREF_STM block, which seems to get around
>> the problem:
>>
>>
>> if (streamOffset != prev) {
>>     // if the positions are the same, this is a hybrid xref table / xrefstm
>>     // trailer but there is no /XRef stream...
>>     parseXrefObjStream(prev, false);
>> }
>>
>>
>>   I see similar code in 2.0.3 COSParser.parseXref().
>>   HtH, Pat
>>
>>
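
The check quoted above can be illustrated outside PDFBox with a small, self-contained sketch. It is not PDFBox code: the class XrefStmGuard and both of its methods are hypothetical names introduced here, and the object-header test is a simplified assumption about what "an object at that offset" means. The first method shows why offset 5347791 is rejected in a file like the one described (the bytes there do not start an indirect object); the second restates the streamOffset != prev comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class XrefStmGuard {

    // An indirect object starts with "<number> <generation> obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\s*\\d+\\s+\\d+\\s+obj\\b");

    /** Returns true if the bytes at the given offset begin an indirect object. */
    public static boolean looksLikeObjectAt(byte[] pdf, long offset) {
        if (offset < 0 || offset >= pdf.length) {
            return false;
        }
        int start = (int) offset;
        // A short window is enough to hold an object header.
        int len = Math.min(32, pdf.length - start);
        String window = new String(pdf, start, len, StandardCharsets.ISO_8859_1);
        return OBJ_HEADER.matcher(window).lookingAt();
    }

    /**
     * Mirrors the guard from the message: parse the xref stream only when the
     * (possibly corrected) /XRefStm offset differs from the offset of the xref
     * table that is currently being processed.
     */
    public static boolean shouldParseXrefStream(long resolvedStreamOffset,
                                                long xrefTableOffset) {
        return resolvedStreamOffset != xrefTableOffset;
    }
}
```

With the offsets from the trailer above, the corrected /XRefStm offset and the xref table offset are both 5135270, so shouldParseXrefStream returns false and the stream parse is skipped.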



