pdfbox-users mailing list archives

From Brzrk One <brz...@gmail.com>
Subject FYI: Workaround for incorrect XRef/XRefStm input
Date Mon, 21 Nov 2016 22:12:06 GMT
I have a PDF file (which I cannot share) with the trailer:

trailer
<<
/Size 16922
/Root 1 0 R
/Info 9 0 R
/ID [<495BB8DD62106B9AB4E6E1C8B591C982> <91EB7F87537B4838AF45C0D28A988280>]
/XRefStm 5347791
>>
startxref
5135270

But the file contains only a single xref table: there is no object with
/Type /XRef.
In this situation, NonSequentialPDFParser.parseXref() enters the XREF_STM
branch. Since there is no object with /Type /XRef at offset 5347791 (a
position that lands in the middle of the xref table), it does a brute-force
search for an xref entry and returns offset 5135270, which is the location
of the one and only xref table in the file.

I added this check to the XREF_STM branch. It skips the xref stream parse
when the corrected offset equals the xref table offset, which seems to get
around the problem:


    if (streamOffset != prev) {
        // if the positions are the same, this is a hybrid xref table / xrefstm
        // but no /XRef stream...
        parseXrefObjStream(prev, false);
    }

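For anyone who wants to see the effect of the guard outside the parser, here is a minimal, self-contained sketch. It is a toy model, not PDFBox code: the class XrefStmGuardSketch and the methods repairOffset(), shouldParseXrefStream() and toyPdf() are hypothetical names invented for illustration. The "repair" step is simplified to "accept the offset if any object header starts there, otherwise search for the first xref keyword", which only approximates what the real parser does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these names are PDFBox API.
final class XrefStmGuardSketch {

    // An indirect object header such as "12 0 obj"; an xref stream would start with one.
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+ \\d+ obj");

    // Hypothetical stand-in for the parser's offset repair: keep the declared
    // offset if an object header starts there, otherwise fall back to a
    // brute-force search for the first "xref" keyword (-1 if there is none).
    static int repairOffset(String pdf, int declared) {
        if (declared >= 0 && declared < pdf.length()) {
            Matcher m = OBJ_HEADER.matcher(pdf).region(declared, pdf.length());
            if (m.lookingAt()) {
                return declared;
            }
        }
        return pdf.indexOf("xref");
    }

    // The guard from the message: only parse an xref stream when the repaired
    // /XRefStm offset differs from the xref table offset (prev).
    static boolean shouldParseXrefStream(int streamOffset, int prev) {
        return streamOffset != prev;
    }

    // A tiny PDF-like string with one object and one classic xref table.
    static String toyPdf() {
        return "%PDF-1.5\n"
             + "1 0 obj\n<< >>\nendobj\n"
             + "xref\n0 2\n"
             + "0000000000 65535 f \n"
             + "0000000009 00000 n \n";
    }

    public static void main(String[] args) {
        String pdf = toyPdf();
        int prev = pdf.indexOf("xref");   // offset of the only xref table
        int declared = prev + 10;         // bogus /XRefStm value inside the table
        int streamOffset = repairOffset(pdf, declared);
        System.out.println("repaired offset equals table offset: " + (streamOffset == prev));
        System.out.println("parse xref stream: " + shouldParseXrefStream(streamOffset, prev));
    }
}
```

The point of the sketch is only that a bogus /XRefStm value gets "repaired" to the same offset as the xref table, so comparing the two offsets is enough to detect the case and skip the stream parse.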

I see similar code in 2.0.3 COSParser.parseXref().
HTH, Pat
