pdfbox-users mailing list archives

From Andrea Canu <andrea.c...@gmail.com>
Subject Re: Invalid signed content from PDSignature
Date Fri, 10 Jun 2016 10:29:39 GMT
Hi Tilman, you are correct!
My file is a zip file which contains three signed PDF documents.

But now I'm in trouble again.

Reading this stream with the two classes PDDocument and PDFParser, I'm not able
to detect whether some "header junk" was skipped by the parser. When that
happens, every PDSignature extracted from the resulting PDDocument refers to a
byte range with invalid offsets.
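
A minimal sketch of how such a check could look outside of PDFBox, assuming the
whole file is already available as a byte array. The class and method names
(PdfHeaderCheck, findHeaderOffset, isClean) are made up for illustration and
are not part of the PDFBox API; the code only searches for the first "%PDF-"
marker and reports how many bytes precede it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: locates the "%PDF-" marker in raw bytes.
final class PdfHeaderCheck {
    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first "%PDF-" marker, or -1 if there is none. */
    static int findHeaderOffset(byte[] data) {
        for (int i = 0; i + PDF_HEADER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < PDF_HEADER.length; j++) {
                if (data[i + j] != PDF_HEADER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** In this sketch a file counts as "clean" only if the marker is at offset 0. */
    static boolean isClean(byte[] data) {
        return findHeaderOffset(data) == 0;
    }

    public static void main(String[] args) {
        // Eight bytes of leading junk in front of the PDF header.
        byte[] wrapped = "PK..junk%PDF-1.6\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("header offset: " + findHeaderOffset(wrapped));
        System.out.println("clean: " + isClean(wrapped));
    }
}
```

A non-zero result would be the number of bytes that precede the header.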

Is it possible to read the PDF stream in a "strict mode"? This capability
could be useful to detect whether a PDF is not "clean".

Alternatively, the PDDocument class could provide a new method that
returns the signed content for a given PDSignature.
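
To make the second idea concrete, here is a rough sketch of what such a method
might do, written against plain byte arrays rather than PDFBox classes. The
class name SignedContentSketch, the method signedContent and the headerOffset
parameter are assumptions for illustration only. The sketch concatenates every
offset/length pair of the ByteRange, so it also covers ranges with more than
four elements, and shifts each offset by the number of junk bytes found before
the header.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of PDFBox.
final class SignedContentSketch {

    /**
     * Concatenates the ranges listed in byteRange ([offset1 len1 offset2 len2 ...]),
     * shifting every offset by headerOffset bytes.
     */
    static byte[] signedContent(byte[] file, int[] byteRange, int headerOffset) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            long start = (long) byteRange[i] + headerOffset;
            long length = byteRange[i + 1];
            if (start < 0 || length < 0 || start + length > file.length) {
                throw new IllegalArgumentException("Range " + (i / 2) + " is outside the file");
            }
            out.write(file, (int) start, (int) length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Four junk bytes, then "abcd", a three-byte gap for the signature, then "efgh".
        byte[] file = "XXXXabcdSIGefgh".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        byte[] content = signedContent(file, new int[] {0, 4, 7, 4}, 4);
        System.out.println(new String(content, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```

Whether shifting the offsets is acceptable at all is a separate question: as
Tilman points out in the quoted message, a file with bytes added in front of
the header has arguably been modified.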


Andrea

P.S.
No, the signature validation I'm referring to is performed by a well-known
commercial library.

On Thu, Jun 9, 2016 at 6:24 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> Hello Andrea,
>
> I disagree - IMHO your PDF is incorrect. "PK" means that it is a ZIP file.
> Apparently with an uncompressed PDF in it (yes, ZIP can have uncompressed
> files). Of course one could adjust the offsets, but this wouldn't be right:
> the PDF has been modified, the PK header has been added. Try renaming that
> file and then click on it to confirm my theory that it is really a ZIP file.
>
> (I suspect you'll tell me that it validates with Adobe Reader. If so, then
> I'd say Adobe is wrong. I just tried adding "XXXX" in front of a file with
> Notepad++ and Adobe does not report that the file was modified.)
>
> The good thing is that there is no bug in COSFilterInputStream (I was
> afraid of that), so I'll use getSignedContent() in the signature example
> instead of the code I have now.
>
> Tilman
>
>
> On 09.06.2016 at 10:45, Andrea Canu wrote:
>
>> Hi Tilman
>> thank you for your answer.
>>
>> The PDF is a real document so I can't share it, but I can give you an
>> extract:
>>
>> These are the first 1044 bytes of the document.
>> --------------------------------------------------------------
>>
>>
>>
>> *PK      ¹Js: ¼àð3£ 3£ <   CAACT-00-00-08 document.pdf*%PDF-1.6
>>
>> %âãÏÓ
>> 3582 0 obj
>> <</Linearized 1/L 697139/O 3585/E 118808/N 42/T 625450/H [ 1000 1986]>>
>> endobj
>>
>> xref
>> 3582 34
>> 0000000016 00000 n
>> 0000003154 00000 n
>> 0000003481 00000 n
>> 0000003680 00000 n
>> 0000004019 00000 n
>> 0000004048 00000 n
>> 0000004265 00000 n
>> 0000004495 00000 n
>> 0000004765 00000 n
>> 0000004950 00000 n
>> 0000006189 00000 n
>> 0000007372 00000 n
>> 0000007629 00000 n
>> 0000060752 00000 n
>> 0000061525 00000 n
>> 0000062245 00000 n
>> 0000062284 00000 n
>> 0000062509 00000 n
>> 0000062740 00000 n
>> 0000062819 00000 n
>> 0000064540 00000 n
>> 0000064945 00000 n
>> 0000065082 00000 n
>> 0000065306 00000 n
>> 0000065606 00000 n
>> 0000072471 00000 n
>> 0000075166 00000 n
>> 0000078960 00000 n
>> 0000079194 00000 n
>> 0000079411 00000 n
>> 0000118645 00000 n
>> 0000118722 00000 n
>> 0000002986 00000 n
>> 0000001000 00000 n
>> trailer
>> <</Size 3616/Prev 625437/XRefStm 2986/Root 3583 0 R/Info 3580 0
>>
>> R/ID[<A71F76F2A24FB6D888EDCB04CB86B815><6CCE97BD63E74F479ED22F39881647F0>]>>
>> startxref
>> 0
>> %%EOF
>>
>> .....
>> --------------------------------------------------------------
>>
>> I would like to draw your attention to the first 60 bytes.
>> Those bytes are stripped out by the *COSParser* parser, skipped as
>> garbage.
>> The method that skips those bytes is:
>>
>> COSParser.parseHeader(PDF_HEADER, PDF_DEFAULT_VERSION)
>>
>> ....
>> private static final String PDF_HEADER = "%PDF-";
>>
>>
>> I've noticed that I also have to skip those 60 bytes manually from the
>> *pdfInputStream* before calling the method
>>
>> signature.getSignedContent(*pdfInputStream*)
>>
>>
>> In this way, the hash computed from the returned byte array and the hash
>> inside the signature match.
>>
>>
>> Andrea
>>
>>
>> On Wed, Jun 8, 2016 at 6:06 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>>> On 08.06.2016 at 13:27, Andrea Canu wrote:
>>>
>>>> Hi guys
>>>>
>>>> I want to ask you about the correct way to get the signed content from
>>>> the signature.
>>>> Until now I've used this method of the PDSignature class:
>>>>
>>>> signature.getSignedContent(*pdfInputStream*)
>>>>
>>>> With this method I'm able to extract from the *pdfInputStream* the
>>>> byte array of the signed content, based on the signature's ByteRange.
>>>>
>>>> I've noticed that if I try to verify the signature based on that
>>>> byte array, the verification sometimes fails unexpectedly!
>>>>
>>> Hello Andrea,
>>>
>>> Can you share the PDF (upload it)?
>>>
>>> I doubt your theory re: bug in COSParser. I'd rather search if there is a
>>> bug in COSFilterInputStream.
>>>
>>> If you can't share the PDF, then please download the bytes "the hard
>>> way":
>>>
>>>                      // Download the signed content described by the
>>>                      // /ByteRange COSArray: [offset1 len1 offset2 len2]
>>>                      int[] byteRange = sig.getByteRange();
>>>                      byte[] buf = new byte[byteRange[1] + byteRange[3]];
>>>                      RandomAccessFile raf = new RandomAccessFile(infile, "r");
>>>                      // First range: read from file offset byteRange[0]
>>>                      // into the start of buf
>>>                      raf.seek(byteRange[0]);
>>>                      raf.readFully(buf, 0, byteRange[1]);
>>>                      // Second range: stored directly after the first one
>>>                      raf.seek(byteRange[2]);
>>>                      raf.readFully(buf, byteRange[1], byteRange[3]);
>>>                      raf.close();
>>>
>>> This code is not fully correct, because /ByteRange might have more than 4
>>> elements. So have a look at it to be sure.
>>>
>>> Then compare the byte array "buf" with the one from getSignedContent.
>>>
>>> Another possible reason for the failure is that there are different
>>> signature methods. See the code at
>>>
>>>
>>> https://svn.apache.org/viewvc/pdfbox/branches/2.0/examples/src/main/java/org/apache/pdfbox/examples/signature/ShowSignature.java?view=markup
>>>
>>> I didn't use getSignedContent() there but I think I should. So I'd be
>>> very interested to find out if there is a bug there.
>>>
>>> Tilman
>>>
>>>
>>>> Now, looking at the COSParser class I've found this method:
>>>>
>>>> COSParser.parseHeader
>>>>
>>>>
>>>> This method, while trying to find the correct document header, is able
>>>> to skip some garbage in the PDF document by looking for the markers
>>>> "%PDF-" and "%FDF-".
>>>>
>>>> So, I've noticed that the signature verification succeeds if I skip that
>>>> garbage during the signed-content extraction.
>>>>
>>>> My question is:
>>>> Why isn't this garbage handling also present in the getSignedContent
>>>> code?
>>>>
>>>> The workaround I found is to skip that garbage manually from the
>>>> *pdfInputStream*, but now the problem is finding the correct way to
>>>> calculate the offset for the *pdfInputStream*.
>>>>
>>>> Any suggestions?
>>>>
>>>> Kind regards
>>>> Andrea.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>>
>>>
>
