pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Invalid signed content from PDSignature
Date Thu, 09 Jun 2016 16:24:02 GMT
Hello Andrea,

I disagree - IMHO your PDF is incorrect. "PK" means that it is a ZIP 
file, apparently with an uncompressed PDF in it (yes, ZIP can have 
uncompressed files). Of course one could adjust the offsets, but this 
wouldn't be right: the PDF has been modified, because the PK header has 
been added. Try renaming the file to .zip and then opening it to confirm 
my theory that it is really a ZIP file.
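Tilman's ZIP theory can also be checked in code instead of by renaming the file. The following is a minimal sketch, assuming the file begins with a single ZIP local file header whose entry is stored uncompressed (compression method 0). `ZipPrefix` and its methods are hypothetical helpers, not part of PDFBox. The sketch reads the file name length and extra field length from the fixed 30-byte local file header to find where the embedded data (here, the PDF) starts.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: detects a ZIP local file header in front of the data. */
public class ZipPrefix {

    /** Size of the fixed part of a ZIP local file header, in bytes. */
    private static final int LOCAL_HEADER_FIXED_SIZE = 30;

    /** Returns true if data starts with the local file header signature "PK\3\4". */
    public static boolean startsWithZipHeader(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    /**
     * Returns the offset at which the first entry's data begins, or -1 if the
     * data does not start with a local file header or the entry is compressed.
     */
    public static int firstEntryDataOffset(byte[] data) {
        if (!startsWithZipHeader(data) || data.length < LOCAL_HEADER_FIXED_SIZE) {
            return -1;
        }
        // ZIP header fields are little-endian
        ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int method = bb.getShort(8) & 0xFFFF;       // compression method
        int nameLength = bb.getShort(26) & 0xFFFF;  // file name length
        int extraLength = bb.getShort(28) & 0xFFFF; // extra field length
        if (method != 0) {
            return -1; // compressed entry: the PDF bytes are not present as-is
        }
        return LOCAL_HEADER_FIXED_SIZE + nameLength + extraLength;
    }
}
```

Note that knowing this offset does not settle the question of whether it *should* be skipped: Tilman's point is that the wrapped file is no longer the PDF that was signed.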

(I suspect you'll tell me that it validates with Adobe Reader. If so, 
then I'd say Adobe is wrong. I just tried adding "XXXX" in front of a 
file with NOTEPAD++ and Adobe does not report that the file was modified.)

The good thing is that there is no bug in COSFilterInputStream (I was 
afraid of that), so I'll use getSignedContent() in the signature example 
instead of the code I have now.

Tilman


On 09.06.2016 at 10:45, Andrea Canu wrote:
> Hi Tilman,
> thank you for your answer.
>
> The PDF is a real document, so I can't share it, but I can give you an
> extract:
>
> These are the first 1044 bytes of the document.
> --------------------------------------------------------------
>
>
>
> *PK      ¹Js: ¼àð3£ 3£ <   CAACT-00-00-08 document.pdf*%PDF-1.6
> %âãÏÓ
> 3582 0 obj
> <</Linearized 1/L 697139/O 3585/E 118808/N 42/T 625450/H [ 1000 1986]>>
> endobj
>
> xref
> 3582 34
> 0000000016 00000 n
> 0000003154 00000 n
> 0000003481 00000 n
> 0000003680 00000 n
> 0000004019 00000 n
> 0000004048 00000 n
> 0000004265 00000 n
> 0000004495 00000 n
> 0000004765 00000 n
> 0000004950 00000 n
> 0000006189 00000 n
> 0000007372 00000 n
> 0000007629 00000 n
> 0000060752 00000 n
> 0000061525 00000 n
> 0000062245 00000 n
> 0000062284 00000 n
> 0000062509 00000 n
> 0000062740 00000 n
> 0000062819 00000 n
> 0000064540 00000 n
> 0000064945 00000 n
> 0000065082 00000 n
> 0000065306 00000 n
> 0000065606 00000 n
> 0000072471 00000 n
> 0000075166 00000 n
> 0000078960 00000 n
> 0000079194 00000 n
> 0000079411 00000 n
> 0000118645 00000 n
> 0000118722 00000 n
> 0000002986 00000 n
> 0000001000 00000 n
> trailer
> <</Size 3616/Prev 625437/XRefStm 2986/Root 3583 0 R/Info 3580 0
> R/ID[<A71F76F2A24FB6D888EDCB04CB86B815><6CCE97BD63E74F479ED22F39881647F0>]>>
> startxref
> 0
> %%EOF
>
> .....
> --------------------------------------------------------------
>
> I would like to draw your attention to the first 60 bytes.
> These bytes are skipped as garbage by the *COSParser* parser.
> The method that skips them is:
>
> COSParser.parseHeader(PDF_HEADER, PDF_DEFAULT_VERSION)
>
> ....
> private static final String PDF_HEADER = "%PDF-";
>
>
> I've noticed that I also have to skip those 60 bytes manually in the
> *pdfInputStream* before calling the method
>
> signature.getSignedContent(*pdfInputStream*)
>
> This way, the hash digest of the returned byte array matches the hash
> inside the signature.
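Andrea's workaround amounts to advancing the stream past the prefix before handing it to `getSignedContent()`. A minimal sketch, assuming the prefix length is already known (60 bytes in this case); `PrefixSkipper.skipFully` is a hypothetical helper, not a PDFBox method. It loops because `InputStream.skip()` is allowed to skip fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for skipping a known number of prefix bytes. */
public class PrefixSkipper {

    /**
     * Skips exactly n bytes of in, or throws EOFException if the stream ends first.
     * Hypothetical usage with PDFBox (not executed here):
     *   PrefixSkipper.skipFully(pdfInputStream, 60);
     *   byte[] content = signature.getSignedContent(pdfInputStream);
     */
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress; fall back to read() to detect end of stream
                if (in.read() < 0) {
                    throw new EOFException("stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```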
>
>
> Andrea
>
>
> On Wed, Jun 8, 2016 at 6:06 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> On 08.06.2016 at 13:27, Andrea Canu wrote:
>>
>>> Hi guys,
>>>
>>> I want to ask you about the correct way to get the signed content from the
>>> signature.
>>> Until now I've used the PDSignature class's method:
>>>
>>> signature.getSignedContent(*pdfInputStream*)
>>>
>>> With this method I'm able to extract from the *pdfInputStream* the
>>> byte array of the signed content, based on the signature's ByteRange.
>>>
>>> I've noticed that if I try to verify the signature based on that
>>> byte array, the verification sometimes unexpectedly fails!
>>>
>> Hello Andrea,
>>
>> Can you share the PDF (upload it)?
>>
>> I doubt your theory about a bug in COSParser. I'd rather check whether
>> there is a bug in COSFilterInputStream.
>>
>> If you can't share the PDF, then please download the bytes "the hard way":
>>
>>     // Download the signed content described in the /ByteRange COSArray:
>>     // [offset1 len1 offset2 len2]
>>     int[] byteRange = sig.getByteRange();
>>     byte[] buf = new byte[byteRange[1] + byteRange[3]];
>>     try (RandomAccessFile raf = new RandomAccessFile(infile, "r"))
>>     {
>>         raf.seek(byteRange[0]);
>>         // the offset argument of readFully() is the position in buf, not in the file
>>         raf.readFully(buf, 0, byteRange[1]);
>>         raf.seek(byteRange[2]);
>>         raf.readFully(buf, byteRange[1], byteRange[3]);
>>     }
>>
>> This code is not fully correct, because /ByteRange might have more than 4
>> elements, so check it to be sure.
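To address the caveat about more than four elements, the same "hard way" download can walk every (offset, length) pair of /ByteRange instead of assuming exactly two. This is a sketch: `ByteRangeReader` is a hypothetical name, and the `int[]` argument is assumed to hold the values returned by `getByteRange()`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper: reads the bytes covered by an arbitrary /ByteRange array. */
public class ByteRangeReader {

    /** Reads every (offset, length) pair from the file and concatenates the segments in order. */
    public static byte[] readByteRange(String path, int[] byteRange) throws IOException {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("/ByteRange must have an even number of elements");
        }
        long total = 0;
        for (int i = 1; i < byteRange.length; i += 2) {
            total += byteRange[i];
        }
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("signed content too large for a byte array");
        }
        byte[] buf = new byte[(int) total];
        int pos = 0; // write position in buf, independent of the file offsets
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            for (int i = 0; i < byteRange.length; i += 2) {
                raf.seek(byteRange[i]);
                raf.readFully(buf, pos, byteRange[i + 1]);
                pos += byteRange[i + 1];
            }
        }
        return buf;
    }
}
```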
>>
>> Then compare the byte array "buf" with the one from getSignedContent.
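For that comparison, `Arrays.equals()` only says whether the two arrays differ. A sketch that also reports where they first differ, using `Arrays.mismatch()` (available since Java 9); `ContentComparison` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper for comparing two extracted signed-content arrays. */
public class ContentComparison {

    /** Describes whether the arrays are identical and, if not, the first differing index. */
    public static String describe(byte[] expected, byte[] actual) {
        // mismatch() returns -1 if identical; if one array is a prefix of the other,
        // it returns the length of the shorter one
        int i = Arrays.mismatch(expected, actual);
        if (i < 0) {
            return "identical (" + expected.length + " bytes)";
        }
        return "first difference at index " + i
                + " (lengths " + expected.length + " vs " + actual.length + ")";
    }
}
```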
>>
>> Another possible reason for the failure is that there are different
>> signature methods. See the code at
>>
>> https://svn.apache.org/viewvc/pdfbox/branches/2.0/examples/src/main/java/org/apache/pdfbox/examples/signature/ShowSignature.java?view=markup
>>
>> I didn't use getSignedContent() there, but I think I should. So I'd be very
>> interested to find out whether there is a bug in it.
>>
>> Tilman
>>
>>
>>> Now, looking at the COSParser class, I've found this method:
>>>
>>> COSParser.parseHeader
>>>
>>>
>>> While looking for the document's header, this method is able to skip
>>> some garbage in the PDF document by searching for the markers "%PDF-"
>>> and "%FDF-".
>>>
>>> I've noticed that the signature verification succeeds if I skip that
>>> garbage during the signed-content extraction.
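The offset Andrea asks about below can be found with the same kind of marker search described here. This is a minimal stdlib sketch, not PDFBox's actual `parseHeader()` code; the class name `HeaderLocator` and the `limit` parameter (how far into the file to look) are assumptions. If the prefix was added after signing, as Tilman suspects, the returned index is the number of bytes in front of the signed PDF.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: locates the first "%PDF-" or "%FDF-" marker. */
public class HeaderLocator {

    private static final byte[][] MARKERS = {
        "%PDF-".getBytes(StandardCharsets.US_ASCII),
        "%FDF-".getBytes(StandardCharsets.US_ASCII)
    };

    /** Returns the index of the first marker starting within the first limit bytes, or -1. */
    public static int findHeaderOffset(byte[] data, int limit) {
        int end = Math.min(data.length, limit);
        for (int i = 0; i < end; i++) {
            for (byte[] marker : MARKERS) {
                if (matchesAt(data, i, marker)) {
                    return i;
                }
            }
        }
        return -1;
    }

    /** True if marker occurs in data starting at pos. */
    private static boolean matchesAt(byte[] data, int pos, byte[] marker) {
        if (pos + marker.length > data.length) {
            return false;
        }
        for (int k = 0; k < marker.length; k++) {
            if (data[pos + k] != marker[k]) {
                return false;
            }
        }
        return true;
    }
}
```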
>>>
>>> My question is:
>>> Why isn't this garbage handling also present in the getSignedContent
>>> code?
>>>
>>> The workaround I found is to skip that garbage manually from the
>>> *pdfInputStream*, but now the problem is how to calculate the correct
>>> offset for the *pdfInputStream*.
>>>
>>> Any suggestion?
>>>
>>> Kind regards,
>>> Andrea.
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>>



