pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: problem with pdf eof
Date Thu, 16 Oct 2014 16:12:51 GMT
It depends on the parser being used. The non-sequential parser (PDDocument.loadNonSeq)
follows the Xref information and handles multiple EOF markers (incremental updates) when parsing.
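
To make the "multiple EOFs" point concrete, here is a minimal sketch in plain Java (JDK only, no PDFBox) that counts the %%EOF markers in a file and reads the offset after the last startxref keyword, which is where an Xref-following parser would begin. This is not PDFBox's parser. The class name, method names and sample bytes are hypothetical, and the second offset (23500) is invented for illustration.

```java
import java.nio.charset.StandardCharsets;

public class PdfRevisionScan {

    /** Counts "%%EOF" markers; each incremental update normally appends one. */
    public static int countEofMarkers(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char index == byte offset.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int count = 0;
        int pos = text.indexOf("%%EOF");
        while (pos >= 0) {
            count++;
            pos = text.indexOf("%%EOF", pos + 5);
        }
        return count;
    }

    /** Returns the number that follows the last "startxref" keyword, or -1 if none is found. */
    public static long lastStartXrefOffset(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int key = text.lastIndexOf("startxref");
        if (key < 0) {
            return -1;
        }
        int i = key + "startxref".length();
        // Skip the whitespace (usually a line break) between keyword and number.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i == start ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) {
        // Hypothetical two-revision file: a stray byte and new objects follow the first %%EOF.
        String sample = "%PDF-1.4\nstartxref\n22449\n%%EOF\n@\n16 0 obj\n<<\n/Type /Catalog\n>>\nendobj\n"
                + "startxref\n23500\n%%EOF\n";
        byte[] bytes = sample.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(countEofMarkers(bytes));     // prints 2
        System.out.println(lastStartXrefOffset(bytes)); // prints 23500
    }
}
```

A count greater than one suggests the file has been updated incrementally. A parser that reads front to back and stops at the first %%EOF would miss whatever was appended after it.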

BR
Maruan

On 16.10.2014 at 17:01, Brzrk One <brzrk1@gmail.com> wrote:

> I've noticed that when there are multiple EOFs in the file, PDFBox parsing
> is less reliable.
> 
> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
> 
>> When I use load instead of loadNonSeq, the signatures are valid in this case.
>> 
>> But for some documents the load function does not read the complete document.
>> That is why I used loadNonSeq. Some signatures are then missing.
>> 
>> See:
>> http://leteckaposta.cz/831516385
>> h1.pdf - original file (signature and timestamp)
>> h2.pdf - first signature added by pdfbox (timestamp is missing)
>> h3.pdf - second signature added by pdfbox (timestamp and previous signature
>> are missing)
>> 
>> Jan
>> 
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>> Sent: Thursday, October 16, 2014 2:37 PM
>> To: users@pdfbox.apache.org
>> Subject: Re: problem with pdf eof
>> 
>> when signing please make sure that you load the pdf using PDDocument.load
>> instead of PDDocument.loadNonSeq.
>> 
>> 
>> On 16.10.2014 at 11:57, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>> Sent: Thursday, October 16, 2014 11:55 AM
>>> To: users@pdfbox.apache.org
>>> Subject: Re: problem with pdf eof
>>> 
>>> When you say invalid, do you mean it’s corrupted, or that e.g. you get a
>>> warning sign in Adobe Reader? Would you have a sample PDF?
>>> 
>>> When you sign a document and then sign it again, the first signature points to
>>> a different document revision, because you have changed the document's content
>>> afterwards. So invalid in that context could mean that the warning you
>>> might be getting only reflects that fact. I would need to see the
>>> document to understand what’s going on.
>>> 
>>> BR
>>> 
>>> Maruan
>>> 
>>> On 16.10.2014 at 11:48, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>>> 
>>>> Hi Maruan and others,
>>>> 
>>>> I created a signature and it seems OK.
>>>> But when I create a second signature (loadNonSeq, addSignature,
>>>> saveIncremental again), the first signature becomes invalid.
>>>> I think the problem may be that the first page is updated (the signature
>>>> is invisible), but I don't understand it well enough.
>>>> 
>>>> Jan
>>>> 
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>> Sent: Monday, October 13, 2014 4:09 PM
>>>> To: users@pdfbox.apache.org
>>>> Subject: Re: problem with pdf eof
>>>> 
>>>> Hi Jan,
>>>> 
>>>> there are samples in the examples package for various ways to sign a
>>>> document [1]. Signing a document needs incremental saving.
>>>> 
>>>> OTOH, choosing the right solution should not be based on whether
>>>> there is a license fee or not.
>>>> 
>>>> Maruan Sahyoun
>>>> 
>>>> [1] http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
>>>> 
>>>> 
>>>> On 13.10.2014 at 16:02, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>>>> 
>>>>> Hi Maruan (and others),
>>>>> 
>>>>> I would like to use pdfbox and bouncycastle for managing PDF
>>>>> signatures: parsing, validation, timestamping (PAdES LTV).
>>>>> We used iText for this, but it is under a commercial licence.
>>>>> Parsing signatures seems to be working (thanks to your advice), so I
>>>>> will try to create a timestamp.
>>>>> Is that possible with pdfbox? I found the save method on PDDocument, but
>>>>> I'm afraid that it can change the byte representation of the PDF, so that
>>>>> signatures become invalid. Is that true? What is the right way to create a
>>>>> signature or timestamp with pdfbox?
>>>>> 
>>>>> Jan
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>>> Sent: Friday, October 10, 2014 10:44 AM
>>>>> To: users@pdfbox.apache.org
>>>>> Subject: Re: problem with pdf eof
>>>>> 
>>>>> Hi Jan,
>>>>> 
>>>>> choosing the right technology is very important, so I do understand
>>>>> your concerns. I had to make such a decision about using PDFBox in the past
>>>>> too.
>>>>> It can
>>>>> If you have specific issues I can answer, I’m happy to try to do so. As
>>>>> a general statement, PDFBox is used in production environments today (as an
>>>>> example, we ourselves are using it for a banking customer to process account
>>>>> statements, for an airline company to preprocess archiving documents, and
>>>>> for various other customers).
>>>>> 
>>>>> PDFBox is continuously enhancing its parsing as we try to deal with
>>>>> real-world PDF files, which are not always in line with the PDF
>>>>> specification. Currently the best approach is to use PDDocument.loadNonSeq
>>>>> (which parses documents according to the Xref information) and, in case of
>>>>> an exception, PDDocument.load (which parses sequentially). The Apache Tika
>>>>> project, which uses PDFBox for parsing PDFs, is running the parsing and
>>>>> text extraction against 50k PDFs made available via
>>>>> http://digitalcorpora.org
>>>>> 
>>>>> What is the application you would like to be using PDFBox for? Text
>> Extraction, image conversion …. - I might be able to give you more specific
>> information for your use case.
>>>>> 
>>>>> BR
>>>>> 
>>>>> Maruan
>>>>> 
>>>>> On 10.10.2014 at 10:10, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>>>>> 
>>>>>> Thank you Maruan, this function loads the document.
>>>>>> 
>>>>>> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
>>>>>> PDF parsing". I think correct parsing is very important, and I have some
>>>>>> doubts about whether I can use pdfbox in production. Can you say something to
>>>>>> reassure me :-)?
>>>>>> 
>>>>>> Jan
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>>>> Sent: Friday, October 10, 2014 9:25 AM
>>>>>> To: users@pdfbox.apache.or
>>>>>> Subject: Re: problem with pdf eof
>>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> you can try PDDocument.loadNonSeq(InputStream is, null)
>>>>>> 
>>>>>> BR
>>>>>> 
>>>>>> Maruan
>>>>>> 
>>>>>> On 10.10.2014 at 09:09, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
>>>>>>> the PDF document in the attachment.
>>>>>>> The method returns without an exception, but the document model is incomplete.
>>>>>>> 
>>>>>>> The problem is in the characters after EOF (offset 22939):
>>>>>>> startxref
>>>>>>> 22449
>>>>>>> %%EOF
>>>>>>> @
>>>>>>> 16 0 obj
>>>>>>> <<
>>>>>>> /Type /Catalog
>>>>>>> 
>>>>>>> PDFBox creates an internal IOException and ignores it, with this comment:
>>>>>>>                /*
>>>>>>>                 * PDF files may have random data after the EOF marker. Ignore errors if
>>>>>>>                 * last object processed is EOF.
>>>>>>>                 */
>>>>>>> 
>>>>>>> Is this PDF construction valid?
>>>>>>> Which parser in PDFBox is correct? I tried ConformingPDFParser, but
>>>>>>> another error occurred.
>>>>>>> 
>>>>>>> Jan
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 

