pdfbox-users mailing list archives

From Brzrk One <brz...@gmail.com>
Subject Re: problem with pdf eof
Date Thu, 16 Oct 2014 18:13:37 GMT
I hear dual advice here...
- don't use NonSeq for signatures
- but use NonSeq for multiple EOFs
Files with both multiple EOFs and signatures will have problems...
unless you mean we should parse 2x?
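
One way to tell which situation a file is in before picking a parser is to count its %%EOF markers: more than one usually points to incremental updates. The following is a minimal sketch using only the Java standard library. The class and method names (EofMarkerScan, countEofMarkers) are hypothetical and not part of PDFBox, and the scan is only a heuristic, since the bytes %%EOF could in principle also occur inside stream data.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: counts %%EOF markers in raw PDF bytes. */
final class EofMarkerScan {
    private static final byte[] EOF = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns how many times the literal "%%EOF" occurs in the data. */
    static int countEofMarkers(byte[] data) {
        int count = 0;
        for (int i = 0; i + EOF.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                count++;
                i += EOF.length - 1; // continue scanning after this marker
            }
        }
        return count;
    }

    /** True if the marker bytes start exactly at position pos. */
    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < EOF.length; j++) {
            if (data[pos + j] != EOF[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling EofMarkerScan.countEofMarkers(java.nio.file.Files.readAllBytes(path)) and getting a value above 1 suggests the file has been saved incrementally at least once, which is the case discussed in this thread.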

On Thu, Oct 16, 2014 at 12:12 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
wrote:

> depends on the parser being used. NonSeq does follow the Xref information
> and handles multiple EOFs (incremental updates) when parsing.
>
> BR
> Maruan
>
> Am 16.10.2014 um 17:01 schrieb Brzrk One <brzrk1@gmail.com>:
>
> I've noticed that when there are multiple EOFs in the file, PDFBox parsing
> is less reliable.
>
>
> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>
> When I use load instead of loadNonSeq, the signatures are valid in this case.
>
> But for some documents the load function does not read the complete document. That
> is why I used loadNonSeq. Some signatures are then missing.
>
> See:
> http://leteckaposta.cz/831516385
> h1.pdf - original file (signature and timestamp)
> h2.pdf - first signature added by pdfbox (timestamp is missing)
> h3.pdf - second signature added by pdfbox (timestamp and previous signature
> are missing)
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Thursday, October 16, 2014 2:37 PM
> To: users@pdfbox.apache.org
> Subject: Re: problem with pdf eof
>
> when signing please make sure that you load the pdf using PDDocument.load
> instead of PDDocument.loadNonSeq.
>
>
> Am 16.10.2014 um 11:57 schrieb Vomlel Jan <Jan.Vomlel@aipsafe.cz>:
>
>
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Thursday, October 16, 2014 11:55 AM
> To: users@pdfbox.apache.org
> Subject: Re: problem with pdf eof
>
> when you say invalid do you mean it’s corrupted or e.g. you get a
> warning sign in Adobe Reader? Would you have a sample PDF?
>
> When you sign a document and sign it again the first signature points to
> a different document revision as you have changed the document’s content
> afterwards. So invalid in that context could mean that the warning you
> might be getting is only reflecting that fact. Would need to see the
> document to understand what’s going on.
>
>
> BR
>
> Maruan
>
> Am 16.10.2014 um 11:48 schrieb Vomlel Jan <Jan.Vomlel@aipsafe.cz>:
>
> Hi Maruan and others,
>
> I created a signature and it seems OK.
> But when I create a second signature (loadNonSeq, addSignature,
> saveIncremental again), the first signature becomes invalid.
>
> I think the problem may be that the first page is updated (the signature
> is invisible), but I don't understand it well enough.
>
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Monday, October 13, 2014 4:09 PM
> To: users@pdfbox.apache.org
> Subject: Re: problem with pdf eof
>
> Hi Jan,
>
> there are samples in the examples package for various ways to sign a
> document [1]. Signing a document needs incremental saving.
>
> OTOH choosing the right solution should not be based on whether
> there is a license fee or not.
>
>
> Maruan Sahyoun
>
> [1]
> http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
>
> Am 13.10.2014 um 16:02 schrieb Vomlel Jan <Jan.Vomlel@aipsafe.cz>:
>
> Hi Maruan (and others),
>
> I would like to use pdfbox and bouncycastle for managing pdf
> signatures: parsing, validation, timestamping (PAdES LTV).
>
> We used itext for it, but it is under a commercial licence.
> Parsing signatures seems to be working (thanks to your advice). So I
> will try to create a timestamp.
>
> Is it possible with pdfbox? I found the save method on PDDocument, but
> I'm afraid that it can change the byte representation of the pdf, so that
> the signatures become invalid. Is that true? What is the right way to create
> a signature or timestamp with pdfbox?
>
>
> Jan
>
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Friday, October 10, 2014 10:44 AM
> To: users@pdfbox.apache.org
> Subject: Re: problem with pdf eof
>
> Hi Jan,
>
> choosing the right technology is very important so I do understand
> your concerns. I had to make such a decision about using PDFBox in the past
> too.
>
> It can
> If you have specific issues I can answer I’m happy to try to do so. As
> a general statement PDFBox is used in production environments today (as an
> example we ourselves are using it for a banking customer to process account
> statements, an airline company to preprocess archiving documents and
> various other customers).
>
>
> PDFBox is continuously enhancing the parsing as we try to deal with
> real world PDF files which are not always in line with the PDF
> specification. Currently the best approach is to use PDDocument.loadNonSeq
> (which parses documents according to the Xref information) and in case of
> an exception PDDocument.load (which parses sequentially). The Apache Tika
> project, which uses PDFBox for parsing PDFs, is running the parsing and
> text extraction against 50k PDFs being made available via
> http://digitalcorpora.org
>
>
> What is the application you would like to be using PDFBox for? Text
> extraction, image conversion, ...? I might be able to give you more specific
> information for your use case.
>
>
> BR
>
> Maruan
>
> Am 10.10.2014 um 10:10 schrieb Vomlel Jan <Jan.Vomlel@aipsafe.cz>:
>
> Thank you Maruan, this function loads the document.
>
> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
> PDF parsing". I think correct parsing is very important, and I have some
> doubts whether I can use pdfbox in production. Can you say something to
> reassure me :-).
>
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Friday, October 10, 2014 9:25 AM
> To: users@pdfbox.apache.or
> Subject: Re: problem with pdf eof
>
> Hi
>
> you can try PDDocument.loadNonSeq(InputStream is, null)
>
> BR
>
> Maruan
>
> Am 10.10.2014 um 09:09 schrieb Vomlel Jan <Jan.Vomlel@aipsafe.cz>:
>
> Hello,
> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
> the PDF document in the attachment.
>
> The method returns without an exception, but the document model is incomplete.
>
> The problem is the characters after EOF (offset 22939):
> startxref
> 22449
> %%EOF
> @
> 16 0 obj
> <<
> /Type /Catalog
>
> PDFBox raises an internal IOException and ignores it, with the comment:
>                /*
>                 * PDF files may have random data after the EOF marker. Ignore errors if
>                 * last object processed is EOF.
>                 */
>
> Is this PDF construction valid?
> Which parser in PDFBox is correct? I tried ConformingPDFParser, but
> another error occurred.
>
>
> Jan
>
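
The approach suggested in the 10 Oct reply (try the xref-based parser first and, in case of an exception, fall back to the sequential one) can be sketched generically as below. This is only an illustration under stated assumptions: ParserFallback, Loader and loadWithFallback are hypothetical names, not PDFBox API, and the loaders are stand-ins so the sketch runs without PDFBox on the classpath.

```java
import java.io.IOException;

/** Hypothetical sketch: try a primary loader, fall back to a second one on IOException. */
final class ParserFallback {

    /** Stand-in for a parsing call such as PDDocument.loadNonSeq or PDDocument.load. */
    interface Loader<T> {
        T load() throws IOException;
    }

    /**
     * Returns the primary loader's result, or the fallback's result if the
     * primary loader throws. If both fail, the fallback's exception is thrown
     * with the primary failure attached as a suppressed exception.
     */
    static <T> T loadWithFallback(Loader<T> primary, Loader<T> fallback) throws IOException {
        try {
            return primary.load();
        } catch (IOException primaryFailure) {
            try {
                return fallback.load();
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }
}
```

With PDFBox 1.8 the two loaders would presumably be () -> PDDocument.loadNonSeq(file, null) and () -> PDDocument.load(file), using a File rather than an InputStream so that the second attempt can re-read the data. The fallback only triggers on an exception; it does not help in the case reported at the start of this thread, where load returned normally with an incomplete document. For signing, the thread's advice is still to load with PDDocument.load.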
