pdfbox-users mailing list archives

From Brzrk One <brz...@gmail.com>
Subject Re: problem with pdf eof
Date Thu, 16 Oct 2014 15:01:32 GMT
I've noticed that when there are multiple EOFs in the file, PDFBox parsing
is less reliable.
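
A quick way to check a file for this before handing it to a parser is to count the "%%EOF" markers and see whether anything follows the last one. The sketch below is only an illustration and does not use PDFBox; the class and method names (EofScan, countMarkers, trailingBytes) are made up for this example. It scans raw bytes, so a marker that happens to appear inside stream data would be counted too, and it treats only space, tab, CR and LF as whitespace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: scans raw PDF bytes for "%%EOF" markers.
class EofScan {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Offset of the next marker at or after 'from', or -1 if there is none.
    public static int indexOf(byte[] data, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - MARKER.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Number of "%%EOF" markers in the data (incremental saves add one each).
    public static int countMarkers(byte[] data) {
        int count = 0;
        for (int pos = indexOf(data, 0); pos >= 0; pos = indexOf(data, pos + MARKER.length)) {
            count++;
        }
        return count;
    }

    // Non-whitespace bytes after the last marker; -1 if no marker was found.
    public static int trailingBytes(byte[] data) {
        int last = -1;
        for (int pos = indexOf(data, 0); pos >= 0; pos = indexOf(data, pos + MARKER.length)) {
            last = pos;
        }
        if (last < 0) {
            return -1;
        }
        int n = 0;
        for (int i = last + MARKER.length; i < data.length; i++) {
            byte b = data[i];
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Same shape as the fragment quoted further down in this thread.
        byte[] sample = "startxref\n22449\n%%EOF\n@\n16 0 obj\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("markers: " + countMarkers(sample));
        System.out.println("trailing bytes: " + trailingBytes(sample));
    }
}
```

A count above one is normal for incrementally saved (for example, signed) files; a non-zero trailing count after the last marker is the situation Jan describes below.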

On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:

> When I use load instead of loadNonSeq, the signatures are valid in this case.
>
> But for some documents the load function does not read the complete document. That
> is why I used loadNonSeq. Some signatures are then missing.
>
> See:
> http://leteckaposta.cz/831516385
> h1.pdf - original file (signature and timestamp)
> h2.pdf - first signature added by pdfbox (timestamp is missing)
> h3.pdf - second signature added by pdfbox (timestamp and previous signature
> are missing)
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Thursday, October 16, 2014 2:37 PM
> To: users@pdfbox.apache.org
> Subject: Re: problem with pdf eof
>
> When signing, please make sure that you load the PDF using PDDocument.load
> instead of PDDocument.loadNonSeq.
>
>
> On 16.10.2014 at 11:57, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
>
> >
> >
> > -----Original Message-----
> > From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> > Sent: Thursday, October 16, 2014 11:55 AM
> > To: users@pdfbox.apache.org
> > Subject: Re: problem with pdf eof
> >
> > When you say invalid, do you mean it’s corrupted, or e.g. that you get a
> > warning sign in Adobe Reader? Would you have a sample PDF?
> >
> > When you sign a document and then sign it again, the first signature points to
> > a different document revision, because you have changed the document’s content
> > afterwards. So invalid in that context could mean that the warning you
> > might be getting only reflects that fact. I would need to see the
> > document to understand what’s going on.
> >
> > BR
> >
> > Maruan
> >
> > On 16.10.2014 at 11:48, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
> >
> >> Hi Maruan and others,
> >>
> >> I created a signature and it seems OK.
> >> But when I create a second signature (loadNonSeq, addSignature,
> >> saveIncremental again), the first signature becomes invalid.
> >> I think the problem may be that the first page is updated (the signature
> >> is invisible), but I don't understand it well enough.
> >>
> >> Jan
> >>
> >> -----Original Message-----
> >> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> >> Sent: Monday, October 13, 2014 4:09 PM
> >> To: users@pdfbox.apache.org
> >> Subject: Re: problem with pdf eof
> >>
> >> Hi Jan,
> >>
> >> There are samples in the examples package showing various ways to sign a
> >> document [1]. Signing a document requires incremental saving.
> >>
> >> OTOH, the choice of solution should not be based on whether
> >> there is a license fee or not.
> >>
> >> Maruan Sahyoun
> >>
> >> [1]
> http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
> >>
> >>
> >> On 13.10.2014 at 16:02, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
> >>
> >>> Hi Maruan (and others),
> >>>
> >>> I would like to use pdfbox and bouncycastle for managing PDF
> >>> signatures: parsing, validation, timestamping (PAdES LTV).
> >>> We used iText for it, but it is under a commercial licence.
> >>> Parsing signatures seems to be working (thanks to your advice), so I
> >>> will try to create a timestamp.
> >>> Is that possible with pdfbox? I found the save method on PDDocument, but
> >>> I'm afraid that it can change the byte representation of the PDF, so that
> >>> signatures become invalid. Is that true? What is the right way to create a
> >>> signature or timestamp with pdfbox?
> >>>
> >>> Jan
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> >>> Sent: Friday, October 10, 2014 10:44 AM
> >>> To: users@pdfbox.apache.org
> >>> Subject: Re: problem with pdf eof
> >>>
> >>> Hi Jan,
> >>>
> >>> Choosing the right technology is very important, so I do understand
> >>> your concerns. I had to make such a decision about using PDFBox in the past
> >>> too.
> >>> It can
> >>> If you have specific issues I can answer I’m happy to try to do so. As
> a general statement PDFBox is used in production environments today (as an
> example we ourselves are using it for a banking customer to process account
> statements, an airline company to preprocess archiving documents and
> various other customers).
> >>>
> >>> PDFBox is continuously enhancing its parsing, as we try to deal with
> >>> real-world PDF files which are not always in line with the PDF
> >>> specification. Currently the best approach is to use PDDocument.loadNonSeq
> >>> (which parses documents according to the xref information) and, in case of
> >>> an exception, PDDocument.load (which parses sequentially). The Apache Tika
> >>> project, which uses PDFBox for parsing PDFs, is running the parsing and
> >>> text extraction against 50k PDFs made available via
> >>> http://digitalcorpora.org
> >>>
> >>> What is the application you would like to be using PDFBox for? Text
> Extraction, image conversion …. - I might be able to give you more specific
> information for your use case.
> >>>
> >>> BR
> >>>
> >>> Maruan
> >>>
> >>> On 10.10.2014 at 10:10, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
> >>>
> >>>> Thank you Maruan, this function loads the document.
> >>>>
> >>>> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
> >>>> PDF parsing". I think correct parsing is very important, and I have some
> >>>> doubts about whether I can use pdfbox in production. Can you say something
> >>>> to reassure me? :-)
> >>>>
> >>>> Jan
> >>>>
> >>>> -----Original Message-----
> >>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> >>>> Sent: Friday, October 10, 2014 9:25 AM
> >>>> To: users@pdfbox.apache.or
> >>>> Subject: Re: problem with pdf eof
> >>>>
> >>>> Hi
> >>>>
> >>>> you can try PDDocument.loadNonSeq(InputStream is, null)
> >>>>
> >>>> BR
> >>>>
> >>>> Maruan
> >>>>
> >>>> On 10.10.2014 at 09:09, Vomlel Jan <Jan.Vomlel@aipsafe.cz> wrote:
> >>>>
> >>>>> Hello,
> >>>>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
> >>>>> the PDF document in the attachment.
> >>>>> The method returns without an exception, but the document model is incomplete.
> >>>>>
> >>>>> The problem is in the characters after EOF (offset 22939):
> >>>>> startxref
> >>>>> 22449
> >>>>> %%EOF
> >>>>> @
> >>>>> 16 0 obj
> >>>>> <<
> >>>>> /Type /Catalog
> >>>>>
> >>>>> PDFBox raises an internal IOException and ignores it, with the comment:
> >>>>>                 /*
> >>>>>                  * PDF files may have random data after the EOF marker. Ignore errors if
> >>>>>                  * last object processed is EOF.
> >>>>>                  */
> >>>>>
> >>>>> Is this PDF construction valid?
> >>>>> Which parser in PDFBox is correct? I tried ConformingPDParser, but
> >>>>> another error occurred.
> >>>>>
> >>>>> Jan
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
