pdfbox-users mailing list archives

From Qingchao Kong <kqingc...@gmail.com>
Subject Re: some PDF documents will not parse
Date Thu, 22 May 2014 00:55:34 GMT
kameron cole,
Hi, I also suggest you post the specific parse errors so that we can find out
exactly what type of error it is. PS: You say "parse PDFs"; I presume you mean
extracting text from PDFs, am I right?
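A minimal Java sketch of the "parse it, and on failure skip and log it" approach, so that one bad PDF does not halt the rest of the pipeline and the exact error text is captured for posting to the list. The `PdfParser` interface, `SafeBatchParse`, and `BatchResult` are hypothetical names for illustration; in real use `parse` would wrap the actual PDFBox call (for example, loading with `PDDocument.load` and extracting text with `PDFTextStripper`), which is not shown here so the sketch runs without the library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeBatchParse {

    /** Hypothetical stand-in for the real parse step (e.g. PDFBox load + text extraction). */
    @FunctionalInterface
    interface PdfParser {
        String parse(Path file) throws Exception;
    }

    /** Extracted text per successful file, plus the error message for each failed file. */
    static final class BatchResult {
        final Map<Path, String> parsed = new LinkedHashMap<>();
        final Map<Path, String> failed = new LinkedHashMap<>();
    }

    /** Parses every file; a failure is recorded and skipped instead of stopping the batch. */
    static BatchResult parseAll(List<Path> files, PdfParser parser) {
        BatchResult result = new BatchResult();
        for (Path file : files) {
            try {
                result.parsed.put(file, parser.parse(file));
            } catch (Exception e) {
                // Keep the exception class and message: this is the detail worth sharing.
                result.failed.put(file, e.getClass().getName() + ": " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> files = List.of(Paths.get("good.pdf"), Paths.get("bad.pdf"));
        // Fake parser for demonstration only: treats any file named "bad*" as broken.
        PdfParser fake = file -> {
            if (file.getFileName().toString().startsWith("bad")) {
                throw new java.io.IOException("simulated parse failure");
            }
            return "text of " + file;
        };
        BatchResult r = parseAll(files, fake);
        System.out.println("parsed: " + r.parsed.keySet());
        System.out.println("failed: " + r.failed);
    }
}
```

The failed files and their messages can then be written to a log for later inspection, which matches the "skip it and log it" option described in the thread.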

Regards,


On Thu, May 22, 2014 at 12:28 AM, kameron cole <kc0olm@gmail.com> wrote:

> We are using PDFBox to parse the documents. We also used Stellent
> OutsideIn (Oracle), which parsed the same documents that PDFBox failed to
> parse. Unfortunately, we cannot share the documents because they are
> confidential.
>
> I agree that parsing is the best test for parsing. I am looking for a
> shortcut, a kind of pre-test, or, even better, a PDFBox utility that fixes
> bad documents.
>
>
> On Tue, May 20, 2014 at 3:10 PM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
>
> > Hi,
> >
> > Are the parsing errors occurring within PDFBox, or in a different
> > application you are using for parsing? What kind of parsing errors do you
> > get? Would you have a sample PDF?
> >
> > To test whether a parser can parse a PDF document, you typically
> > need to parse it, so …
> >
> > BR
> > Maruan Sahyoun
> >
> > On 20.05.2014 at 18:43, kameron cole <kc0olm@gmail.com> wrote:
> >
> > > I get parsing errors on certain PDFs, and this causes my other processes
> > > to halt. I would like to find some kind of PDF testing utility in this
> > > group, so that I can either
> > > 1) test the document before sending it to the parser, and skip it and
> > > log it for later,
> > > or
> > > 2) find a "fix-it" PDF utility that would correct the document and put
> > > it back in the queue to be parsed.
> >
> >
>
>
> --
> ** -- **
> yours truly,
> kameron
>
> PMA® Certified Pilates Teacher
> RYT, Yoga Alliance <http://www.yogaalliance.org/>
> Kontrology Pilates and Yoga <http://www.kontrology.com> ཀ
> SoBe Violoncello <http://www.sobevc.com/sobevc/Welcome.html>
>
> -- ** --
>
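As a rough illustration of the "pre-test" asked about above, here is a minimal Java sketch of a cheap structural check: it looks for the `%PDF-` header near the start of the file and the `%%EOF` end-of-file marker near the end, which catches non-PDF files and obviously truncated ones. As noted earlier in the thread, passing this check does not guarantee that a parser will succeed, and it repairs nothing. The class name `PdfPreCheck`, the method `quickCheck`, and the 1024-byte search window are assumptions for illustration, not a PDFBox utility.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfPreCheck {

    // Bytes inspected at each end of the file (assumed heuristic, not a spec value).
    private static final int WINDOW = 1024;

    /** Returns null if the quick check passes, otherwise a short reason for a log line. */
    static String quickCheck(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 8) {
                return "file too short to be a PDF";
            }
            // ISO-8859-1 maps each byte to one char, so binary content is safe to search.
            byte[] head = new byte[(int) Math.min(WINDOW, length)];
            raf.readFully(head);
            if (!new String(head, StandardCharsets.ISO_8859_1).contains("%PDF-")) {
                return "no %PDF- header";
            }
            int tailLen = (int) Math.min(WINDOW, length);
            byte[] tail = new byte[tailLen];
            raf.seek(length - tailLen);
            raf.readFully(tail);
            if (!new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF")) {
                return "no %%EOF marker near end (possibly truncated)";
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            String problem = quickCheck(Paths.get(arg));
            System.out.println(arg + ": " + (problem == null ? "ok, send to parser" : "skip, " + problem));
        }
    }
}
```

Files that fail this check can be logged and skipped immediately; files that pass still go through the real parse, with its own error handling.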



-- 
Qingchao Kong

Ph.D. Candidate
State Key Laboratory of Management and Control for Complex Systems
Institute of Automation, Chinese Academy of Sciences

No. 95 Zhongguancun East Road
Haidian District, Beijing 100190 China
