lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: Need advice: what pdf lib to use?
Date Mon, 25 Oct 2004 12:02:31 GMT

PDFBox does not 'stumble' when it gives that message, that is correct
functionality if that permission is not allowed.

If your company is willing to pay a 'fortune' why not sponsor a change to
an open source project for half a fortune.

Ben
http://www.pdfbox.org

On Mon, 25 Oct 2004 iouli.golovatyi@group.novartis.com wrote:

> PDFbox stumbles also with "class java.io.IOException with message:  - You
> do not have permission to extract text" in case the doc is copy/print
> protected.
> I tested now the snowtide commercial product and it looks like it could
> process these files as well. Performance was also not so bad. Unfortunatly
> the test result could not be considered as 100%, because the free version
> processed just first  8  pages.  After all this product costs a fortune
> (as long the company is ready to pay I don't realy mind:))
>
> J.
>
>
>
>
>
> Robert Newson <rnewson@connected.com>
> Sent by: news <news@sea.gmane.org>
> 24.10.2004 17:44
> Please respond to "Lucene Users List"
>
>
>         To:     lucene-user@jakarta.apache.org
>         cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
>         Subject:        Re: Need advice: what pdf lib to use?
>         Category:
>
>
>
> iouli.golovatyi@group.novartis.com wrote:
> > Hello all,
> >
> > I need a piece of advice/experience..
> >
> > What pdf parser (written in java) u'd recommend?
> >
> > I played now with PDFBox-0.6.7a and would not say I was satisfied too
> much
> > with it
> >
> > On certain pdf's (not well formated but anyway readable with acrobate)
> it
> > run into dead loop (this I could fix in code),
> > and on one file it produced "out of memory error" and killed jvm:( (this
>
> > problem I could not identify yet)
> >
> > After all the performance was not too great as well: it took c. 19 h. to
>
> > index 13000 files (c. 3.5Gb)
> >
> > Regards,
> > J.
> >
> >
> >
>
> On the specific problem of the "dead loop", I reported an instance of
> this to Ben a week or so ago and he has fixed it in the latest
> nightlies.  I expect an official release will include this bugfix soon.
> The file in question was unreadable with any PDF software I have, but
> someone managed to create it somehow...
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832
>
> I've found pdfbox to be pretty good. The only time I get problems is
> with corrupted or egregiously bad PDF files.
>
> B.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message