lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: Need advice: what pdf lib to use?
Date Mon, 25 Oct 2004 15:59:03 GMT

In order to write software that consumes PDF documents you must agree to a
list of conditions.  One of those conditions is that permissions specified
by the author of the PDF document are respected.

PDFBox complies with this statement, if there is software that does not
then they are in violation of copyright law.

That being said, PDFBox is open source so a user could make modifications
to the source code, or as a PDF library could change permissions on a
document.

Ben

On Mon, 25 Oct 2004 iouli.golovatyi@group.novartis.com wrote:

> Yes Ben, You are right.
>
> This would be correct functionality from technical perspective. But look
> it my way with application programmer eyes reporting to big boss that c.
> 30% of doc we cope with could not be indexed because of this stupid
> limitation. Neither he or me have any influence on pdf owners and any
> ideas about what made  them create files with documet security set.
>
> In short, if You also could implement this "uncorrect functionality"  the
> "closed source" guys did, it would be really great!
>
> As far as sponsoring is concerned I would be ready to hack (or at least to
> try) it even for 1/3 of that fortune:)))
>
> J.
>
>
>
>
>
> Ben Litchfield <ben@csh.rit.edu>
> 25.10.2004 14:02
> Please respond to "Lucene Users List"
>
>
>         To:     Lucene Users List <lucene-user@jakarta.apache.org>
>         cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
>         Subject:        Re: Need advice: what pdf lib to use?
>         Category:
>
>
>
>
> PDFBox does not 'stumble' when it gives that message, that is correct
> functionality if that permission is not allowed.
>
> If your company is willing to pay a 'fortune' why not sponsor a change to
> an open source project for half a fortune.
>
> Ben
> http://www.pdfbox.org
>
> On Mon, 25 Oct 2004 iouli.golovatyi@group.novartis.com wrote:
>
> > PDFbox stumbles also with "class java.io.IOException with message:  -
> You
> > do not have permission to extract text" in case the doc is copy/print
> > protected.
> > I tested now the snowtide commercial product and it looks like it could
> > process these files as well. Performance was also not so bad.
> Unfortunatly
> > the test result could not be considered as 100%, because the free
> version
> > processed just first  8  pages.  After all this product costs a fortune
> > (as long the company is ready to pay I don't realy mind:))
> >
> > J.
> >
> >
> >
> >
> >
> > Robert Newson <rnewson@connected.com>
> > Sent by: news <news@sea.gmane.org>
> > 24.10.2004 17:44
> > Please respond to "Lucene Users List"
> >
> >
> >         To:     lucene-user@jakarta.apache.org
> >         cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
> >         Subject:        Re: Need advice: what pdf lib to use?
> >         Category:
> >
> >
> >
> > iouli.golovatyi@group.novartis.com wrote:
> > > Hello all,
> > >
> > > I need a piece of advice/experience..
> > >
> > > What pdf parser (written in java) u'd recommend?
> > >
> > > I played now with PDFBox-0.6.7a and would not say I was satisfied too
> > much
> > > with it
> > >
> > > On certain pdf's (not well formated but anyway readable with acrobate)
> > it
> > > run into dead loop (this I could fix in code),
> > > and on one file it produced "out of memory error" and killed jvm:(
> (this
> >
> > > problem I could not identify yet)
> > >
> > > After all the performance was not too great as well: it took c. 19 h.
> to
> >
> > > index 13000 files (c. 3.5Gb)
> > >
> > > Regards,
> > > J.
> > >
> > >
> > >
> >
> > On the specific problem of the "dead loop", I reported an instance of
> > this to Ben a week or so ago and he has fixed it in the latest
> > nightlies.  I expect an official release will include this bugfix soon.
> > The file in question was unreadable with any PDF software I have, but
> > someone managed to create it somehow...
> >
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832
> >
> > I've found pdfbox to be pretty good. The only time I get problems is
> > with corrupted or egregiously bad PDF files.
> >
> > B.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message