lucene-java-user mailing list archives

Subject Re: Need advice: what pdf lib to use?
Date Mon, 25 Oct 2004 08:01:57 GMT
Many thanks for your comprehensive answer. Unfortunately I cannot send 
the problem PDFs because they are the property of the company and are of top 


Ben Litchfield
22.10.2004 14:40
Please respond to "Lucene Users List"

        To:     Lucene Users List
        cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
        Subject:        Re: Need advice: what pdf lib to use?

Please post any PDFBox issues you notice on the PDFBox SourceForge bug
list and, if possible, attach or email any problem PDFs that you encounter.

There are some efforts underway to improve the speed of PDFBox; you can
monitor the progress at

As for other suggestions, I know some people have used xpdf (open
source, but not Java) to extract the text.

Other Java solutions:
PDFTextStream (commercial) - "Fastest PDF-to-Text Solution for Java"

Etymon PJ (GPL)


On Fri, 22 Oct 2004 wrote:

> Hello all,
> I need a piece of advice/experience.
> Which PDF parser (written in Java) would you recommend?
> I have now played with PDFBox-0.6.7a and would not say I was satisfied
> with it.
> On certain PDFs (not well formatted, but still readable with Acrobat) it
> ran into an infinite loop (this I could fix in the code),
> and on one file it produced an "out of memory" error and killed the JVM :( (I
> have not been able to identify this problem yet).
> Overall the performance was not great either: it took c. 19 h to
> index 13000 files (c. 3.5 GB).
> Regards,
> J.
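To put the performance figures in the quoted message in perspective, here is a small back-of-the-envelope calculation. It uses only the numbers quoted (13000 files, about 3.5 GB, about 19 hours); treating "GB" as 1024^3 bytes is an assumption, since the message does not say which unit is meant, and the class name `IndexingThroughput` is hypothetical.

```java
// Back-of-the-envelope throughput for the figures quoted above:
// roughly 13000 PDF files (about 3.5 GB) indexed in about 19 hours.
// Assumption: 1 GB = 1024^3 bytes (the original message does not specify).
public class IndexingThroughput {

    // Average wall-clock time spent per file, in seconds.
    static double secondsPerFile(double hours, long files) {
        return hours * 3600.0 / files;
    }

    // Average input volume processed per second, in KiB.
    static double kibPerSecond(double gigabytes, double hours) {
        double kib = gigabytes * 1024.0 * 1024.0; // GiB -> KiB
        return kib / (hours * 3600.0);
    }

    public static void main(String[] args) {
        double hours = 19.0;
        long files = 13000;
        double gb = 3.5;
        System.out.printf("%.2f s per file%n", secondsPerFile(hours, files));
        System.out.printf("%.0f files per hour%n", files / hours);
        System.out.printf("%.1f KiB/s%n", kibPerSecond(gb, hours));
    }
}
```

This works out to roughly 5.3 seconds per file, or about 54 KiB of PDF input per second. Note that the quoted figures cover the whole extraction-plus-indexing run, so they do not separate PDFBox time from Lucene indexing time.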

