lucene-java-user mailing list archives

From Ben Litchfield <>
Subject Re: Need advice: what pdf lib to use?
Date Fri, 22 Oct 2004 12:40:24 GMT

Please post any PDFBox issues you notice on the PDFBox SourceForge bug
list and, if possible, attach or email any problem PDFs that you encounter.

There are some efforts underway to improve the speed of PDFBox; you can
monitor the progress at

As for other suggestions, I know some people have used xpdf (open
source, but not Java) to extract the text.
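
For anyone who wants to try the xpdf route from Java, here is a minimal
sketch of shelling out to xpdf's pdftotext command-line tool. It assumes
pdftotext is installed and on the PATH; the class name, the method names
and the timeout parameter are made up for illustration and are not part
of xpdf or PDFBox. The timeout is there so that a PDF that makes the
extractor hang does not stall a whole indexing run.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs the external pdftotext tool and returns the text.
final class XpdfTextExtractor {

    // Builds the command line: pdftotext -enc UTF-8 <input.pdf> <output.txt>
    static List<String> buildCommand(Path pdf, Path textOut) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8",
                pdf.toString(), textOut.toString());
    }

    // Extracts the text of one PDF, giving up after timeoutSeconds.
    static String extractText(Path pdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path textOut = Files.createTempFile("pdftotext-", ".txt");
        try {
            // inheritIO() sends the tool's own messages to our console,
            // so no pipe buffer can fill up and block the child process.
            Process process = new ProcessBuilder(buildCommand(pdf, textOut))
                    .inheritIO()
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("pdftotext timed out on " + pdf);
            }
            if (process.exitValue() != 0) {
                throw new IOException("pdftotext exited with code "
                        + process.exitValue() + " for " + pdf);
            }
            return new String(Files.readAllBytes(textOut), StandardCharsets.UTF_8);
        } finally {
            // Always remove the temporary text file.
            Files.deleteIfExists(textOut);
        }
    }
}
```

The returned string can then be added to a Lucene document as the
contents field. Running the extractor in a separate process also means
that a crash or out-of-memory condition in the extractor does not take
the indexing JVM down with it.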

For other Java solutions:
PDFTextStream (commercial) - "Fastest PDF-to-Text Solution for Java"

Etymon PJ (GPL)


On Fri, 22 Oct 2004 wrote:

> Hello all,
> I need some advice/experience.
> Which PDF parser (written in Java) would you recommend?
> I have been playing with PDFBox-0.6.7a and can't say I was very satisfied
> with it.
> On certain PDFs (not well formed, but still readable with Acrobat) it
> ran into an infinite loop (this I could fix in the code),
> and on one file it produced an "out of memory" error and killed the JVM :( (this
> problem I have not identified yet).
> On top of that, the performance was not great either: it took c. 19 h to
> index 13000 files (c. 3.5 GB).
> Regards,
> J.
