lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Newson <rnew...@connected.com>
Subject Re: Need advice: what pdf lib to use?
Date Sun, 24 Oct 2004 15:44:33 GMT
iouli.golovatyi@group.novartis.com wrote:
> Hello all,
> 
> I need a piece of advice/experience..
> 
> What pdf parser (written in java) u'd recommend?
> 
> I played now with PDFBox-0.6.7a and would not say I was satisfied too much 
> with it
> 
> On certain pdf's (not well formated but anyway readable with acrobate)  it 
> run into dead loop (this I could fix in code),
> and on one file it produced "out of memory error" and killed jvm:( (this 
> problem I could not identify yet)
> 
> After all the performance was not too great as well: it took c. 19 h. to 
> index 13000 files (c. 3.5Gb)
> 
> Regards,
> J.
> 
> 
> 

On the specific problem of the "dead loop", I reported an instance of 
this to Ben a week or so ago and he has fixed it in the latest 
nightlies.  I expect an official release will include this bugfix soon. 
The file in question was unreadable with any PDF software I have, but 
someone managed to create it somehow...

http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832

I've found pdfbox to be pretty good. The only time I get problems is 
with corrupted or egregiously bad PDF files.

B.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message