lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Borkenhagen, Michael (ofd-ko zdfin)" <Michael.Borkenha...@ofd-ko.fin-rlp.de>
Subject AW: PDF parser
Date Fri, 22 Nov 2002 14:41:52 GMT
There are different Parsers available - every Parser has other advantages
and disadvantages.
I use a combination of the PDFBox  http://www.pdfbox.org/ and Etymon PJ
http://www.etymon.com/pjc/, cause their APIs are very simple. Both of them
parse PDF in a format of their own an provide interfaces to get the PDF
Documents contents.

Other developers on this list prefer JPedal http://www.jpedal.org/ which
parses PDF into XML an provide a XML Tree with the PDF Documents contents
Mime
View raw message