lucene-java-user mailing list archives

From "Borkenhagen, Michael (ofd-ko zdfin)" <>
Subject AW: PDF parser
Date Fri, 22 Nov 2002 14:41:52 GMT
Several PDF parsers are available, and each has its own advantages
and disadvantages.
I use a combination of PDFBox and Etymon PJ because their APIs are very simple. Both
parse PDF into a format of their own and provide interfaces for retrieving the PDF
document's contents.

Other developers on this list prefer JPedal, which
parses PDF into XML and provides an XML tree with the PDF document's contents.