lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject [ANN] PDFBox 0.6.0
Date Wed, 05 Mar 2003 23:51:24 GMT
I would like to announce the next release of PDFBox.  PDFBox allows for
PDF documents to be indexed using lucene through a simple interface.
Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
which will extract all text and PDF document summary properties as lucene
fields.

You can obtain the latest release from http://www.pdfbox.org

Please send all bug reports to me and attach the PDF document when
possible.

RELEASE 0.6.0
-Massive improvements to memory footprint.
-Must call close() on the COSDocument(LucenePDFDocument does this for you)
-Really fixed the bug where small documents were not being indexed.
-Fixed bug where no whitespace existed between obj and start of object.
    Exception in thread "main" java.io.IOException: expected='obj'
    actual='obj<</Pro
-Fixed issue with spacing where textLineMatrix was not being copied
 properly
-Fixed 'bug' where parsing would fail with some pdfs with double endobj
 definitions
-Added PDF document summary fields to the lucene document


Thank you,
Ben Litchfield
http://www.pdfbox.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message