lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Anderson <Eric.Ander...@LanRx.com>
Subject Re: [ANN] PDFBox 0.6.0
Date Thu, 06 Mar 2003 08:52:56 GMT
Ben-
In attempting to use the PDFBox-0.6.0, I rec'd the following error when 
attempting to scan a reasonably sized PDF repository.

Any thoughts?


 caught a class java.io.EOFException
 with message: Unexpected end of ZLIB input stream


Eric Anderson
LanRx Network Solutions


Quoting Ben Litchfield <ben@csh.rit.edu>:

> I would like to announce the next release of PDFBox.  PDFBox allows for
> PDF documents to be indexed using lucene through a simple interface.
> Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> which will extract all text and PDF document summary properties as lucene
> fields.
> 
> You can obtain the latest release from http://www.pdfbox.org
> 
> Please send all bug reports to me and attach the PDF document when
> possible.
> 
> RELEASE 0.6.0
> -Massive improvements to memory footprint.
> -Must call close() on the COSDocument(LucenePDFDocument does this for you)
> -Really fixed the bug where small documents were not being indexed.
> -Fixed bug where no whitespace existed between obj and start of object.
>     Exception in thread "main" java.io.IOException: expected='obj'
>     actual='obj<</Pro
> -Fixed issue with spacing where textLineMatrix was not being copied
>  properly
> -Fixed 'bug' where parsing would fail with some pdfs with double endobj
>  definitions
> -Added PDF document summary fields to the lucene document
> 
> 
> Thank you,
> Ben Litchfield
> http://www.pdfbox.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message