lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: [ANN] PDFBox 0.6.0
Date Thu, 06 Mar 2003 13:44:42 GMT
In this release I have changed how I parsed the document, which may have
introduced this bug.  I have received another report of this and will have
it fixed for the next point release.

You said you tried with reasonably sized PDF repository.  Did you stop
indexing at this error or did you continue?  If you continued, is this the
only error that you got?

-Ben




-- 

On Thu, 6 Mar 2003, Eric Anderson wrote:

> Ben-
> In attempting to use the PDFBox-0.6.0, I rec'd the following error when
> attempting to scan a reasonably sized PDF repository.
>
> Any thoughts?
>
>
>  caught a class java.io.EOFException
>  with message: Unexpected end of ZLIB input stream
>
>
> Eric Anderson
> LanRx Network Solutions
>
>
> Quoting Ben Litchfield <ben@csh.rit.edu>:
>
> > I would like to announce the next release of PDFBox.  PDFBox allows for
> > PDF documents to be indexed using lucene through a simple interface.
> > Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> > which will extract all text and PDF document summary properties as lucene
> > fields.
> >
> > You can obtain the latest release from http://www.pdfbox.org
> >
> > Please send all bug reports to me and attach the PDF document when
> > possible.
> >
> > RELEASE 0.6.0
> > -Massive improvements to memory footprint.
> > -Must call close() on the COSDocument(LucenePDFDocument does this for you)
> > -Really fixed the bug where small documents were not being indexed.
> > -Fixed bug where no whitespace existed between obj and start of object.
> >     Exception in thread "main" java.io.IOException: expected='obj'
> >     actual='obj<</Pro
> > -Fixed issue with spacing where textLineMatrix was not being copied
> >  properly
> > -Fixed 'bug' where parsing would fail with some pdfs with double endobj
> >  definitions
> > -Added PDF document summary fields to the lucene document
> >
> >
> > Thank you,
> > Ben Litchfield
> > http://www.pdfbox.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
> LanRx Network Solutions, Inc.
> Providing Enterprise Level Solutions...On A Small Business Budget
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message