lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: [ANN] PDFBox 0.6.0
Date Mon, 10 Mar 2003 02:38:51 GMT

I believe this problem has been fixed with 0.6.1.  Please give it a try.

Ben Litchfield

-- 

On Thu, 6 Mar 2003, Eric Anderson wrote:

> When it throws the exception, the indexer fails, so I cannot continue the index.
>
> It appears that it's only related to some files, as I have been able to remove
> some of the files, and it will continue past that point, but if it encounters
> one of these files, the index fails.
>
> Eric Anderson
> LanRx Network Solutions
> 815-505-6132
>
>
> Quoting Ben Litchfield <ben@csh.rit.edu>:
>
> > In this release I have changed how I parsed the document, which may have
> > introduced this bug.  I have received another report of this and will have
> > it fixed for the next point release.
> >
> > You said you tried with reasonably sized PDF repository.  Did you stop
> > indexing at this error or did you continue?  If you continued, is this the
> > only error that you got?
> >
> > -Ben
> >
> >
> >
> >
> > --
> >
> > On Thu, 6 Mar 2003, Eric Anderson wrote:
> >
> > > Ben-
> > > In attempting to use the PDFBox-0.6.0, I rec'd the following error when
> > > attempting to scan a reasonably sized PDF repository.
> > >
> > > Any thoughts?
> > >
> > >
> > >  caught a class java.io.EOFException
> > >  with message: Unexpected end of ZLIB input stream
> > >
> > >
> > > Eric Anderson
> > > LanRx Network Solutions
> > >
> > >
> > > Quoting Ben Litchfield <ben@csh.rit.edu>:
> > >
> > > > I would like to announce the next release of PDFBox.  PDFBox allows for
> > > > PDF documents to be indexed using lucene through a simple interface.
> > > > Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> > > > which will extract all text and PDF document summary properties as
> > lucene
> > > > fields.
> > > >
> > > > You can obtain the latest release from http://www.pdfbox.org
> > > >
> > > > Please send all bug reports to me and attach the PDF document when
> > > > possible.
> > > >
> > > > RELEASE 0.6.0
> > > > -Massive improvements to memory footprint.
> > > > -Must call close() on the COSDocument(LucenePDFDocument does this for
> > you)
> > > > -Really fixed the bug where small documents were not being indexed.
> > > > -Fixed bug where no whitespace existed between obj and start of object.
> > > >     Exception in thread "main" java.io.IOException: expected='obj'
> > > >     actual='obj<</Pro
> > > > -Fixed issue with spacing where textLineMatrix was not being copied
> > > >  properly
> > > > -Fixed 'bug' where parsing would fail with some pdfs with double endobj
> > > >  definitions
> > > > -Added PDF document summary fields to the lucene document
> > > >
> > > >
> > > > Thank you,
> > > > Ben Litchfield
> > > > http://www.pdfbox.org
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > >
> > >
> > > LanRx Network Solutions, Inc.
> > > Providing Enterprise Level Solutions...On A Small Business Budget
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
> LanRx Network Solutions, Inc.
> Providing Enterprise Level Solutions...On A Small Business Budget
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message