lucene-java-user mailing list archives

From "Borkenhagen, Michael (ofd-ko zdfin)" <Michael.Borkenha...@ofd-ko.fin-rlp.de>
Subject Re: [ANN] PDFBox 0.6.0
Date Thu, 06 Mar 2003 15:40:31 GMT
Ben,

With both PDFBox-0.5.6 and PDFBox-0.6.0 I get the following
stack trace:
java.lang.ClassCastException: org.pdfbox.cos.COSObject
        at org.pdfbox.encoding.DictionaryEncoding.<init>(DictionaryEncoding.java:66)
        at org.pdfbox.cos.COSObject.getEncoding(COSObject.java:269)
        at org.pdfbox.cos.COSObject.encode(COSObject.java:210)
        at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:959)
        at org.pdfbox.util.PDFTextStripper.handleOperation(PDFTextStripper.java:788)
        at org.pdfbox.util.PDFTextStripper.process(PDFTextStripper.java:379)
        at org.pdfbox.util.PDFTextStripper.process(PDFTextStripper.java:366)
        at org.pdfbox.util.PDFTextStripper.processPageContents(PDFTextStripper.java:288)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:231)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:223)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:148)
        ...
(Stack trace from PDFBox 0.6.0)

I also get the error Eric reported, but only once. My indexer
continues parsing the other PDF documents after an error occurs.
Do you have any idea what causes the ClassCastException?
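
For reference, the "continue after an error" behaviour can be sketched
roughly as below. This is only an illustration under assumptions:
TolerantIndexer, TextExtractor and Result are hypothetical names, not
PDFBox or Lucene classes, and the extractor is a stand-in for whatever
actually pulls the text out of one PDF. The fake extractor in main()
merely imitates the two errors seen in this thread.

```java
import java.io.EOFException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TolerantIndexer {

    /** Hypothetical stand-in for the code that extracts text from one PDF. */
    interface TextExtractor {
        String extract(String path) throws Exception;
    }

    /** Outcome of one run: extracted text per file, plus the failures. */
    static final class Result {
        final Map<String, String> texts = new LinkedHashMap<>();
        final Map<String, Exception> failures = new LinkedHashMap<>();
    }

    /** Processes every path; a failure on one file does not stop the rest. */
    static Result indexAll(List<String> paths, TextExtractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                result.texts.put(path, extractor.extract(path));
            } catch (Exception e) {
                // Catches checked exceptions (e.g. EOFException) as well as
                // runtime ones (e.g. ClassCastException), records the file,
                // and moves on to the next document.
                result.failures.put(path, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake extractor that imitates the two errors reported in this thread.
        TextExtractor fake = path -> {
            if (path.endsWith("truncated.pdf")) {
                throw new EOFException("Unexpected end of ZLIB input stream");
            }
            if (path.endsWith("badfont.pdf")) {
                throw new ClassCastException("org.pdfbox.cos.COSObject");
            }
            return "text of " + path;
        };

        List<String> paths =
                Arrays.asList("a.pdf", "truncated.pdf", "badfont.pdf", "b.pdf");
        Result r = indexAll(paths, fake);

        System.out.println("indexed: " + r.texts.keySet());
        r.failures.forEach((p, e) -> System.out.println("failed:  " + p + " -> " + e));
    }
}
```

Recording the failing file names next to the exceptions makes it easy
to attach the affected PDFs to a bug report.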

Michael
-----Original Message-----
From: Ben Litchfield [mailto:ben@csh.rit.edu]
Sent: Thursday, 6 March 2003 14:45
To: Lucene Users List
Subject: Re: [ANN] PDFBox 0.6.0


In this release I have changed how I parsed the document, which may have
introduced this bug.  I have received another report of this and will have
it fixed for the next point release.

You said you tried with a reasonably sized PDF repository.  Did you stop
indexing at this error or did you continue?  If you continued, is this the
only error that you got?

-Ben




-- 

On Thu, 6 Mar 2003, Eric Anderson wrote:

> Ben-
> While using PDFBox-0.6.0, I received the following error when
> scanning a reasonably sized PDF repository.
>
> Any thoughts?
>
>
>  caught a class java.io.EOFException
>  with message: Unexpected end of ZLIB input stream
>
>
> Eric Anderson
> LanRx Network Solutions
>
>
> Quoting Ben Litchfield <ben@csh.rit.edu>:
>
> > I would like to announce the next release of PDFBox.  PDFBox allows for
> > PDF documents to be indexed using lucene through a simple interface.
> > Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> > which will extract all text and PDF document summary properties as
> > lucene fields.
> >
> > You can obtain the latest release from http://www.pdfbox.org
> >
> > Please send all bug reports to me and attach the PDF document when
> > possible.
> >
> > RELEASE 0.6.0
> > -Massive improvements to memory footprint.
> > -Must call close() on the COSDocument (LucenePDFDocument does this for
> >  you)
> > -Really fixed the bug where small documents were not being indexed.
> > -Fixed bug where no whitespace existed between obj and start of object.
> >     Exception in thread "main" java.io.IOException: expected='obj'
> >     actual='obj<</Pro
> > -Fixed issue with spacing where textLineMatrix was not being copied
> >  properly
> > -Fixed 'bug' where parsing would fail with some pdfs with double endobj
> >  definitions
> > -Added PDF document summary fields to the lucene document
> >
> >
> > Thank you,
> > Ben Litchfield
> > http://www.pdfbox.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
> LanRx Network Solutions, Inc.
> Providing Enterprise Level Solutions...On A Small Business Budget
>

