lucene-java-user mailing list archives

From Ben Litchfield <...@csh.rit.edu>
Subject Re: Problem while Indexing Pdf files
Date Thu, 25 Mar 2004 18:55:21 GMT

The latest release of PDFBox changed the way it dealt with fonts and
introduced this bug. Please try the version in CVS and let me know if you
are still having a problem.

Ben


On Thu, 25 Mar 2004, Ankur Goel wrote:

>
> Hi,
>
> I have to index PDF files, and I am using PDFBox for that. But when I try
> to extract text from a PDF file using PDFBox, I get the following error:
>
> java.io.IOException: Error: No 'ToUnicode' and no 'Encoding' for Font
> 	at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:347)
> 	at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:169)
> 	at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
> 	at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:692)
> 	at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
> 	at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
> 	at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
> 	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
> 	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:120)
> 	at org.pdfbox.ExtractText.main(ExtractText.java:213)
> 	at test.LuceneExampleIndexer.indexFile(LuceneExampleIndexer.java:67)
> 	at test.LuceneExampleIndexer.indexDirectory(LuceneExampleIndexer.java:47)
> 	at test.LuceneExampleIndexer.index(LuceneExampleIndexer.java:30)
> 	at test.LuceneExampleIndexer.main(LuceneExampleIndexer.java:118)
>
>
> Please tell me how to go about it.
>
> Thanks,
> Ankur
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
