lucene-java-user mailing list archives

From "Ankur Goel" <ank...@brickred.com>
Subject Problem while Indexing Pdf files
Date Thu, 25 Mar 2004 17:39:00 GMT

Hi,

I need to index PDF files, and I am using PDFBox to extract their text.
When I try to extract text from a PDF file with PDFBox, I get the
following error:

java.io.IOException: Error: No 'ToUnicode' and no 'Encoding' for Font
	at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:347)
	at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:169)
	at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
	at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:692)
	at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
	at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
	at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:120)
	at org.pdfbox.ExtractText.main(ExtractText.java:213)
	at test.LuceneExampleIndexer.indexFile(LuceneExampleIndexer.java:67)
	at test.LuceneExampleIndexer.indexDirectory(LuceneExampleIndexer.java:47)
	at test.LuceneExampleIndexer.index(LuceneExampleIndexer.java:30)
	at test.LuceneExampleIndexer.main(LuceneExampleIndexer.java:118)


Could someone please tell me how to fix or work around this?
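
In the meantime, if the exception propagates all the way up to main as the
trace suggests, a possible stopgap is to catch the IOException per file so
that one unreadable PDF does not abort the whole indexing run. The following
is only a minimal sketch under assumptions: TextExtractor, indexAll and
Result are hypothetical names made up for illustration, not PDFBox or Lucene
APIs, and the extractor here is a fake that merely simulates the font error.
It does not fix the underlying font problem.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipFailingFiles {

    /** Hypothetical stand-in for the real text-extraction call; not a PDFBox type. */
    interface TextExtractor {
        String extract(String path) throws IOException;
    }

    /** Records which files were indexed and which were skipped. */
    static final class Result {
        final List<String> indexed = new ArrayList<String>();
        final List<String> skipped = new ArrayList<String>();
    }

    /** Tries every file; a failure on one file is logged and the loop continues. */
    static Result indexAll(List<String> paths, TextExtractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                String text = extractor.extract(path);
                // A real indexer would add `text` to the Lucene index here.
                result.indexed.add(path);
            } catch (IOException e) {
                // Log and move on instead of letting the exception end the run.
                System.err.println("Skipping " + path + ": " + e.getMessage());
                result.skipped.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake extractor that simulates the font error for one file.
        TextExtractor fake = path -> {
            if (path.endsWith("bad.pdf")) {
                throw new IOException("Error: No 'ToUnicode' and no 'Encoding' for Font");
            }
            return "text of " + path;
        };
        Result r = indexAll(Arrays.asList("a.pdf", "bad.pdf", "c.pdf"), fake);
        System.out.println("indexed=" + r.indexed + " skipped=" + r.skipped);
    }
}
```

The skipped list gives a record of which PDFs still need attention once the
font issue itself is understood.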

Thanks,
Ankur 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

