pdfbox-users mailing list archives

From kulbhushan singh <kulbhushan.t...@gmail.com>
Subject Re: Fwd: Junk Characters while Extracting text from pdf file.
Date Wed, 06 Feb 2013 10:24:36 GMT
Hi Andreas,

I ran the Adobe test, and it gives me the same junk characters as PDFBox. I
also tried "Save as Text...", but the result is the same. In the PDF
properties I found that the encoding is Identity-H. I googled this encoding
and found that many others have the same problem.

I am not even able to search for any text in my PDF. Are OCR and glyph
recognition my only options for extracting the text from it? Or is there any
other way to go about this?

Regards, Kulbhushan
