pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Fwd: Junk Characters while Extracting text from pdf file.
Date Tue, 05 Feb 2013 18:36:17 GMT

On 05.02.2013 15:01, kulbhushan singh wrote:
> Hi,
> I am trying to extract text from a PDF file with custom fonts, but it is
> giving me junk characters. The fonts used are ArialMT (embedded subset) and
> Arial-BoldMT (embedded subset). The producer of the PDF file is GPL
> Ghostscript 8.15. I am using PDFTextStripper to extract the text. How can I
> do it for custom fonts? Any reference or solution would be appreciated.
Did you do the "adobe" test? [1]

> Regards, Kulbhushan

Andreas Lehmkühler

[1] http://pdfbox.apache.org/userguide/faq.html#no_text_extraction
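
Alongside the manual check in [1], it can help to measure how much of the extracted text is junk. The sketch below is a rough heuristic and not part of PDFBox: the class name JunkTextCheck, the method junkRatio, and the definition of "junk" (control, private-use, and unassigned characters, plus U+FFFD) are assumptions made for illustration. It works on any plain String, for example the one returned by PDFTextStripper's getText, and it will not catch text that was mis-mapped to ordinary printable characters.

```java
public class JunkTextCheck {

    // Returns the fraction of non-whitespace characters that look like
    // extraction junk. The definition of "junk" here is an assumption:
    // control, private-use, and unassigned code points, plus U+FFFD.
    public static double junkRatio(String text) {
        int total = 0;
        int junk = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue; // spaces and line breaks are not counted either way
            }
            total++;
            int type = Character.getType(cp);
            if (type == Character.CONTROL
                    || type == Character.PRIVATE_USE
                    || type == Character.UNASSIGNED
                    || cp == 0xFFFD) {
                junk++;
            }
        }
        // Empty or whitespace-only input is treated as containing no junk.
        return total == 0 ? 0.0 : (double) junk / total;
    }

    public static void main(String[] args) {
        // Hypothetical sample strings standing in for extracted page text.
        String clean = "Invoice total: 42 EUR";
        String suspicious = "\u0001\u0002\u0003 \uE000\uE001 ok";
        System.out.println("clean:      " + junkRatio(clean));
        System.out.println("suspicious: " + junkRatio(suspicious));
    }
}
```

A high ratio suggests the embedded font subsets lack a usable mapping from glyph codes to Unicode, in which case copy and paste from a viewer is likely to produce the same junk.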
