pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Japanese characters
Date Wed, 28 Aug 2013 10:13:16 GMT

> On 28 August 2013 at 01:20, Zak Bennett <zakerey.bennett@gmail.com>
> wrote:
> Hi guys,
> Firstly I apologise if this question has been repeated often. Having looked
> around I have found a number of individuals with the same issue as myself.
> Have you discovered any workarounds to the issue of returning Japanese text
> information from a PDF using pdfbox? If not, would this be an issue which
> the dev team is currently working to solve?
Please be more specific. There are 3 known cases:

- PDFBox can extract the text of PDFs containing foreign (non-Latin)
languages, depending on the font used
- the text extraction doesn't work because of the font used and a
missing implementation in PDFBox
- the text can't be extracted at all; even the Adobe test fails, see [1]

So the question is: have you actually tried to extract the text? If not, give it a try [2].
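
Once an extraction attempt has produced some output, a quick check of that output can hint at which of the cases above applies. The following is only a rough sketch using the Java standard library, not part of PDFBox: the class name ExtractedTextCheck and its methods are hypothetical, and it assumes the extracted text was saved to a UTF-8 file (for example by the command line tool from [2]). It counts characters from the scripts used for Japanese and counts U+FFFD replacement characters, which often indicate a broken mapping.

```java
import java.lang.Character.UnicodeScript;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: inspects already extracted text.
final class ExtractedTextCheck {

    // Counts code points in the Hiragana, Katakana or Han scripts.
    // Shared punctuation (script COMMON) is deliberately not counted.
    static int countJapanese(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA
                    || script == UnicodeScript.HAN) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    // Counts U+FFFD replacement characters in the text.
    static int countReplacement(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ExtractedTextCheck <extracted-text-file>");
            return;
        }
        // Assumption: the file holds the extracted text encoded as UTF-8.
        String text = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Japanese characters:    " + countJapanese(text));
        System.out.println("Replacement characters: " + countReplacement(text));
    }
}
```

If the Japanese count is well above zero, the first case probably applies. If it is zero, or the output is mostly replacement characters, it is worth doing the copy and paste test described in [1] to tell the second case from the third.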

> Best regards,
> Zak

Andreas Lehmkühler

[1] http://pdfbox.apache.org/userguide/faq.html#notext
[2] http://pdfbox.apache.org/commandline/
