pdfbox-users mailing list archives

From "YE ..." <stephe...@hotmail.com>
Subject Re: PDFbox unable to render Chinese font correctly when converting pdf to images
Date Sat, 19 Aug 2017 14:07:37 GMT
Hi Tilman,
I manually installed the two missing fonts, AdobeKaitiStd and STSong, on CentOS, and then the conversion
worked. Thanks for your help.

In retrospect, it seems that on CentOS PDFBox couldn't correctly use a fallback Chinese font for
AdobeKaitiStd.

Sent from my iPhone

On 19 Aug 2017, at 8:28 PM, YE ... <stephenye@hotmail.com>
wrote:

Hi Tilman,
I am running the conversion on CentOS, which doesn't have the two fonts installed. I have
installed Google's CJK fonts, and in most cases PDFBox should automatically choose the right
ones for rendering Chinese characters. I will find a way to install ArialUnicodeMS and MicrosoftYaHei
on CentOS to see if it works.
Many thanks,
Fangqiao

Sent from my iPhone

On 19 Aug 2017, at 5:50 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

Hello Fangqiao,

I am able to render that file with PDFBox 2.0.7. You can see it at
http://imgur.com/a/UOfRl

In the log I get this:
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular
Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeSongStdAdobeKaitiStd-Light
Warning  [PDCIDFontType0] Using fallback MicrosoftYaHei for CID-keyed font STSong-Light
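When a rendering job produces many repeated warnings like the ones above, it can help to reduce them to one line per missing font, which is essentially the list of fonts to install. The following is a minimal sketch in plain Java, assuming the warning text has exactly the form quoted in this thread ("Using fallback X for CID-keyed font Y"); the actual line prefix depends on your logging configuration, and the class name `FallbackSummary` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FallbackSummary {
    // Assumed message format, taken from the log lines quoted in this thread.
    private static final Pattern FALLBACK =
            Pattern.compile("Using fallback (\\S+) for CID-keyed font (\\S+)");

    /** Maps each requested font to the first fallback reported for it; repeats are ignored. */
    static Map<String, String> summarize(Iterable<String> logLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = FALLBACK.matcher(line);
            if (m.find()) {
                // group(2) = font named in the PDF, group(1) = substitute PDFBox picked
                result.putIfAbsent(m.group(2), m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular",
                "Warning  [PDCIDFontType0] Using fallback ArialUnicodeMS for CID-keyed font AdobeKaitiStd-Regular",
                "Warning  [PDCIDFontType0] Using fallback MicrosoftYaHei for CID-keyed font STSong-Light");
        // Prints one "requested -> fallback" line per distinct font
        summarize(log).forEach((font, fallback) ->
                System.out.println(font + " -> " + fallback));
    }
}
```

Insertion order is preserved so the summary follows the order in which fonts first appear in the log.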

Your invoice does not have its fonts embedded. The messages indicate that PDFBox has chosen
the fonts ArialUnicodeMS and MicrosoftYaHei to display the text.

Either you don't have these fonts installed, or maybe you used an older PDFBox version?
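To check in advance whether a PDF depends on fonts that are not embedded (and therefore on fonts installed on the rendering machine), something like the following sketch could be used. It assumes the PDFBox 2.0.x API, matching the 2.0.7 release mentioned here (PDFBox 3.0 loads files with `Loader.loadPDF` instead of `PDDocument.load`). The class name `NonEmbeddedFonts` is hypothetical, and the sketch only inspects each page's direct font resources, not fonts nested inside form XObjects or annotations.

```java
import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

class NonEmbeddedFonts {
    /** Collects the names of page-level fonts that have no embedded font program. */
    static Set<String> find(PDDocument doc) throws IOException {
        Set<String> names = new TreeSet<>();
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources();
            if (res == null) {
                continue; // page without a resource dictionary
            }
            for (COSName key : res.getFontNames()) {
                PDFont font = res.getFont(key);
                if (font != null && !font.isEmbedded()) {
                    // Fall back to the resource key if /BaseFont is missing
                    String name = font.getName();
                    names.add(name != null ? name : key.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NonEmbeddedFonts invoice.pdf
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            for (String name : find(doc)) {
                System.out.println("Not embedded: " + name);
            }
        }
    }
}
```

Any font reported here will be replaced by a fallback at render time, so its appearance depends on which fonts are installed on that system.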

Tilman


On 19.08.2017 at 09:19, YE ... wrote:
Hi Tilman,


Thanks for the quick reply. I will look into the commercial solutions with font hinting that you
mentioned.


I have also included the links to the attachments mentioned in my previous email in case you
want to take a closer look.


PDF file:


https://shujubiji.cn/uppv/bjPhoto/d5418a966daecda62bcf056ddc1e79c99a4c6546/1503126757317/chinese_invoice.pdf?zid=__itemtoken__fe4bd85ea612752d11b4fdb02ff43c8871f9bc1c


You need to download it and open it in a PDF reader for the Chinese characters to be shown
correctly.


Converted image:

https://shujubiji.cn/uppv/bjPhoto/d5418a966daecda62bcf056ddc1e79c99a4c6546/1503126770469/chinese_invoice_1.jpg?zid=__itemtoken__6de80b84edb217bb37a19218bcec1eb78a24bcfc


Screenshot of the original PDF displayed correctly in a PDF reader:

https://shujubiji.cn/uppv/bjPhoto/d5418a966daecda62bcf056ddc1e79c99a4c6546/1503126791380/screenshot_from_2017_08_17_17_12_03.png?zid=__itemtoken__a7c4267ccd9597273a7d0286cb3353b1541a9ee1


Best regards,

Fangqiao

________________________________
From: Tilman Hausherr <THausherr@t-online.de>
Sent: Friday, August 18, 2017 4:18 PM
To: users@pdfbox.apache.org
Subject: Re: PDFbox unable to render Chinese font correctly when converting pdf to images

Hello Fangqiao,

Your files didn't get through; you must upload them to a file hosting service.
But I suspect that this is a known problem with Chinese fonts; the cause
is explained here:
https://issues.apache.org/jira/browse/PDFBOX-3293
[PDFBOX-3293] Chinese font glyphs with overlapping paths ...
Font glyphs with overlapping paths may be rendered incorrectly, especially when the font
size is small. Sadly, the Traditional Chinese edition of Windows bundled ...




How to fix it: by implementing font hinting, which we haven't done.
There is no workaround, sadly (except, of course, using better fonts when
creating the PDF).

There are some commercial Java products (search for them). At least two
of them have implemented font hinting; I don't know about the others.

Sorry for not having better news.

Tilman


On 18.08.2017 at 11:56, YE ... wrote:
Hi,

I am from China and using PDFBox to convert pdf files to images. It
worked excellently in most cases. Thanks a lot for the team's great work.


However, I recently used it to convert some PDF invoices to images,
and some Chinese characters weren't converted correctly. Attached
are a sample PDF file, the converted image, and a screenshot of the
original PDF opened in a PDF reader, which displayed all the Chinese text correctly.


I am seeking help from the community:


- what's the possible cause for the problem?


I guess that in the original PDF file the font for some Chinese characters
wasn't set correctly.


- how to fix it?


If the above guess is correct, is there a way to detect the correct font
type and set it for the conversion?


- or is there another solution that can fix the problem?


Many thanks,

Fangqiao





---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org




