pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Choosing a font for non-ASCII characters
Date Sat, 02 Mar 2019 14:54:21 GMT

On 1/31/19 01:27, Andreas Lehmkuehler wrote:
> the standard PDF fonts (PDType1Font.HELVETICA et al.) don't
> support anything other than (limited) Latin-1. You have to use
> something else.
> Have a look at the HelloWorldTTF example [1]. It shows how to embed
> a TrueType font. You have to choose a suitable font from your OS
> or something like the Noto fonts from Google.
> W.r.t. font embedding: it's always a good idea to embed all
> resources which are needed to render a PDF. PDFBox reduces the
> amount of space needed, as it limits the embedded font to the
> characters actually used.

Thanks for the pointers. I'm finally getting around to doing something
about this. I used "Arial Unicode" as referenced in a quickie online
tutorial[1] and what I'm finding is that:

1. The Chinese characters render correctly (yay!)
2. My English-only file has gone from ~1k to ~18k

This test file was the simplest I could muster so it's really an
unfair comparison at this point.

But it's clear that the file will get bigger (of course) by adding the
font.

I'd like to avoid bundling the font unless it's necessary. For several
months, we've been able to get away with the standard PDF default
fonts (which, presumably, the PDF spec requires all clients to provide,
which is why the files can be so small). Is there a good way to probe
text to determine whether or not an alternate font will be necessary,
and only load/bundle it then?
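
To make the question concrete, here is a rough sketch of the kind of
probe I have in mind. It uses only the JDK, not PDFBox. FontProbe and
both of its methods are hypothetical names, and the U+00FF cutoff is an
assumption: it treats "fits in Latin-1" as a stand-in for "the standard
fonts can render it", which is only an approximation of whatever
encoding the standard fonts really use.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class FontProbe {
    private FontProbe() {}

    /**
     * Returns true if any code point lies outside U+0000..U+00FF.
     * Assumption: Latin-1 roughly matches what the standard fonts cover.
     */
    static boolean needsEmbeddedFont(String text) {
        return text.codePoints().anyMatch(cp -> cp > 0xFF);
    }

    /**
     * Collects the Unicode scripts of all code points above U+00FF,
     * which could drive the choice of which font to load.
     */
    static Set<Character.UnicodeScript> scriptsOutsideLatin1(String text) {
        Set<Character.UnicodeScript> scripts =
                EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(cp -> cp > 0xFF)
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }
}
```

The idea would be to call needsEmbeddedFont() once per document (or per
text run) and only load the big font when it returns true. The set from
scriptsOutsideLatin1() could then be looked up in a script-to-font
table that we maintain ourselves.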

-chris

[1] http://www.kscodes.com/java/write-chinese-pdf-using-apache-pdfbox/

> On 31.01.19 at 02:56, Christopher Schultz wrote:
> Hello,
> We are using PDFBox to generate PDFs in a very simple way and only 
> including fonts available from the PDType1Font class (e.g. 
> PDType1Font.HELVETICA). The PDFs we are generating are really only 
> including a few title/subtitles, text, and bulleted/numbered
> lists.
> Everything is fine when we stay within what is probably the standard
> Latin alphabet, but we've had some trouble with special characters
> that don't fit in there, such as ≥ and ≤. We've dealt with that by
> simply replacing "≤" with "<=" and so on, but we're starting to use
> languages that don't use Latin script and so we can no longer
> replace our way out of the problem.
> For example, I need to be able to put Chinese characters into a PDF
> we generate. So let's take the text "中國" which is just the word
> "China" in Traditional Chinese script.
> First, how can I find out that the character isn't going to fit
> into the font that I'm currently using? Should I do it for every
> character we try to put into the page, or should we just catch
> exceptions when we try to write the text to the page and then scan
> at that point? I'm trying to avoid writing hideously inefficient
> code to handle these situations.
> Second, once I know that I need to choose another font... how do I 
> know which font to choose? Should I keep a mapping of Unicode code 
> point ranges and the best fonts to use for them?
> Finally, what fonts are actually available to PDFBox? How do I add
> new ones? I have a lot of control over the environment and I get to
> see failing conversions and intervene, so some trial and error is
> okay for each new situation.
> The recipients of our PDFs are file-size sensitive, so I'd only
> want to include (bundle) a font in a PDF if it was absolutely
> necessary to include the font itself. If we can get away with
> including a *reference* to the font in the PDF and telling these
> recipients "sorry, if you want to read the Chinese PDFs we send,
> you'd better make sure you have font X installed" then that's okay
> with me, too.
> What suggestions do people have for doing all of the above?
> Thanks, -chris


To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org