pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Choosing a font for non-ASCII characters
Date Thu, 31 Jan 2019 01:56:52 GMT

Hello,

We are using PDFBox to generate PDFs in a very simple way, using only
the fonts available from the PDType1Font class (e.g.
PDType1Font.HELVETICA). The PDFs we generate contain little more than
a few titles/subtitles, text, and bulleted/numbered lists.

Everything is fine when we stick to what is probably the standard Latin
alphabet, but we've had some trouble with special characters that
don't fit in there, such as ≥ and ≤. We've dealt with that by simply
replacing "≤" with "<=" and so on, but we're starting to use languages
that don't use Latin script, so we can no longer replace our way
out of the problem.

For example, I need to be able to put Chinese characters into a PDF we
generate. So let's take the text "中國" which is just the word "China"
in Traditional Chinese script.

First, how can I find out that the character isn't going to fit into
the font that I'm currently using? Should I do it for every character
we try to put into the page, or should we just catch exceptions when
we try to write the text to the page and then scan at that point? I'm
trying to avoid writing hideously inefficient code to handle these
situations.
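
To make the "check every character" option concrete, here is a minimal
sketch of a pre-scan that uses only the JDK. It rests on an assumption:
that the JDK's windows-1252 charset is a close enough stand-in for the
WinAnsiEncoding that fonts such as PDType1Font.HELVETICA appear to use.
The two are not identical (control characters such as newlines, for
instance, pass this check but are not printable glyphs), so treat the
result as a cheap first filter rather than a guarantee. The class and
method names are hypothetical, not part of PDFBox.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class FontCoverage {
    // Assumption: windows-1252 approximates the encoding of the built-in Latin fonts.
    private static final Charset APPROX_WIN_ANSI = Charset.forName("windows-1252");

    /**
     * Returns the char index of the first code point that the approximate
     * encoding cannot represent, or -1 if the whole string is representable.
     */
    static int firstUnencodable(String text) {
        CharsetEncoder encoder = APPROX_WIN_ANSI.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint); // 2 for surrogate pairs
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return i;
            }
            i += length;
        }
        return -1;
    }
}
```

The scan is a single linear pass over the string, so running it on every
piece of text before writing it should be cheap compared with producing
the PDF itself.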

Second, once I know that I need to choose another font... how do I
know which font to choose? Should I keep a mapping of Unicode code
point ranges and the best fonts to use for them?
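
One way the "mapping of code point ranges to fonts" idea could look is
sketched below, using the JDK's Character.UnicodeScript instead of a
hand-maintained range table. Everything here is illustrative: the font
names are hypothetical placeholders, the class is not part of PDFBox,
and the sketch only decides which font name goes with which run of
text. Loading those fonts and drawing the runs is left out.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class ScriptFontPicker {
    static final String DEFAULT_FONT = "Helvetica";

    // Hypothetical font names; substitute whatever fonts are actually available.
    private static final Map<UnicodeScript, String> FONT_BY_SCRIPT =
            new EnumMap<>(UnicodeScript.class);
    static {
        FONT_BY_SCRIPT.put(UnicodeScript.HAN, "ExampleCJKFont");
        FONT_BY_SCRIPT.put(UnicodeScript.CYRILLIC, "ExampleCyrillicFont");
    }

    /** Picks a font name for one code point, falling back to the default. */
    static String fontFor(int codePoint) {
        return FONT_BY_SCRIPT.getOrDefault(UnicodeScript.of(codePoint), DEFAULT_FONT);
    }

    /** Splits text into consecutive {fontName, run} pairs that share one font. */
    static List<String[]> splitByFont(String text) {
        List<String[]> runs = new ArrayList<>();
        int runStart = 0;
        String runFont = null;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String font = fontFor(codePoint);
            if (runFont != null && !font.equals(runFont)) {
                // Font changes here: close the current run.
                runs.add(new String[] {runFont, text.substring(runStart, i)});
                runStart = i;
            }
            runFont = font;
            i += Character.charCount(codePoint);
        }
        if (runFont != null) {
            runs.add(new String[] {runFont, text.substring(runStart)});
        }
        return runs;
    }
}
```

Spaces and punctuation belong to the COMMON script, so in this sketch
they fall back to the default font. A fuller version might let them
inherit the font of the neighbouring run to avoid needless switches.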

Finally, what fonts are actually available to PDFBox? How do I add new
ones? I have a lot of control over the environment and I get to see
failing conversions and intervene, so some trial and error is okay for
each new situation.

The recipients of our PDFs are file-size sensitive, so I'd only want
to include (bundle) a font in a PDF if it was absolutely necessary to
include the font itself. If we can get away with including a
*reference* to the font in the PDF and telling these recipients
"sorry, if you want to read the Chinese PDFs we send, you'd better
make sure you have font X installed" then that's okay with me, too.

What suggestions do people have for doing all of the above?

Thanks,
-chris
