pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Choosing a font for non-ASCII characters
Date Sat, 02 Mar 2019 14:54:21 GMT

Andreas,

On 1/31/19 01:27, Andreas Lehmkuehler wrote:
> the standard PDF fonts (PDType1Font.HELVETICA et al.) don't
> support anything other than (limited) Latin-1. You have to use
> something else.
> 
> Have a look at the HelloWorldTTF example [1]. It shows how to embed
> a TrueType font. You have to choose a suitable font from your OS
> or something like the Noto fonts from Google.
> 
> W.r.t. font embedding: it's always a good idea to embed all
> resources which are needed to render a PDF. PDFBox reduces the
> amount of space as it limits the embedded font to the used
> characters.

Thanks for the pointers. I'm finally getting around to doing something
about this. I used "Arial Unicode" as referenced in a quickie online
tutorial[1] and what I'm finding is that:

1. The Chinese characters render correctly (yay!)
2. My English-only file has gone from ~1k to ~18k

This test file was the simplest I could muster so it's really an
unfair comparison at this point.

But it's clear that the file will get bigger (of course) by adding the
font.

I'd like to avoid bundling the font unless it's necessary. For several
months, we've been able to get away with the standard PDF fonts
(which, presumably, the PDF spec requires all clients to provide, and
which is why the files can be so small). Is there a good way to probe
text to determine whether or not an alternate font will be necessary
and only load/bundle it then?
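
To make the question concrete, here is a rough sketch of the kind of
probe I have in mind. It uses only the JDK, not PDFBox, and it assumes
that "printable Latin-1" is a close enough stand-in for what the
standard fonts can encode. That assumption is only an approximation of
the fonts' real encoding, so treat it as a pre-check. The class and
method names (FontProbe, fitsStandardFont, needsEmbeddedFont) are made
up for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of PDFBox. */
final class FontProbe {

    private FontProbe() {}

    /**
     * Returns true if every character is a printable Latin-1 character.
     * ASSUMPTION: Latin-1 approximates what the standard PDF fonts can
     * encode; the real encoding differs in places.
     * Control characters (including newlines) are rejected, so callers
     * should split text into lines before probing.
     */
    static boolean fitsStandardFont(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c) || !latin1.canEncode(c)) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if any of the text runs needs a font beyond the standard ones. */
    static boolean needsEmbeddedFont(String... runs) {
        for (String run : runs) {
            if (!fitsStandardFont(run)) {
                return true;
            }
        }
        return false;
    }
}
```

The idea would be to call needsEmbeddedFont() once with all of the
document's text runs and only load the TrueType font when it returns
true, rather than catching exceptions while writing to the page.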

Thanks,
-chris

[1] http://www.kscodes.com/java/write-chinese-pdf-using-apache-pdfbox/

> On 31.01.19 at 02:56, Christopher Schultz wrote:
> Hello,
> 
> We are using PDFBox to generate PDFs in a very simple way and only 
> including fonts available from the PDType1Font class (e.g. 
> PDType1Font.HELVETICA). The PDFs we are generating are really only 
> including a few title/subtitles, text, and bulleted/numbered
> lists.
> 
> Everything is fine when we use what is probably in the standard
> Latin alphabet, and we've had some troubles with special characters
> that don't fit in there such as ≥ and ≤. We've dealt with that by
> simply replacing "≤" with "<=" and so on, but we're starting to use
> languages that don't use Latin script and so we can no longer
> replace our way out of the problem.
> 
> For example, I need to be able to put Chinese characters into a PDF
> we generate. So let's take the text "中國" which is just the word
> "China" in Traditional Chinese script.
> 
> First, how can I find out that the character isn't going to fit
> into the font that I'm currently using? Should I do it for every
> character we try to put into the page, or should we just catch
> exceptions when we try to write the text to the page and then scan
> at that point? I'm trying to avoid writing hideously inefficient
> code to handle these situations.
> 
> Second, once I know that I need to choose another font... how do I 
> know which font to choose? Should I keep a mapping of Unicode code 
> point ranges and the best fonts to use for them?
> 
> Finally, what fonts are actually available to PDFBox? How do I add
> new ones? I have a lot of control over the environment and I get to
> see failing conversions and intervene, so some trial and error is
> okay for each new situation.
> 
> The recipients of our PDFs are file-size sensitive, so I'd only
> want to include (bundle) a font in a PDF if it was absolutely
> necessary to include the font itself. If we can get away with
> including a *reference* to the font in the PDF and telling these
> recipients "sorry, if you want to read the Chinese PDFs we send,
> you'd better make sure you have font X installed" then that's okay
> with me, too.
> 
> What suggestions do people have for doing all of the above?
> 
> Thanks,
> -chris
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

