pdfbox-users mailing list archives

From "Richter, Michael" <m.rich...@tu-berlin.de>
Subject Re: Choosing a font for non-ASCII characters
Date Fri, 01 Feb 2019 09:54:14 GMT
Hi,
A few weeks ago I had issues with Unicode too. I switched the font to
LiberationSans, which is included in PDFBox:


PDFont font = PDType0Font.load(
    document,
    PDDocument.class.getResourceAsStream(
        "/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"),
    true);  // true = embed only a subset of the font (the glyphs actually used)


This works for me.


And I stumbled over this which may help you:

https://stackoverflow.com/questions/51481600/handle-many-unicode-caracters-with-pdfbox


--

Michael Richter


On Wednesday, 30.01.2019, 20:56 -0500, Christopher Schultz wrote:

Hello,


We are using PDFBox to generate PDFs in a very simple way and only
including fonts available from the PDType1Font class (e.g.
PDType1Font.HELVETICA). The PDFs we are generating are really only
including a few title/subtitles, text, and bulleted/numbered lists.


Everything is fine when we use what is probably in the standard Latin
alphabet, and we've had some troubles with special characters that
don't fit in there such as ≥ and ≤. We've dealt with that by simply
replacing "≤" with "<=" and so on, but we're starting to use languages
that don't use Latin script and so we can no longer replace our way
out of the problem.


For example, I need to be able to put Chinese characters into a PDF we
generate. So let's take the text "中國" which is just the word "China"
in Traditional Chinese script.


First, how can I find out that the character isn't going to fit into
the font that I'm currently using? Should I do it for every character
we try to put into the page, or should we just catch exceptions when
we try to write the text to the page and then scan at that point? I'm
trying to avoid writing hideously inefficient code to handle these
situations.
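
One way to picture the "check every character" option is a single pass over
the code points of the string. The sketch below uses only the JDK and is not
PDFBox API: it assumes that the windows-1252 charset is a close enough
stand-in for the WinAnsi encoding that the built-in Type 1 fonts use, which
is an approximation. The class and method names (EncodableCheck,
unencodable) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class EncodableCheck {

    /** Returns the code points in text that the given charset cannot encode. */
    static List<Integer> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<Integer> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            // Test one code point at a time so surrogate pairs stay together.
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                missing.add(cp);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // ASSUMPTION: windows-1252 approximates the encoding of the
        // standard Latin fonts; it is not the font's real glyph table.
        Charset winAnsiLike = Charset.forName("windows-1252");
        // "x <= China" written with U+2264 and the two Han characters
        String sample = "x \u2264 \u4E2D\u570B";
        for (int cp : unencodable(sample, winAnsiLike)) {
            System.out.printf("Needs another font: U+%04X%n", cp);
        }
    }
}
```

The scan is linear in the length of the text, so running it on every string
before writing is unlikely to be the expensive part of PDF generation. The
authoritative answer still comes from the font object itself, so treating
this as a cheap pre-filter rather than a guarantee seems safer.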


Second, once I know that I need to choose another font... how do I
know which font to choose? Should I keep a mapping of Unicode code
point ranges and the best fonts to use for them?
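
A mapping of this kind does not have to be maintained as raw code point
ranges, because the JDK already classifies code points by script via
Character.UnicodeScript. The following is a minimal sketch of that idea, not
something PDFBox provides: it splits a string into runs that share a script
and looks up a font file per script. The names ScriptRuns, Run, split and
fontFor are hypothetical, and "SomeCjkFont.ttf" is a placeholder for whatever
CJK font is actually available.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class ScriptRuns {

    /** A stretch of text that can be drawn with one font. */
    static final class Run {
        final UnicodeScript script;
        final String text;
        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    /** Splits text into runs; spaces and punctuation stay with the current run. */
    static List<Run> split(String text) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;
            // A change of script closes the run collected so far.
            if (!neutral && currentScript != null && script != currentScript) {
                runs.add(new Run(currentScript, current.toString()));
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            UnicodeScript s = currentScript != null ? currentScript : UnicodeScript.COMMON;
            runs.add(new Run(s, current.toString()));
        }
        return runs;
    }

    /** Hypothetical script-to-font table; the file names are placeholders. */
    static String fontFor(UnicodeScript script) {
        switch (script) {
            case HAN:
                return "SomeCjkFont.ttf"; // ASSUMPTION: any font covering Han
            default:
                return "LiberationSans-Regular.ttf";
        }
    }

    public static void main(String[] args) {
        for (Run run : split("PDF: \u4E2D\u570B ok")) {
            System.out.println(run.script + " [" + run.text + "] -> " + fontFor(run.script));
        }
    }
}
```

Each run could then be written with its own font, so a second font is only
loaded when a string actually contains a script the default font lacks.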


Finally, what fonts are actually available to PDFBox? How do I add new
ones? I have a lot of control over the environment and I get to see
failing conversions and intervene, so some trial and error is okay for
each new situation.


The recipients of our PDFs are file-size sensitive, so I'd only want
to include (bundle) a font in a PDF if it was absolutely necessary to
include the font itself. If we can get away with including a
*reference* to the font in the PDF and telling these recipients
"sorry, if you want to read the Chinese PDFs we send, you'd better
make sure you have font X installed" then that's okay with me, too.


What suggestions do people have for doing all of the above?


Thanks,
-chris

