pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Choosing a font for non-ASCII characters
Date Tue, 05 Mar 2019 06:22:20 GMT
On 04.03.2019 at 20:44, Christopher Schultz wrote:
> Tilman,
>
> On 3/3/19 08:48, Tilman Hausherr wrote:
>>> I have no idea. The information about PDFBox seems to be mostly
>>> in example programs and not web-based documentation. Searching
>>> e.g. Google for "how to use FontBox with PDFBox" generally comes
>>> up with references into the Javadoc for "uses of FontBox
>>> interface".
>>>
>>> The Javadoc does not describe what FontBox is and none of the
>>> classes or subclasses in those related packages really have any
>>> documentation worth reading. Each class "foo" is described as
>>> "being a foo" and each "getBar" method is described as "gets the
>>> bar for the foo".
>>>
>>> So... discoverability of features is pretty much nil here.
>>>
>>> I'm quite happy with the responses I get on this mailing list,
>>> but it's nearly impossible to discover on my own what is
>>> possible, here. I shouldn't have to get you guys to tell me how
>>> to use the software... you have better things to do (like
>>> continue to write great software).
>>>
>>> Is there a good example of using FontBox with PDFBox in order to
>>> subset a font?
>> Yes, the EmbeddedFonts.java example.
> I don't see any use of FontBox in the EmbeddedFonts.java example. Am I
> missing something?


Sorry, I meant that it uses FontBox under the hood. 
EmbeddedVerticalFonts.java uses FontBox directly, but only to open the 
font and access some special features.

You don't need to bother about subsetting yourself. PDFBox does this for 
you. If you want to know how it is done, see TTFSubsetter.java and its 
usages, i.e. all implementations of Subsetter.java (TrueTypeEmbedder, 
PDTrueTypeFontEmbedder, PDCIDFontType2Embedder).

> :)
>
> It's less of a presence of useless documentation and more of a lack of
> existing documentation. I can file some tickets if you think it would
> be helpful. I also don't mind writing documentation and/or tutorials
> for the project.

Try to start with something small. I try to concentrate on javadoc 
improvements and having working examples. A tutorial, to be complete, 
would also need to introduce people to PDF concepts. An example has the 
advantage that it works immediately even if they don't know anything 
about the PDF specification and PDF operators and content streams and 
the PD and COS model - people just need to adjust the example to 
their needs.


>
>> The subset thing is done by PDFBox without you having to bother
>> about it. It's "not subsetting" that would require more parameters.
>> So you need only this:
>>
>> PDType0Font font = PDType0Font.load(document,
>>     new File("c:/windows/fonts/arial.ttf"));
>> stream.setFont(font, 12);
>> stream.showText("...");
> Okay, that's exactly what we are doing (well... we are loading the
> font via the ClassLoader, but ...). And it's working. I was just a
> little worried about the ballooning file size. I realize there is
> little to be done about that at this stage.


It will grow if you open the same font several times for one PDF file. 
Of course it will also grow if you use many different glyphs. But 
seriously, protesting because a file grows from 1 KB to 18 KB? Your file 
is only 18 KB, and that is what counts. That is still small. Most PDF 
invoices I get are larger than 100 KB.
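
One way to avoid opening the same font several times for one PDF file is to keep a small cache per document. The following is only a sketch using the JDK: FontCache and its loader function are hypothetical names, not part of PDFBox. In real code the loader would wrap the PDType0Font.load(document, file) call shown in this thread and would have to handle its IOException, since a plain Function cannot throw checked exceptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not a PDFBox class: remembers each font that was
// loaded for one document so that the same file is never loaded twice.
final class FontCache<F> {
    private final Map<String, F> loaded = new HashMap<>();
    private final Function<String, F> loader;

    // The loader turns a font path into a font object (assumption: in real
    // use it would call the font library and deal with I/O errors).
    FontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    // Returns the cached font for this path, loading it on first use only.
    F get(String path) {
        return loaded.computeIfAbsent(path, loader);
    }

    public static void main(String[] args) {
        // Stand-in loader that only builds a label instead of reading a file.
        FontCache<String> cache = new FontCache<>(path -> "font from " + path);
        String first = cache.get("arial.ttf");
        String second = cache.get("arial.ttf");
        // Both lookups return the very same object; the loader ran once.
        System.out.println(first == second);
    }
}
```

Create one such cache per PDF document, because a font loaded for one document is tied to that document.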


>
> At this point, I am basically doing this:
>
> [ When adding text to the document ]
> - If the text contains anything outside of the ANSI encoding
>   - then replace the usual (default) font with the ARIALUNI.TTF
>
> It operates on a per-text-string basis, so it should only change the
> font for a single piece of text that requires it. I'm starting to
> think that I should not bother scanning the text and instead use the
> IllegalArgumentException as flow-control -- which I still don't like.
> But it means that my code will not spend a ton of time repeating
> checks that PDFBox will end up doing, anyway.
>
> I'm a little worried about what I will do the next time I have an
> issue like this -- where the ARIALUNI.TTF font doesn't include some
> character that I need... since there's no way to probe a font for
> support for a code point, I can't map code-points to fonts in a
> scalable way. It will just be trial-and-error which is no fun.
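
The per-string check described in the quoted text can be approximated with the JDK alone. This is a sketch, not what PDFBox does internally: it assumes that windows-1252 is a close enough stand-in for the "ANSI" encoding of the default font, and AnsiCheck and fitsAnsi are made-up names. The two character sets are not identical (control characters pass this check, for example), so treat it as a cheap pre-filter and keep handling the IllegalArgumentException as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: decides whether a string can stay in the default font.
final class AnsiCheck {
    // Assumption: windows-1252 approximates the "ANSI" encoding in question.
    private static final Charset ANSI = Charset.forName("windows-1252");

    // Returns true if every character of the text can be encoded in windows-1252.
    static boolean fitsAnsi(String text) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = ANSI.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Euro sign: part of windows-1252, so the default font can be kept.
        System.out.println(fitsAnsi("Invoice total: 12 \u20AC"));
        // Greek letters: outside windows-1252, so switch to the Unicode font.
        System.out.println(fitsAnsi("\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC"));
    }
}
```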


Know your clients. Have enough fonts for different languages. Use the 
multifont example I mentioned (EmbeddedMultipleFonts.java).

I assume there is a way to ask a font whether a code point is 
supported... You could use TrueTypeFont.hasGlyph(name). The name is the 
PostScript name of that glyph. But I don't know if this works for 
characters that are not in the Adobe Glyph List.
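
Once there is some way to ask a font about a code point, choosing among several fonts can be an ordered lookup instead of trial and error. The sketch below uses only the JDK. FontFallback, Candidate and pick are hypothetical names, and the coverage predicates in main are invented ranges for illustration; in real code each predicate would wrap whatever glyph lookup the font actually offers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.IntPredicate;

// Hypothetical helper: picks the first font that covers a whole string.
final class FontFallback {
    // Pairs a font label with a test that says whether a code point is covered.
    static final class Candidate {
        final String name;
        final IntPredicate supports;

        Candidate(String name, IntPredicate supports) {
            this.name = name;
            this.supports = supports;
        }
    }

    // Walks the candidates in order and returns the name of the first one
    // whose predicate accepts every code point of the text, if there is one.
    static Optional<String> pick(List<Candidate> candidates, String text) {
        for (Candidate candidate : candidates) {
            if (text.codePoints().allMatch(candidate.supports)) {
                return Optional.of(candidate.name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Invented coverage ranges; real fonts would be probed instead.
        List<Candidate> fonts = Arrays.asList(
            new Candidate("latin", cp -> cp < 0x0250),
            new Candidate("greek", cp -> cp < 0x0250 || (cp >= 0x0370 && cp <= 0x03FF)));
        System.out.println(pick(fonts, "abc"));
        System.out.println(pick(fonts, "\u03B1\u03B2\u03B3"));
        System.out.println(pick(fonts, "\u4E2D\u6587"));
    }
}
```

An empty result means that none of the configured fonts covers the text, which is the case where another font has to be added.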


Tilman


>
> It also means that I need to have some kind of set of fonts that we
> just round-robin through, hoping we get a hit and we can continue...
> otherwise we just have to fail (like we do now).
>
> -chris
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>

