From: Tilman Hausherr
To: users@pdfbox.apache.org
Subject: Re: Choosing a font for non-ASCII characters
Date: Tue, 5 Mar 2019 07:22:20 +0100
Message-ID: <25eebe11-fc00-aa97-16f3-c890e411ea53@t-online.de>
In-Reply-To: <212186df-492c-a433-437b-d63edd92f7c6@christopherschultz.net>

On 04.03.2019 at 20:44, Christopher Schultz wrote:
> Tilman,
>
> On 3/3/19 08:48, Tilman Hausherr wrote:
>>> I have no idea. The information about PDFBox seems to be mostly
>>> in example programs and not web-based documentation. Searching
>>> e.g. Google for "how to use FontBox with PDFBox" generally comes
>>> up with references into the Javadoc for "uses of FontBox
>>> interface".
>>>
>>> The Javadoc does not describe what FontBox is and none of the
>>> classes or subclasses in those related packages really have any
>>> documentation worth reading. Each class "foo" is described as
>>> "being a foo" and each "getBar" method is described as "gets the
>>> bar for the foo".
>>>
>>> So...
>>> discoverability of features is pretty much nil here.
>>>
>>> I'm quite happy with the responses I get on this mailing list,
>>> but it's nearly impossible to discover on my own what is
>>> possible, here. I shouldn't have to get you guys to tell me how
>>> to use the software... you have better things to do (like
>>> continue to write great software).
>>>
>>> Is there a good example of using FontBox with PDFBox in order to
>>> subset a font?
>> Yes, the EmbeddedFonts.java example.
> I don't see any use of FontBox in the EmbeddedFonts.java example. Am I
> missing something?

Sorry, I meant that under the hood it is using FontBox.
EmbeddedVerticalFonts.java uses FontBox directly, but only to open the
font and access some special features.

You don't need to bother about subsetting yourself. PDFBox does this
for you. If you want to know how it is done, see TTFSubsetter.java and
its usages, i.e. all implementations of Subsetter.java
(TrueTypeEmbedder, PDTrueTypeFontEmbedder, PDCIDFontType2Embedder).

> :)
>
> It's less of a presence of useless documentation and more of a lack of
> existing documentation. I can file some tickets if you think it would
> be helpful. I also don't mind writing documentation and/or tutorials
> for the project.

Try to start with something small. I try to concentrate on javadoc
improvements and having working examples. A tutorial, to be complete,
would also need to introduce people to PDF concepts. An example has the
advantage that it works immediately even if they don't know anything
about the PDF specification, PDF operators, content streams, and the
PD and COS model - people just need to adjust the example to their
needs.

>
>> The subset thing is done by PDFBox without you having to bother
>> about it. It's "not subsetting" that would require more parameters.
>> So you need only this:
>>
>> PDType0Font font = PDType0Font.load(document, new File("c:/windows/fonts/arial.ttf"));
>> stream.setFont(font, 12);
>> stream.showText("...");
> Okay, that's exactly what we are doing (well... we are loading the
> font via the ClassLoader, but ...). And it's working. I was just a
> little worried about the ballooning file size. I realize there is
> little to be done about that at this stage.

It will grow if you open the same font several times for one PDF file.
Of course it will also grow if you use many different glyphs. But
seriously, protesting because a file grows from 1 KB to 18 KB? Your
file is only 18 KB! That is what counts. That is still small. Most PDF
invoices I get are > 100 KB.

>
> At this point, I am basically doing this:
>
> [ When adding text to the document ]
> - If the text contains anything outside of the ANSI encoding
>   - then replace the usual (default) font with the ARIALUNI.TTF
>
> It operates on a per-text-string basis, so it should only change the
> font for a single piece of text that requires it. I'm starting to
> think that I should not bother scanning the text and instead use the
> IllegalArgumentException as flow-control -- which I still don't like.
> But it means that my code will not spend a ton of time repeating
> checks that PDFBox will end up doing, anyway.
>
> I'm a little worried about what I will do the next time I have an
> issue like this -- where the ARIALUNI.TTF font doesn't include some
> character that I need... since there's no way to probe a font for
> support for a code point, I can't map code-points to fonts in a
> scalable way. It will just be trial-and-error which is no fun.

Know your clients. Have enough fonts for different languages. Use the
multifont example I mentioned (EmbeddedMultipleFonts.java).

I assume there is a way to ask a font whether a codepoint is
supported... You could use TrueTypeFont.hasGlyph(name). The name is the
PostScript name of that glyph.
But I don't know if this works for characters not in the Adobe Glyph
List.

Tilman

>
> It also means that I need to have some kind of set of fonts that we
> just round-robin through, hoping we get a hit and we can continue...
> otherwise we just have to fail (like we do now).
>
> -chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
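
The per-string font choice discussed in this thread (use the default
font while the text fits the ANSI encoding, otherwise fall back to a
wider font) can be sketched roughly as follows. This is only an
illustration in plain Java and does not call PDFBox. The class name
FontFallback, the methods fitsAnsi and pickFont, and the coverage tests
are hypothetical. It assumes windows-1252 is an acceptable stand-in for
"ANSI", and the Basic Multilingual Plane test attached to ARIALUNI.TTF
is a placeholder, not a statement about what that font really contains.
In real code the coverage test would have to ask the loaded font, for
example via the hasGlyph check mentioned above or by catching the
IllegalArgumentException.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

final class FontFallback {
    // Assumption: windows-1252 stands in for the "ANSI" encoding from the thread.
    private static final Charset ANSI = Charset.forName("windows-1252");

    /** True if every character of the text can be encoded in windows-1252. */
    static boolean fitsAnsi(String text) {
        return ANSI.newEncoder().canEncode(text);
    }

    /**
     * Returns the name of the first candidate whose coverage test accepts
     * every code point of the text, or null if no candidate covers it.
     */
    static String pickFont(String text, Map<String, IntPredicate> candidates) {
        for (Map.Entry<String, IntPredicate> candidate : candidates.entrySet()) {
            if (text.codePoints().allMatch(candidate.getValue())) {
                return candidate.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Insertion order of the map is the fallback order.
        Map<String, IntPredicate> candidates = new LinkedHashMap<>();
        candidates.put("default", cp -> fitsAnsi(new String(Character.toChars(cp))));
        // Placeholder coverage test; a real one would query the loaded font.
        candidates.put("ARIALUNI.TTF", Character::isBmpCodePoint);

        System.out.println(pickFont("Invoice", candidates));
        System.out.println(pickFont("\u65e5\u672c\u8a9e", candidates));
        System.out.println(pickFont("\ud83d\ude00", candidates));
    }
}
```

Keeping the candidates in an ordered map makes the fallback order
explicit, and a null result marks the case where no font in the set
covers the text, so the caller can decide how to fail.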