pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Choosing a font for non-ASCII characters
Date Tue, 19 Mar 2019 21:08:45 GMT


On 3/19/19 16:23, Tilman Hausherr wrote:
> On 19.03.2019 at 19:45, Christopher Schultz wrote:
> Tilman,
>
> So I'm starting to look toward making my code better now that it's
> actually working. Right now, my code looks like this:
>
> if(!isAnsiEncoding(strippedText)) {
>     font = getFullUnicodeFont();
> }
>
> Where one font simply replaces the other for strings that aren't
> available in the built-in font(s).
> I'd like to support emoji and stuff like that. I can find a font
> (or fonts) for that, but I think the only way I can do that with
> the existing API is something like this:
> Font[] fonts = new Font[] { builtIn, arialUnicode, emoji };
> for(Font font : fonts) {
>     try {
>         page.setFont(font);
>         page.showText(text);
>         break; // This font worked; stop trying others
>     } catch (IllegalArgumentException iae) {
>         // Try the next font
>     }
> }
> That will "work" but it will not work if, for example, I need to
> print text that includes both Chinese characters (from arialUnicode
> font) and also emoji (from the hypothetical "emoji" font).
> Is there any way to tell PDFBox to "pick the right font (from some
> list) for each character"?
>> No, that is why I created the EmbeddedMultipleFonts.java example
>> which I mentioned earlier in the thread. That one can switch
>> within strings.

Right, it basically does the same thing as I have above, but for a
bunch of increasingly-widening substrings, and it uses exceptions for
flow control. Yuck.

I'd have to look more into what PDFont.encode does, but I'm guessing
that it wouldn't be too hard to build methods into the PDFont class
that look something like this:

/**
 * Returns true if this PDFont can render the whole string.
 */
public boolean canEncode(String s);

/**
 * Returns the longest String that can be successfully encoded by this
 * PDFont, beginning at the beginning of {s}. If the whole String {s}
 * is encodable, then {s} will be returned. If only a part of {s}
 * is encodable, then the return value of this method will be such that:
 *       s.startsWith(getLongestEncodablePrefix(s)) == true
 * If the first character of the string is not encodable in this PDFont,
 * an empty string (or null?) will be returned.
 */
public String getLongestEncodablePrefix(String s);
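
A rough sketch of how those two methods might be layered on top of an
encode call that throws, written against a stand-in interface rather
than PDFont itself. Encoder, ASCII_ONLY, and PrefixSketch are
hypothetical names used only for illustration; the only assumption
carried over is that an unencodable string fails with
IllegalArgumentException, as in the loop quoted above. It also assumes
encodability can be decided one code point at a time.

```java
import java.nio.charset.StandardCharsets;

public class PrefixSketch {
    /** Hypothetical stand-in for a font: encode throws if a character is unsupported. */
    public interface Encoder {
        byte[] encode(String s) throws IllegalArgumentException;
    }

    /** Toy encoder that only accepts 7-bit ASCII. Not a PDFBox class. */
    public static final Encoder ASCII_ONLY = s -> {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                throw new IllegalArgumentException(
                        "No glyph for U+" + Integer.toHexString(s.charAt(i)));
            }
        }
        return s.getBytes(StandardCharsets.US_ASCII);
    };

    /** Returns true if the encoder accepts the whole string (exception-based check). */
    public static boolean canEncode(Encoder e, String s) {
        try {
            e.encode(s);
            return true;
        } catch (IllegalArgumentException iae) {
            return false;
        }
    }

    /** Walks code point by code point and stops at the first one that fails. */
    public static String getLongestEncodablePrefix(Encoder e, String s) {
        int end = 0;
        while (end < s.length()) {
            int next = s.offsetByCodePoints(end, 1); // keeps surrogate pairs together
            if (!canEncode(e, s.substring(end, next))) {
                break;
            }
            end = next;
        }
        return s.substring(0, end);
    }
}
```

This version returns an empty string, not null, when the first
character is not encodable; that choice is arbitrary here.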


If this must be implemented initially by using exceptions for
flow-control, so be it. But theoretically, it can be improved in the
future by performing faster checks... possibly by each type of PDFont
subclass in a different way.
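
For illustration, here is roughly how a cheap per-character check could
be used to split one string into runs, each assigned to the first font
in a list that supports it. This is only a sketch: each "font" is
modelled as a plain IntPredicate over code points, which is an
assumption and not anything PDFBox provides, and FontRunSketch and Run
are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class FontRunSketch {
    /** A piece of text plus the index of the font chosen for it (-1 means no font matched). */
    public static final class Run {
        public final int fontIndex;
        public final String text;

        Run(int fontIndex, String text) {
            this.fontIndex = fontIndex;
            this.text = text;
        }
    }

    /** Splits s into maximal runs of consecutive code points that share the same font. */
    public static List<Run> split(List<IntPredicate> fonts, String s) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentFont = -2; // sentinel: no run started yet
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int font = pick(fonts, cp);
            if (font != currentFont && current.length() > 0) {
                // Font changed: close the run collected so far.
                runs.add(new Run(currentFont, current.toString()));
                current.setLength(0);
            }
            currentFont = font;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }

    /** Returns the index of the first font that supports cp, or -1 if none does. */
    private static int pick(List<IntPredicate> fonts, int cp) {
        for (int f = 0; f < fonts.size(); f++) {
            if (fonts.get(f).test(cp)) {
                return f;
            }
        }
        return -1;
    }
}
```

Each run could then be drawn with its own font, so text mixing Chinese
characters and emoji would no longer need a single font that covers
both.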

-chris