pdfbox-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Choosing a font for non-ASCII characters
Date Tue, 19 Mar 2019 21:08:45 GMT

Tilman,

On 3/19/19 16:23, Tilman Hausherr wrote:
> On 19.03.2019 at 19:45, Christopher Schultz wrote:
> Tilman,
> 
> So I'm starting to look toward making my code better now that it's 
> actually working. Right now, my code looks like this:
> 
> if(!isAnsiEncoding(strippedText)) { font = getFullUnicodeFont(); }
> 
> Where one font simply replaces the other for strings that aren't
> available in the built-in font(s).
> 
> I'd like to support emoji and stuff like that. I can find a font
> (or fonts) for that, but I think the only way I can do that with
> the existing API is something like this:
> 
> PDFont[] fonts = new PDFont[] { builtIn, arialUnicode, emoji };
> 
> for (PDFont font : fonts) {
>     try {
>         page.setFont(font, fontSize);
>         page.showText(text);
>         break; // Success: don't draw the text again with the next font
>     } catch (IllegalArgumentException iae) { /* Try the next font */ }
> }
> 
> That will "work" but it will not work if, for example, I need to
> print text that includes both Chinese characters (from arialUnicode
> font) and also emoji (from the hypothetical "emoji" font).
> 
> Is there any way to tell PDFBox to "pick the right font (from some
> list) for each character"?
> 
> 
>> No, that is why I created the EmbeddedMultipleFonts.java example
>> which I mentioned earlier in the thread. That one can switch
>> within strings.

Right, it basically does the same thing as I have above, but for a
bunch of increasingly wide substrings, and it uses exceptions for
flow control. Yuck.
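For comparison, the per-character selection asked about above can be sketched without exceptions if each font exposes a per-code-point check. This is a minimal sketch, not PDFBox API: CandidateFont, FontRun and FontRunSplitter are hypothetical names, and the IntPredicate per font is an assumption (with PDFBox it could be backed by PDFont.encode(String) inside a try/catch). It walks the string by code point so emoji outside the BMP stay intact, gives each code point to the first font in the list that can encode it, and merges neighbours that picked the same font.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical: one piece of text plus the name of the font chosen for it.
final class FontRun {
    final String fontName;
    final String text;

    FontRun(String fontName, String text) {
        this.fontName = fontName;
        this.text = text;
    }

    @Override
    public String toString() {
        return fontName + ":" + text;
    }
}

// Hypothetical: a font reduced to a name and an "can encode this code point?" check.
// ASSUMPTION: with PDFBox the check could wrap PDFont.encode(String) in a try/catch.
final class CandidateFont {
    final String name;
    final IntPredicate canEncode;

    CandidateFont(String name, IntPredicate canEncode) {
        this.name = name;
        this.canEncode = canEncode;
    }
}

final class FontRunSplitter {
    private FontRunSplitter() {}

    // Each code point goes to the first font (in list order) that can encode it;
    // adjacent code points that picked the same font are merged into one run.
    static List<FontRun> split(String text, List<CandidateFont> fonts) {
        List<FontRun> runs = new ArrayList<>();
        CandidateFont current = null;
        StringBuilder buf = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // reads a whole surrogate pair (e.g. an emoji)
            CandidateFont chosen = null;
            for (CandidateFont f : fonts) {
                if (f.canEncode.test(cp)) {
                    chosen = f;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalArgumentException(
                        String.format("No font in the list can encode U+%04X", cp));
            }
            if (chosen != current && buf.length() > 0) {
                runs.add(new FontRun(current.name, buf.toString())); // close previous run
                buf.setLength(0);
            }
            current = chosen;
            buf.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (buf.length() > 0) {
            runs.add(new FontRun(current.name, buf.toString()));
        }
        return runs;
    }
}
```

Each resulting run would then be drawn with PDPageContentStream.setFont(font, size) followed by showText(run). Always preferring the earliest font means the text falls back to the built-in font as soon as it can; staying in the current font while it still works is an equally reasonable policy.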

I'd have to look more into what PDFont.encode does, but I'm guessing
that it wouldn't be too hard to build methods into the PDFont class
that look something like this:

/**
 * Returns true if this PDFont can encode the whole string.
 */
public boolean canEncode(String s);

/**
 * Returns the longest String that can be successfully encoded by this
 * PDFont, starting at the beginning of {@code s}. If the whole String
 * {@code s} is encodable, then {@code s} will be returned. If only part
 * of {@code s} is encodable, then the return value of this method will
 * be such that:
 *
 * <pre>s.startsWith(getLongestEncodablePrefix(s)) == true</pre>
 *
 * If the first character of the string is not encodable in this PDFont,
 * an empty string (or null?) will be returned.
 */
public String getLongestEncodablePrefix(String s);
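Both proposed methods can be sketched on top of a single per-character test. Here they are static helpers rather than PDFont members, and canEncodeCodePoint is a hypothetical IntPredicate standing in for whatever check a PDFont subclass might provide. This sketch returns the empty string rather than null when the first character is not encodable, which keeps the startsWith() contract true, since every string starts with "".

```java
import java.util.function.IntPredicate;

// Sketch of the proposed API as static helpers, not actual PDFont methods.
// ASSUMPTION: canEncodeCodePoint is a hypothetical per-code-point check.
final class EncodableText {
    private EncodableText() {}

    // True if every code point of s passes the check (also true for "").
    static boolean canEncode(String s, IntPredicate canEncodeCodePoint) {
        return getLongestEncodablePrefix(s, canEncodeCodePoint).length() == s.length();
    }

    // Returns "" (not null) when the first code point cannot be encoded.
    static String getLongestEncodablePrefix(String s, IntPredicate canEncodeCodePoint) {
        int end = 0;
        while (end < s.length()) {
            int cp = s.codePointAt(end);
            if (!canEncodeCodePoint.test(cp)) {
                break;
            }
            // Advance by one or two chars so a surrogate pair (e.g. an emoji) is never split.
            end += Character.charCount(cp);
        }
        return s.substring(0, end);
    }
}
```

Iterating by code point rather than by char matters for the emoji case: a check on a lone surrogate would give a meaningless answer.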

WDYT?

If this must initially be implemented by using exceptions for flow
control, so be it. But in theory, it could be improved in the future
by performing faster checks... possibly in a different way in each
PDFont subclass.
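As a stop-gap for the exception-based approach, the cost of the exceptions can at least be paid only once per character by caching the result. A minimal sketch, assuming a hypothetical StringEncoder interface that throws IllegalArgumentException for unsupported characters, the behavior the try/catch loop above relies on. The real PDFont.encode(String) also declares IOException, which an adapter would have to handle. The cache is a plain HashMap, so this is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical interface, not a PDFBox type: throws IllegalArgumentException
// when the text contains a character it cannot encode.
interface StringEncoder {
    byte[] encode(String text);
}

// Turns an exception-throwing encoder into a cached per-code-point check.
final class CachedEncodabilityCheck implements IntPredicate {
    private final StringEncoder encoder;
    private final Map<Integer, Boolean> cache = new HashMap<>(); // not thread-safe

    CachedEncodabilityCheck(StringEncoder encoder) {
        this.encoder = encoder;
    }

    @Override
    public boolean test(int codePoint) {
        // Each code point triggers at most one encode() attempt; later lookups hit the cache.
        return cache.computeIfAbsent(codePoint, cp -> {
            try {
                encoder.encode(new String(Character.toChars(cp)));
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        });
    }
}
```

Any code that takes a per-code-point IntPredicate can use this unchanged, and a faster subclass-specific check could later replace it without touching the callers.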

-chris


