pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Choosing a font for non-ASCII characters
Date Wed, 20 Mar 2019 07:55:54 GMT
On 19.03.2019 at 22:08, Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Tilman,
>
> On 3/19/19 16:23, Tilman Hausherr wrote:
>> On 19.03.2019 at 19:45, Christopher Schultz wrote:
>> Tilman,
>>
>> So I'm starting to look toward making my code better now that it's
>> actually working. Right now, my code looks like this:
>>
>> if(!isAnsiEncoding(strippedText)) { font = getFullUnicodeFont(); }
>>
>> Where one font simply replaces the other for strings that aren't
>> available in the built-in font(s).
>>
>> I'd like to support emoji and stuff like that. I can find a font
>> (or fonts) for that, but I think the only way I can do that with
>> the existing API is something like this:
>>
>> Font[] fonts = new Font[] { builtIn, arialUnicode, emoji };
>>
>> for(Font font : fonts) {
>>   try {
>>     page.setFont(font);
>>     page.showText(text);
>>     break; // Success; don't try the remaining fonts
>>   } catch (IllegalArgumentException iae) {
>>     // Try the next font
>>   }
>> }
>>
>> That will "work" but it will not work if, for example, I need to
>> print text that includes both Chinese characters (from arialUnicode
>> font) and also emoji (from the hypothetical "emoji" font).
>>
>> Is there any way to tell PDFBox to "pick the right font (from some
>> list) for each character"?
>>
>>
>>> No, that is why I created the EmbeddedMultipleFonts.java example
>>> which I mentioned earlier in the thread. That one can switch
>>> within strings.
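
Picking a font per character amounts to splitting the text into runs, one font per run, and then setting the font and showing the text once per run. Below is a minimal sketch of the splitting step only. It assumes each font can be modelled as a predicate that says whether it covers a given code point. `FontRuns`, `Run`, and the predicates are hypothetical names for illustration, not PDFBox API, and this is not necessarily how EmbeddedMultipleFonts.java does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class FontRuns {
    /** A stretch of text to be drawn with a single font. */
    static final class Run {
        final int fontIndex; // index into the font list, or -1 if no font covers the text
        final String text;

        Run(int fontIndex, String text) {
            this.fontIndex = fontIndex;
            this.text = text;
        }
    }

    /** Splits text into maximal runs; each code point goes to the first font that covers it. */
    static List<Run> split(String text, List<IntPredicate> fonts) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentFont = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int font = firstCovering(cp, fonts);
            // Close the pending run whenever the chosen font changes.
            if (font != currentFont && current.length() > 0) {
                runs.add(new Run(currentFont, current.toString()));
                current.setLength(0);
            }
            currentFont = font;
            current.appendCodePoint(cp);
            i += Character.charCount(cp); // keep surrogate pairs (e.g. emoji) together
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }

    /** Returns the index of the first font covering cp, or -1 if none does. */
    private static int firstCovering(int cp, List<IntPredicate> fonts) {
        for (int f = 0; f < fonts.size(); f++) {
            if (fonts.get(f).test(cp)) {
                return f;
            }
        }
        return -1;
    }
}
```

Each run would then be drawn with the setFont/showText pair from the loop above. A run with index -1 has no covering font and needs a replacement character or an error.
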
> Right, it basically does the same thing as I have above, but for a
> bunch of increasingly-widening substrings, and it uses exceptions for
> flow control. Yuck.
>
> I'd have to look more into what PDFont.encode does, but I'm guessing
> that it wouldn't be too hard to build methods into the PDFont class
> that look something like this:
>
> /**
>   * Returns true if this PDFont can render the whole string.
>   */
> public boolean canEncode(String s);
>
> /**
>   * Returns the longest String that can be successfully encoded by this
>   * PDFont, beginning at the beginning of {s}. If the whole String {s}
>   * is encodable, then {s} will be returned. If only a part of {s}
>   * is encodable, then the return value of this method will be such that:
>   *
>   *       s.startsWith(getLongestEncodablePrefix(s)) == true
>   *
>   *
>   * If the first character of the string is not encodable in this PDFont,
>   * an empty string (or null?) will be returned.
>   */
> public String getLongestEncodablePrefix(String s);


That would just push what you called "Yuck" further down, or we
would have to maintain the code twice: once for checking whether
something can be encoded, and once for actually doing it. And that
for all 6, maybe 7, font types.

Instead of going forward with your project with the working code 
provided, you're arguing about design issues.

Tilman



>
> WDYT?
>
> If this must be implemented initially by using exceptions for
> flow-control, so be it. But theoretically, it can be improved in the
> future by performing faster checks... possibly by each type of PDFont
> subclass in a different way.
>
> - -chris
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlyRWlwACgkQHPApP6U8
> pFh9ThAAoHG1hK2SnjLv0ibDvZaG3ZI79NAgoIz7+bowPbi4BvPfKYfuubF0QSNH
> l2lvk657H+0PDFUU5UepyB4JsjItXKG3sgNbQBB0E+G84PF896M/3r61TMgTKmT4
> 1pEqkHMXJoBA/4/Gnh9HLMGyKTY623R60Jhgsxocm78KR4zSjiZuvLpWsSvrqC57
> 4vR4YZ8Od4FvC0NFiGrI4w7KCpRvhT15IiOS77Qitgm3CMTyDaOulcjrcQx2rk0B
> sZY5q+S2huG8INR2vqjjkA/iQjJOTvI7hGJco/PemKWZm6x0/NmATeA7bSYZ9FZ/
> ylJgahUKyCh2b/iJG5oOl/7iuFKrBpeO95/KO0ETTgrM/dZLbNnvDqQsdAfBOZYv
> MTzqk36rf7vMUZtr4i9XW4la4tol5MZTidUGJBgryhaE4VQDrfsnpI3R78LKJA2a
> +QHVLGA5N/fnCyG9/sxX3dwr3+K4daZ56YZJrkaqoO/IU95eQu8sFdATI++4uwsm
> JcWGbmK6O7RiljwqrggTJaU49BuPgnj1+RbIxBkovGEM5ReITomqZn5wsUnowbiE
> jVxSAavZ7OU8TlT+/bjFKWoV+wTvzGad671vPxt/Dy+++BFiGScVDwLM8qVmcrd1
> gf8BosKaVBHE/+YBw1wyYyYJowvrtr7T9gMMyIHG91fZiSv8Ml4=
> =6hcu
> -----END PGP SIGNATURE-----
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>



