cocoon-users mailing list archives

From gelo1234 <gelo1...@gmail.com>
Subject Re: [2.1] Overzealous escaping of high Unicode code points
Date Wed, 21 Jun 2017 07:26:55 GMT
Hi Chris,

I suppose you cannot use two different encodings in one Serializer, so if you
changed your Serializer config to UTF-16, you also have to use _external_
UTF-16-encoded CSS styles. Of course, you can define a different Serializer
config for each pipeline.

By default commons-lang/Cocoon works with 2-byte (16-bit) Java chars as its
encoding base. A code point above U+FFFF takes 32 bits either way: in UTF-8
it is 4 bytes (each 8 bits), but the escaper sees it as 1 PAIR of 2-byte chars
(a surrogate pair). If you switch to UTF-16, it is 2 code units (each 16 bits),
written out as 1 SINGLE 4-byte sequence.
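
To make the char and byte counts concrete, here is a minimal Java sketch. It assumes U+1F600 as the sample emoji (my choice, not taken from this thread), and the two escape helpers are hypothetical: they only illustrate char-by-char versus code-point-aware escaping and are not Cocoon's or commons-lang's actual code.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCodePointDemo {

    // Hypothetical char-by-char escaper: every non-ASCII Java char gets its
    // own numeric reference, so a surrogate pair turns into two references.
    public static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical code-point-aware escaper: a surrogate pair is read as one
    // code point and becomes a single numeric reference.
    public static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // assumed sample

        System.out.println(emoji.length());                                   // 2 Java chars
        System.out.println(emoji.codePointCount(0, emoji.length()));          // 1 code point
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);    // 4 bytes
        System.out.println(emoji.getBytes(StandardCharsets.UTF_16BE).length); // 4 bytes
        System.out.println(escapePerChar(emoji));      // &#55357;&#56832;
        System.out.println(escapePerCodePoint(emoji)); // &#128512;
    }
}
```

The per-char output matches the "pair of 2-byte characters with &-encoding" symptom described in the quoted message below, which is why I suspect the escaper rather than the output encoding itself.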

Greetings,
Greg


2017-06-20 22:14 GMT+02:00 Christopher Schultz <chris@christopherschultz.net>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Greg,
>
> On 6/20/17 4:11 PM, Christopher Schultz wrote:
> > Greg,
> >
> > On 6/8/17 2:17 PM, gelo1234 wrote:
> >> Chris,
> >
> >> Even with C3 (Cocoon 3.0 beta), unless you specify the optional
> >> encoding in your Serializer config, you fall back to the default
> >> UTF-8:
> >
> >> org.apache.cocoon.optional.servlet.components.sax.serializers.util
> >
> >> public class ConfigurationUtils {
> >>
> >>     private ConfigurationUtils() { }
> >>
> >>     public static String getEncoding(Map<String, ? extends Object> configuration) {
> >>         String encoding = (String) configuration.get("encoding");
> >>
> >>         if (encoding == null || "".equals(encoding)) {
> >>             encoding = "UTF-8";
> >>         }
> >>
> >>         return encoding;
> >>     }
> >>     ...
> >
> > I would have expected the Unicode code point to be converted into a
> > single 4-byte UTF-8 sequence without any &-encoding at all. It looks
> > like what I got was a pair of 2-byte characters with &-encoding.
> >
> > I'll try UTF-16 but my expectation is that it's going to get
> > worse, not better.
>
> Interestingly enough, my emojis are now showing (I don't totally
> understand why!) but it looks like my CSS isn't being loaded. That's
> a separate problem I'll have to figure out for myself.
>
> In my own application, switching from commons-lang to commons-lang3
> HTML/XML escaping allowed me to use these 4-byte emojis and UTF-8
> together. I'm surprised that Cocoon can't do the same thing. (I think
> it comes down to exactly how the character-escaper makes its decisions).
>
> Thanks,
> - -chris
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAllJgiwACgkQHPApP6U8
> pFgJkRAAqiXn7DWNDN41m1V98aI5xWjTuoka0tKcadN1IUGemTZwipaXHtYQcois
> 6yuI3st31ZuanghIpRPcBu9pZzuHtOSBVSHZSIhDGqPwYgczScQ2LgnfMi6zwAdd
> j2LFlSWtKGjgCczV5Ok56PyMq1BEAOVw96vmF5xfXmpLAyNA/PvLKsncoW4pN+ES
> 1MQMm1aPwbmEpWz7ykReUzfauwBtL4rEX1wO3pl88m9Wq3x174AKHWs/a+4Z1Hdq
> 0CnxfrdTK50p7Ng+ECfnPwx8y1Em64lA7KKMuz2jTd0PnxlpZTAgO6lq8S7BdSeY
> H1lwBJojVT/+m2w8b9OC/XoyiAyiC/zIswQ3TSMA3ZC2SnCxxAXMTsmT49Ql+lyq
> 01JRCIVMitKeoKI4I4066oaBW91FpSSpZXX14XCHrMBtKnIJI+NxBnI++eQq8wdi
> ZdX3GzLF2zaPHvZMSz4DRskR1xKGLsAxZAukINW3AGrEAZ/GwbPd76ml3YJam5Yy
> R31u0kcRJl4z79pd1n46yxB66V10Rn5IkSMQ8R7uK/ht9wLi5T8bkeAoLjZFFoyq
> awmfQTbJzquXAtwjX99WKWEzviN2ph+P0h2rBInHnos5ud8IlLjcS7FmdxQ4DNOw
> Nirmj7cikxcr2Fn22pGQh6o3/Eph0lMf1d1HjUZ1C7SchEgsqrk=
> =0nTd
> -----END PGP SIGNATURE-----
