tomcat-dev mailing list archives

From Mark Thomas <>
Subject Re: International characters in source files and SVN commit messages (was: RE:r1525975)
Date Wed, 25 Sep 2013 15:01:21 GMT
On 25/09/2013 07:52, Konstantin Preißer wrote:
> Hi all,
>> -----Original Message----- From:
>> [] Sent: Tuesday, September 24, 2013
>> 9:11 PM
>> --- tomcat/site/trunk/xdocs/whoweare.xml (original) +++
>> tomcat/site/trunk/xdocs/whoweare.xml Tue Sep 24 19:10:44 2013 @@
>> -100,6 +100,9 @@ A complete list of all the Apache Commit 
>> <p><b>Costin Manolache</b> (costin at<br/></p>
>> bio goes here-->
>> +<p><b>Konstantin Preißer</b> (kpreisser at<br/></p>
> When editing whoweare.xml, I wrote the "ß" character (sharp s),
> which is now displayed as "ÃŸ" in the commit message, because the
> source XML file is encoded in UTF-8 (the default encoding for XML
> files).
> As far as I understand, SVN needs to treat changes in text files at
> byte-level, not at character-level, to be independent from character
> encodings. Therefore e.g. ".patch" files don't have a character
> encoding as they describe changes at byte-level.
> However, when the Commit E-Mail is sent, the bytes need to be
> converted to characters, and it seems the SVN commit diff is
> interpreted as ISO-8859-1 (or Windows-1252). Therefore, the UTF-8
> bytes 0xC3 0x9F are displayed as "ÃŸ", instead of "ß".
> What would be the preferred way to handle such issues? One way I can
> think of would be to XML-encode such characters ("ß" as "&#xDF;").
> However, personally I would rather not do this, but write such
> characters directly ("ß"), so that the source is better readable (and
> encodings like UTF-8 guarantee that the characters are interpreted
> the same on each system, independently from the system language or
> geographic location).

I don't like the idea of using XML encoding at all.
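
For reference, the byte-level mismatch described in the quoted text can be reproduced with a minimal Python sketch. It assumes the mail tool decodes the diff as Windows-1252 or ISO-8859-1, as suggested above; the variable names are illustrative and this is not the actual commit-mail code.

```python
# Illustrative sketch only; not the code used by the commit-mail system.
sharp_s = "\u00df"  # "ß", LATIN SMALL LETTER SHARP S

# UTF-8 stores "ß" as two bytes.
utf8_bytes = sharp_s.encode("utf-8")
print(utf8_bytes.hex(" "))  # c3 9f

# Decoding those bytes as Windows-1252 yields two characters instead of one.
as_cp1252 = utf8_bytes.decode("cp1252")
print(as_cp1252)  # ÃŸ

# ISO-8859-1 maps 0x9F to an invisible C1 control character instead of "Ÿ".
as_latin1 = utf8_bytes.decode("iso-8859-1")

# The XML numeric character reference mentioned as an alternative.
xml_ref = f"&#x{ord(sharp_s):X};"
print(xml_ref)  # &#xDF;
```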

> Would it be possible to change the SVN Commit E-Mail system so that
> it interprets diffs as UTF-8 instead of ISO-8859-1 (assuming all
> files which contain bytes > 0x7F are encoded as UTF-8)? (Or so that
> it tries to decode them as UTF-8 and, if that fails, decodes them as
> ISO-8859-1?)

This is a question for infra. If UTF-8 fails then ISO-8859-1 is going to
fail as well.
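
A rough Python sketch of the "try UTF-8, then fall back" idea from the quoted question, for whoever raises this with infra. The function name `decode_diff` is hypothetical and nothing here reflects how the commit-mail system is actually implemented. Note that the fallback step only guesses: it cannot tell whether bytes that are not valid UTF-8 were really written as ISO-8859-1.

```python
from typing import Tuple


def decode_diff(raw: bytes) -> Tuple[str, str]:
    """Decode diff bytes, returning (text, name of the charset that was used).

    Hypothetical helper: strict UTF-8 is tried first, and ISO-8859-1 is
    used only when the bytes are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A guess, not a detection: the bytes may be in some other charset.
        return raw.decode("iso-8859-1"), "iso-8859-1"


# The name from the commit, once as UTF-8 bytes and once as ISO-8859-1 bytes.
print(decode_diff(b"Prei\xc3\x9fer"))
print(decode_diff(b"Prei\xdfer"))
```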

