tomcat-dev mailing list archives

From Konstantin Kolinko
Subject Re: International characters in source files and SVN commit messages (was: RE:r1525975)
Date Thu, 26 Sep 2013 22:29:33 GMT
2013/9/26 sebb:
> On 25 September 2013 17:02, Konstantin Preißer wrote:
>> Mark,
>>> -----Original Message-----
>>> From: Mark Thomas
>>> Sent: Wednesday, September 25, 2013 5:54 PM
>>> I'd say yes. Property files are a 'special' case:
>>> resource-properties-with-resourcebundle
>> OK, thank you for the clarification.
>>> It doesn't bother me but I'm only one committer. I think this falls
>>> under the category if someone cares enough about the commit e-mails
>>> using UTF-8 then they need to work with infra to make that happen. I'm
>>> happy with things as they are.
> There is a property that can be used to change the encoding used by
> the SVN mailer, for example:
> svn:mime-type text/xml; charset=utf-8
> Make sure this agrees with the contents and any xml encoding attribute.

-1 for changing svn:mime-type in such a way.
Placing an encoding into svn:mime-type is wrong, as
a) It is not portable. (Git does not have svn properties).
b) It is hard to keep in sync.  Beware that case may matter for some
software (UTF-8 vs utf-8).

c) You may be relying on an undocumented feature. I remember some
long discussions several years ago on whether file encoding can be
part of svn:mime-type, or it should be a separate property, with no
clear outcome.
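
To illustrate point b), here is a minimal sketch of why the case of a
charset label can matter. It assumes a consumer that compares labels as
plain strings, and shows one way to normalize them first using Python's
standard codecs registry. The helper name same_charset is hypothetical,
and this is not what the SVN mailer does.

```python
import codecs


def same_charset(a: str, b: str) -> bool:
    """Compare two charset labels after normalizing them via the codec registry."""
    return codecs.lookup(a).name == codecs.lookup(b).name


# A naive string comparison treats the two spellings as different...
print("UTF-8" == "utf-8")
# ...while a normalized comparison treats them as the same charset.
print(same_charset("UTF-8", "utf-8"))
```

Software that skips this kind of normalization is where a mismatch
between "UTF-8" and "utf-8" could bite.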

Regarding the whoweare.xml file, you need to add an explicit encoding
declaration to the top of the file (as is done in
tc7.0.x/trunk/webapps/docs/changelog.xml).  Without that, I consider
those files to be ISO-8859-1, like the rest of our sources.
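
As a rough sketch of that rule: read the encoding from the XML
declaration if there is one, otherwise fall back to ISO-8859-1. The
function names and the regular expression are my own assumptions for
illustration. The ISO-8859-1 fallback mirrors the convention stated
above, not what an XML parser would assume for a file without a
declaration.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute, if present.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else None


def effective_encoding(head: bytes, default: str = "ISO-8859-1") -> str:
    """Apply the convention above: an undeclared file counts as ISO-8859-1."""
    return declared_encoding(head) or default


print(effective_encoding(b'<?xml version="1.0" encoding="UTF-8"?>\n<document/>'))
print(effective_encoding(b'<?xml version="1.0"?>\n<document/>'))
```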

I think the commit mailer should treat the files as ISO-8859-1, as such
an interpretation does not lose any data and as that is the format of
a unified diff.
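
A small sketch of what "does not lose any data" means here: ISO-8859-1
maps every byte value to exactly one character, so decoding arbitrary
bytes (including UTF-8 text) and encoding them again returns the
original bytes. The helper name roundtrip_latin1 is hypothetical.

```python
def roundtrip_latin1(raw: bytes) -> bytes:
    """Decode bytes as ISO-8859-1 and encode them back unchanged."""
    text = raw.decode("iso-8859-1")  # never fails: all 256 byte values are mapped
    return text.encode("iso-8859-1")


# UTF-8 encoded text survives the round trip byte for byte,
# even though it displays wrongly while treated as ISO-8859-1.
utf8_bytes = "Preißer".encode("utf-8")
assert roundtrip_latin1(utf8_bytes) == utf8_bytes

# The same holds for every possible byte value.
all_bytes = bytes(range(256))
assert roundtrip_latin1(all_bytes) == all_bytes
```

The text may look wrong in the mail, but a reader who knows the real
encoding can still recover it from the bytes.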

In the past there were several cases when accented characters in
Tomcat's changelog files were corrupted during editing (due to a
conversion done in someone's editor). It showed up in the commit messages.
Last time it happened two or three years ago.

As of now, several xml files in Tomcat (those changelogs) are
officially UTF-8, and I am OK with people using accented characters
for new text there until something breaks.
(Personally, I will probably still use numeric entities, as I do not
have those characters on my keyboard.)
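
For those who prefer numeric entities, here is a minimal sketch of
converting non-ASCII characters into XML numeric character references.
It relies on Python's built-in "xmlcharrefreplace" error handler; the
function name to_numeric_entities is hypothetical.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("Preißer"))  # prints: Prei&#223;er
```

The result is pure ASCII, so it reads the same whether the file is
treated as ISO-8859-1 or UTF-8.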

AFAIK, the TortoiseSVN diff viewer has some logic to autodetect the use
of UTF-8.
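
I do not know how TortoiseSVN implements its detection. As an
illustration only, one common heuristic is to try a strict UTF-8 decode
and fall back to ISO-8859-1 when it fails. The function name
guess_encoding is hypothetical.

```python
def guess_encoding(raw: bytes) -> str:
    """Guess UTF-8 if the bytes are valid UTF-8, otherwise ISO-8859-1."""
    try:
        raw.decode("utf-8")  # strict decoding: raises on invalid sequences
    except UnicodeDecodeError:
        return "ISO-8859-1"
    return "UTF-8"


print(guess_encoding("Preißer".encode("utf-8")))
print(guess_encoding("Preißer".encode("iso-8859-1")))
```

Pure ASCII input is reported as UTF-8 by this sketch, which is harmless
because ASCII bytes mean the same thing in both encodings.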

Best regards,
Konstantin Kolinko
