tomcat-dev mailing list archives

From sebb <>
Subject Re: International characters in source files and SVN commit messages (was: RE:r1525975)
Date Fri, 27 Sep 2013 01:35:48 GMT
On 26 September 2013 23:29, Konstantin Kolinko <> wrote:
> 2013/9/26 sebb <>:
>> On 25 September 2013 17:02, Konstantin Preißer <> wrote:
>>> Mark,
>>>> -----Original Message-----
>>>> From: Mark Thomas []
>>>> Sent: Wednesday, September 25, 2013 5:54 PM
>>>> I'd say yes. Property files are a 'special' case:
>>>> resource-properties-with-resourcebundle
>>> OK, thank you for the clarification.
>>>> It doesn't bother me but I'm only one committer. I think this falls
>>>> under the category if someone cares enough about the commit e-mails
>>>> using UTF-8 then they need to work with infra to make that happen. I'm
>>>> happy with things as they are.
>> There is a property that can be used to change the encoding used by
>> the SVN mailer, for example:
>> svn:mime-type text/xml; charset=utf-8
>> Make sure this agrees with the contents and any xml encoding attribute.
> -1 for changing svn:mime-type in such a way.
> Placing an encoding into svn:mime-type is wrong, as
> a) It is not portable. (Git does not have svn properties).

There are other svn properties that are required, so that does not make sense.

> b) It is hard to keep in sync.  Beware that case may matter for some
> software (UTF-8 vs utf-8).

How often does the encoding change?
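
For illustration, keeping the property and the file in agreement could be checked mechanically. Below is a minimal Python sketch, not anything Subversion or the commit mailer provides: the helper names (charset_from_mime_type, encoding_from_xml_prolog, encodings_agree) are hypothetical, the regular expression is a simplification of the XML declaration grammar, and a missing declaration is assumed to mean UTF-8 (BOMs and UTF-16 are ignored). Both names go through the codec registry, so UTF-8 and utf-8 compare equal.

```python
import codecs
import re
from email.message import Message

# Simplified match for the encoding pseudo-attribute of an XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_from_mime_type(value):
    """Return the charset parameter of a value such as 'text/xml; charset=utf-8'."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None when absent


def encoding_from_xml_prolog(data):
    """Return the declared encoding, or 'utf-8' when no declaration is present."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else "utf-8"


def encodings_agree(mime_type, data):
    """Compare both names via the codec registry so case and aliases do not matter."""
    charset = charset_from_mime_type(mime_type)
    if charset is None:
        return False
    declared = encoding_from_xml_prolog(data)
    return codecs.lookup(charset).name == codecs.lookup(declared).name
```

Such a check could run from a pre-commit hook or a build step; where it would actually live is left open here.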

> (c) You may be relying on an undocumented feature. I remember some
> long discussions several years ago on whether file encoding can be
> part of svn:mime-type, or it should be a separate property, with no
> clear outcome.)
> Regarding whoweare.xml file,  you need to add explicit encoding to the
> top of the file (like it is done in
> tc7.0.x/trunk/webapps/docs/changelog.xml).  Without that I consider
> those files as ISO-8859-1, like the rest of our sources.

The default for XML is UTF-8.
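
A minimal Python sketch of that behaviour, assuming the standard library's expat-based parser is representative of how XML tools treat a missing declaration. The sample element and the variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

NAME = "Prei\u00dfer"  # contains U+00DF, a non-ASCII character

# No declaration: the parser assumes UTF-8, so UTF-8 bytes parse correctly.
utf8_doc = f"<name>{NAME}</name>".encode("utf-8")
utf8_text = ET.fromstring(utf8_doc).text

# Explicit declaration: ISO-8859-1 bytes parse correctly as well.
latin1_doc = (
    f'<?xml version="1.0" encoding="ISO-8859-1"?><name>{NAME}</name>'
).encode("iso-8859-1")
latin1_text = ET.fromstring(latin1_doc).text

# ISO-8859-1 bytes without a declaration are rejected as malformed UTF-8.
try:
    ET.fromstring(f"<name>{NAME}</name>".encode("iso-8859-1"))
    undeclared_latin1_ok = True
except ET.ParseError:
    undeclared_latin1_ok = False
```

So a file with accented characters and no declaration only parses if it really is UTF-8.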

> I think commit mailer should treat the files as ISO-8859-1, as such

XML is UTF-8 by default

> interpretation does not lose any data and as that is the format of
> unified diff.

Not sure about those last two assertions.
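
The first of the two assertions (that no data is lost) can at least be checked in isolation. A minimal Python sketch, assuming only the standard codecs; it says nothing about what the commit mailer actually does, nor about the encoding of unified diffs.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte to the code point with the same number,
# so decoding arbitrary bytes never fails.
as_text = all_bytes.decode("iso-8859-1")

# Encoding the result gives back the original bytes.
round_trip = as_text.encode("iso-8859-1")

# UTF-8 bytes viewed this way survive too, although one character
# is displayed as two.
utf8_bytes = "\u00df".encode("utf-8")       # two bytes: 0xC3 0x9F
shown_as = utf8_bytes.decode("iso-8859-1")  # U+00C3 followed by U+009F
restored = shown_as.encode("iso-8859-1").decode("utf-8")
```

The bytes are preserved, but a reader of the mail sees two wrong characters in place of each accented one unless the conversion is undone.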

> In the past there were several cases when accented characters in
> Tomcat's changelog files were corrupted during editing (due to a
> conversion done in someone's editor). It was seen in commit message.
> Last time it happened two or three years ago.

That may be so, but I'm not sure what bearing that has on the svn
commit message encoding.

> As of now, several xml files in Tomcat (those changelogs) are
> officially UTF-8, and I am OK with people using accented characters
> for new text there until something breaks.
> (Personally, I will probably still use numeric entities, as I do not
> have those characters on my keyboard.)
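
For reference, numeric entities can be generated rather than typed. A minimal Python sketch; the function name to_numeric_entities is hypothetical, and the sample text is only an illustration.

```python
import xml.etree.ElementTree as ET


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_numeric_entities("Prei\u00dfer")

# An XML parser turns the reference back into the original character,
# and the file itself stays pure ASCII whatever encoding is assumed.
parsed = ET.fromstring(f"<name>{escaped}</name>").text
```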
> AFAIK, TortoiseSVN diff viewer has some logic to autodetect the use of UTF-8.
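
One common heuristic for such autodetection is to try a strict UTF-8 decode and fall back to ISO-8859-1. The Python sketch below shows that heuristic only; it is an assumption for illustration, not TortoiseSVN's actual logic, and guess_decode is a hypothetical name.

```python
def guess_decode(data):
    """Decode as UTF-8 when the bytes are valid UTF-8, else fall back to ISO-8859-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 accepts any byte sequence, so this branch cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"
```

The heuristic works reasonably well because accented ISO-8859-1 text is rarely valid UTF-8 by accident, but it is still a guess.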
> Best regards,
> Konstantin Kolinko
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
