tomcat-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: International characters in source files and SVN commit messages (was: RE:r1525975)
Date Fri, 27 Sep 2013 01:35:48 GMT
On 26 September 2013 23:29, Konstantin Kolinko <knst.kolinko@gmail.com> wrote:
> 2013/9/26 sebb <sebbaz@gmail.com>:
>> On 25 September 2013 17:02, Konstantin Preißer <kpreisser@apache.org> wrote:
>>> Mark,
>>>
>>>> -----Original Message-----
>>>> From: Mark Thomas [mailto:markt@apache.org]
>>>> Sent: Wednesday, September 25, 2013 5:54 PM
>>>
>>>> I'd say yes. Property files are a 'special' case:
>>>> http://stackoverflow.com/questions/4659929/how-to-use-utf-8-in-
>>>> resource-properties-with-resourcebundle
>>>
>>> OK, thank you for the clarification.
>>>
>>>> It doesn't bother me but I'm only one committer. I think this falls
>>>> under the category if someone cares enough about the commit e-mails
>>>> using UTF-8 then they need to work with infra to make that happen. I'm
>>>> happy with things as they are.
>>
>> There is a property that can be used to change the encoding used by
>> the SVN mailer, for example:
>>
>> svn:mime-type text/xml; charset=utf-8
>>
>> Make sure this agrees with the contents and any xml encoding attribute.
>>
>
> -1 for changing svn:mime-type in such a way.
> Placing an encoding into svn:mime-type is wrong, as
> a) It is not portable. (Git does not have svn properties).

Other svn properties are already required, so the portability argument does not hold.

> b) It is hard to keep in sync.  Beware that case may matter for some
> software (UTF-8 vs utf-8).

How often does the encoding change?
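
If keeping the two in sync is the worry, it could be checked mechanically. A minimal sketch, assuming the property value has already been read (for example with `svn propget svn:mime-type`); the helper names `charset_of_mime_type`, `declared_xml_encoding` and `encodings_agree` are hypothetical and not part of Subversion or svnmailer. The comparison goes through codec lookup, so UTF-8 and utf-8 count as the same encoding:

```python
import codecs
import re
from email.message import Message


def charset_of_mime_type(value):
    """Return the charset parameter of a MIME type string, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased by the email package


def declared_xml_encoding(head):
    """Return the encoding named in the XML declaration, or None."""
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', head)
    return m.group(1).decode("ascii") if m else None


def encodings_agree(mime_type, xml_bytes):
    """True if the property charset and the XML encoding name the same codec."""
    prop = charset_of_mime_type(mime_type)
    # Assumption: no declaration and no BOM means the XML default, UTF-8.
    decl = declared_xml_encoding(xml_bytes) or "utf-8"
    if prop is None:
        return False
    # codecs.lookup normalises case and aliases (UTF-8, utf-8, utf8).
    return codecs.lookup(prop).name == codecs.lookup(decl).name
```

Something like `encodings_agree("text/xml; charset=utf-8", open(path, "rb").read(200))` could then run as a pre-commit check; whether that is worth the effort depends on how often the encoding really changes.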

> ( c) You may be relying on an undocumented feature. I remember some
> long discussions several years ago on whether file encoding can be
> part of svn:mime-type, or it should be a separate property, with no
> clear outcome.

See http://opensource.perlig.de/svnmailer/doc-1.0/#groups-charset-property

> http://subversion.tigris.org/issues/show_bug.cgi?id=2329
> http://subversion.tigris.org/issues/show_bug.cgi?id=2194
> )
>
> Regarding whoweare.xml file,  you need to add explicit encoding to the
> top of the file (like it is done in
> tc7.0.x/trunk/webapps/docs/changelog.xml).  Without that I consider
> those files as ISO-8859-1, like the rest of our sources.

The default for XML is UTF-8.
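
A small sketch of what that default means in practice, using Python's standard `xml.etree.ElementTree` parser as a stand-in for any conforming XML parser (the sample documents and the `parses` helper are made up for illustration). Without an encoding declaration the bytes are read as UTF-8, so ISO-8859-1 bytes are only accepted when the declaration says so:

```python
import xml.etree.ElementTree as ET

# "\u00df" is the sharp s in "Preißer".
utf8_doc = "<name>Prei\u00dfer</name>".encode("utf-8")
latin1_doc = "<name>Prei\u00dfer</name>".encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_doc


def parses(doc):
    """Return the element text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None
```

Here `parses(utf8_doc)` and `parses(declared)` both return the name, while `parses(latin1_doc)` returns None because the undeclared ISO-8859-1 byte is not valid UTF-8.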

>
> I think commit mailer should treat the files as ISO-8859-1, as such

XML is UTF-8 by default

> interpretation does not lose any data and as that is the format of
> unified diff.

Not sure about those last two assertions.
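
For what it is worth, the byte-level part of the first assertion can be checked. A quick sketch, using Python only as a convenient way to try it (the variable names are illustrative): decoding as ISO-8859-1 maps every byte to a character, so the bytes survive a round trip, but UTF-8 content is displayed as different characters. It says nothing about the unified diff claim.

```python
# UTF-8 bytes as they would be stored in the file ("\u00df" is the sharp s).
original = "Prei\u00dfer".encode("utf-8")

# What a reader that assumes ISO-8859-1 would display.
as_latin1 = original.decode("iso-8859-1")

# Re-encoding that text recovers the original bytes exactly.
round_trip = as_latin1.encode("iso-8859-1")

# The same holds for every possible byte value.
all_bytes = bytes(range(256))
all_round_trip = all_bytes.decode("iso-8859-1").encode("iso-8859-1")
```

So `round_trip == original` holds, while `as_latin1` is not the intended name: the two UTF-8 bytes of the sharp s appear as two separate characters.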

> In the past there were several cases when accented characters in
> Tomcat's changelog files were corrupted during editing (due to a
> conversion done in someone's editor). It was seen in commit message.
> Last time it happened two or three years ago.

That may be so, but I'm not sure what bearing that has on the svn
commit message encoding.

> http://svn.apache.org/r999983
> http://svn.apache.org/r1196769
>
> As of now, several xml files in Tomcat (those changelogs) are
> officially UTF-8, and I am OK with people using accented characters
> for new text there until something breaks.
> (Personally, I will probably still use numeric entities, as I do not
> have those characters on my keyboard.)
>
> AFAIK, TortoiseSVN diff viewer has some logic to autodetect the use of UTF-8.
>
> Best regards,
> Konstantin Kolinko
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
>


