cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: svn commit: r278641 - /cocoon/branches/BRANCH_2_1_X/src/blocks/xsp/java/org/apache/cocoon/components/language/markup/xsp/XSPExpressionParser.java
Date Mon, 05 Sep 2005 16:38:00 GMT
Niclas Hedhman wrote:
> On Monday 05 September 2005 14:43, Antonio Gallardo wrote:
> 
> 
>>Of course I am aware that both character sets (Shift-JIS and ISO-8859-1)
>>are different Unicode subsets. This is the same as what you stated.
> 
> 
> No. Pier doesn't confuse Unicode (a sequence of characters) with the
> mapping of those characters to fixed- or variable-length encoded
> byte streams.
> The fact that character 65 in Unicode is in many encodings mapped to the byte 
> value 65 is for convenience only, and that fact should be ignored.
> 
> 
>>Our SVN uses UTF-8 as the default charset (or encoding) or not?
> 
> 
> Subversion uses binary data, and is agnostic to any encodings in the data (or 
> so they say). AFAIU, marking files as text only deals with the line endings 
> and how the diff mails are generated.
> The --encoding argument applies to commit messages.
> Paths and URLs/URIs have additional encoding requirements.

Correct.

It is also worth noting that SVN before 1.2 and cvs2svn make a pretty 
broken combination when the commit messages in CVS used an encoding 
other than UTF-8.

As an example, try to get the svn log of the Apache repository and the 
svn client will fail. We have three commit messages in Latin-1 that 
cvs2svn placed, as binary, into svn (prior to 1.2 there was no encoding 
validation in svn). Those messages get copied into the XML that is 
passed between the svn server and client, which uses UTF-8 as its 
encoding.
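
To illustrate why such a message trips up a UTF-8 consumer, here is a 
minimal Java sketch. It is not svn code: the class name Latin1AsUtf8, 
the helper isValidUtf8 and the sample word are assumptions made up for 
this example. It only shows that a Latin-1 byte for an accented 
character is not a well-formed UTF-8 sequence when decoded strictly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of svn or Cocoon.
public class Latin1AsUtf8 {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample text with one non-ASCII character (e with acute accent).
        String message = "caf\u00e9";

        // The same characters, mapped to bytes by two different encodings.
        byte[] latin1 = message.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8:   " + isValidUtf8(utf8));
    }
}
```

The Latin-1 byte 0xE9 looks like the start of a multi-byte UTF-8 
sequence with nothing after it, so a strict decoder rejects it.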

I've asked infra@ to fix this, but since it is not really high priority 
(only data archaeologists like myself care about those things) it is 
unlikely to get fixed.

Anyhow, I agree with Pier: we should use *only* ASCII in source files 
and escape Unicode characters explicitly the \uxxxx way.
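
As a rough sketch of what that escaping could look like, the Java 
snippet below rewrites every non-ASCII character as a \uxxxx escape. 
The class name UnicodeEscaper and the method escapeNonAscii are 
hypothetical, invented for this example; nothing like them exists in 
Cocoon as far as this thread shows.

```java
// Hypothetical helper, for illustration only.
public class UnicodeEscaper {

    /**
     * Returns the input with every char above 0x7F replaced by a
     * backslash, the letter u, and four lowercase hex digits.
     * It works on UTF-16 code units, so a character outside the
     * Basic Multilingual Plane becomes two escapes (its surrogate
     * pair), which is how Java source escapes represent it too.
     */
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument contains one non-ASCII character; the printed
        // result contains ASCII only.
        System.out.println(escapeNonAscii("caf\u00e9"));
    }
}
```

The result is pure ASCII, so it reads the same no matter which 
encoding an editor, compiler or diff mail assumes for the file.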

-- 
Stefano.

