cocoon-users mailing list archives

From Joerg Heinicke <joerg.heini...@gmx.de>
Subject Re: Encoding problems, still!
Date Sat, 30 Oct 2004 08:13:56 GMT
On 30.10.2004 02:42, Marc Portier wrote:

That late? ;-)

>> But then in the bug report for Xalan (someone having this same 
>> problem) it says:
>>
>> "According to section 16.2 of the XSLT Recommendation [1], non-ASCII 
>> characters in URI attribute values should be escaped using the method 
>> recommended in Section B.2.1 of the HTML 4.0 Recommendation [2]. The 
>> latter recommends that non-ASCII characters be represented in UTF-8 
>> prior to applying the "%HH" escaping described by the URI RFC,
>> regardless of the output encoding."
>>
> 
> nifty, didn't know... so whatever output encoding you set, the URIs will
> be UTF-8 encoded and then URL-encoded?

Yes, that's how I understand it and wrote it in my first reply to 
Tuomo's question.
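A minimal Java sketch of the escaping quoted above, assuming that only non-ASCII characters need treatment: each one is converted to UTF-8 and every resulting byte is written as %HH, independent of the page's output encoding. The class and method names are hypothetical; this is not Xalan's serializer code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating XSLT 1.0 section 16.2 / HTML 4.0 B.2.1
// escaping; not taken from Xalan.
final class UriAttributeEscaper {

    private UriAttributeEscaper() {}

    // Escapes every non-ASCII character as the %HH form of its UTF-8 bytes;
    // ASCII characters (including '?', '=', '&') are copied unchanged.
    static String escapeNonAscii(String uri) {
        StringBuilder out = new StringBuilder(uri.length());
        int i = 0;
        while (i < uri.length()) {
            // Iterate by code point so a surrogate pair is encoded as one character.
            int cp = uri.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("search?q=\u00e4")); // search?q=%C3%A4
    }
}
```

So a link to search?q=ä (U+00E4) ends up as search?q=%C3%A4 in the HTML, whether the page itself is served as ISO-8859-1 or as UTF-8.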

> haven't ever seen this; I was under the impression that to Xalan
> attributes were just attributes, and would have expected characters to be
> replaced by character references depending on whether or not they are
> supported by the applied output encoding

No, Xalan handles href attributes differently.

>> This is what Xalan does (HTML serialization), so it obeys the spec.
>>
>> Correct me if I'm wrong, but during serialization, if there are special
>> characters (code 128 and above) in a URL's request parameters (href
>> attributes etc.), they are first encoded in UTF-8 by Xalan. Even if the browser
> 
> apparently; would like to see some test evidence to be on the safe side,
> though

I can confirm this behaviour for the old Xalan version shipped with
Cocoon 2.0 RC 1. At that time we tried to produce links with request
params, and they did not work because of the encoding. We had to replace
the links with form.submit() JavaScript.

>> detects the page as ISO-8859-1 or anything else, these URLs in the
>> HTML source contain parameters in UTF-8. Now, when the user clicks on this
>> link,
> 
> but it is not about request parameters, is it?

It is as far as I understand.

> it is about the proper URL part, no?

Don't know exactly. I had no tests distinguishing the URL part from the
request param part.

> as in:
> 
> http://server:port/path/more-path?request-param=value
> ---------------------------------|-------------------
>  >>  area-not-fixed-by-cocoon << |  >> area fixed by cocoon <<
> 
> (in fact I even doubt whether we are fixing the names of the
> request params; my guess would be we're only doing the values)
> 
> see 
> http://cvs.apache.org/viewcvs.cgi/cocoon/trunk/src/java/org/apache/cocoon/environment/http/HttpRequest.java?rev=55600&root=Apache-SVN&view=auto

> 
> there is the internal decode() method. It only gets called from code
> that deals with request-parameter values (as I started to think: not even
> the names)
> 
>> Cocoon reads the request parameters in as ISO-8859-1, and converts 
>> them to UTF-8, without knowing that these parameters were already UTF-8!

That's how I understand it (except that the first part is done not by
Cocoon but by the container, as Marc wrote below too).

> nope, don't think so... first nuance (see above): the container reads
> and applies (typically) ISO-8859-1,...
> 
> and Cocoon correctly re-decodes request-parameter values based on its
> 'form-encoding', but isn't (at least to my knowledge) touching the URL
> part of things

But if you convert values from ISO-8859-1 to UTF-8 even though they
already were UTF-8 and not ISO-8859-1, you are in trouble like Tuomo,
aren't you?
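The two steps can be sketched in Java as follows. This assumes the container decodes the parameter bytes as ISO-8859-1 and that a later step re-interprets the resulting string with a configured form encoding, as Marc describes for Cocoon. The names are hypothetical, and this is not Cocoon's actual decode() implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of container decoding followed by a form-encoding re-decode.
final class ParameterRedecode {

    private ParameterRedecode() {}

    // Step 1 (container): the bytes from the %HH escapes become a String
    // using the container's encoding, typically ISO-8859-1.
    static String containerDecode(byte[] rawBytes, Charset containerEncoding) {
        return new String(rawBytes, containerEncoding);
    }

    // Step 2 (form-encoding re-decode): turn the String back into the original
    // bytes, then decode them with the configured form encoding. ISO-8859-1 maps
    // every byte to exactly one char, so this round trip loses nothing.
    static String redecode(String value, Charset containerEncoding, Charset formEncoding) {
        byte[] original = value.getBytes(containerEncoding);
        return new String(original, formEncoding);
    }

    public static void main(String[] args) {
        // A link escaped as "%C3%A4", i.e. the UTF-8 bytes of U+00E4.
        byte[] raw = {(byte) 0xC3, (byte) 0xA4};
        String fromContainer = containerDecode(raw, StandardCharsets.ISO_8859_1);

        String withUtf8 = redecode(fromContainer, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        String withLatin1 = redecode(fromContainer, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println(withUtf8.equals("\u00e4"));         // true: character recovered
        System.out.println(withLatin1.equals("\u00c3\u00a4")); // true: still two garbled chars
    }
}
```

The result depends only on whether the form encoding matches the encoding the link's bytes actually used. For Xalan-escaped hrefs, that encoding is UTF-8, regardless of the page encoding.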

Joerg

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org

