cocoon-dev mailing list archives

From Grzegorz Kossakowski <g...@tuffmail.com>
Subject Re: request encoding conundrum
Date Fri, 25 Jul 2008 12:54:57 GMT
Jeremy Quinn wrote:
>>> I am trying to solve a nasty request transcoding bug, that I found 
>>> while working on CForms.
>>
>> Join the club! Discovered character encoding problems two days ago in 
>> a project based on Cocoon 2.1.x. Tried to fight it yesterday and gave up.
> 
> You work with 2.1 ?? I am shocked :)

Stay cool, it's only because this project is going to be migrated to 2.2.
Actually, Mavenization and migration to 2.2 is my main job here.

What about you? Have you been won over by Cocoon 2.2 yet? Have you got it
running, and can you develop on top of it?

> A change like this while simplifying our codebase, could cause utter 
> havoc to users ..... I don't know if unicode really is a practical 
> superset of every other possible encoding.
> 
> Sorry, I do not think I know enough about this either.

OK. Anyway, just for the record, here is what Wikipedia says[1] about UTF-8:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length
character encoding for Unicode. It is able to represent any character in
the Unicode standard, yet the initial encoding of byte codes and character
assignments for UTF-8 is backwards compatible with ASCII. For these reasons,
it is steadily becoming the preferred encoding for e-mail, web pages, and
other places where characters are stored or streamed.

So it can represent anything from Unicode; let's have a look at Unicode[2]
itself:
In computing, Unicode is an industry standard allowing computers to
consistently represent and manipulate text expressed in most of the world's
writing systems. Developed in tandem with the Universal Character Set
standard and published in book form as The Unicode Standard, Unicode
consists of a repertoire of more than 100,000 characters [...]

If Unicode can handle more than 100,000 characters, then I guess anyone
will have a hard time finding a character that Unicode cannot encode
correctly.
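
To make the two properties quoted above (variable length, ASCII
compatibility) concrete, here is a minimal sketch. It is written in Python
only for brevity and has nothing to do with Cocoon's own code; the helper
name utf8_lengths is made up for this example.

```python
def utf8_lengths(text):
    """Pair each character with the number of bytes UTF-8 uses for it."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# The Polish word "światło", written with escapes so the source stays ASCII.
word = "\u015bwiat\u0142o"

# ASCII letters take one byte each; U+015B and U+0142 take two bytes each.
for ch, size in utf8_lengths(word):
    print(f"{ch!r}: {size} byte(s)")

# Pure ASCII text produces identical bytes in UTF-8 and in ASCII.
assert "wiat".encode("utf-8") == "wiat".encode("ascii")

# Encoding to UTF-8 and decoding again is lossless.
assert word.encode("utf-8").decode("utf-8") == word
```

The round trip at the end is the property that matters for request
handling: as long as both sides agree on UTF-8, nothing is lost.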

> Yes, I was expecting that.
> Upgrading CForms upload widget is on my long list ..... I guess you just 
> bumped it forward a few places :)

Nice. :)

Still, I'm interested in your work on CForms, especially when it comes to
the /server/ side, where I feel quite comfortable. Even though I'm busy with
my work here at my company and have some other Cocoon stuff to do, I would
like to support you in your effort.

I see only two small obstacles:
1. As I saw at ApacheCon, you have some nice work on your computer. The
problem is that if you keep it there, nobody can test it and eventually
help you with it. Is there any reason not to commit the work you already
have to some public place? Otherwise any collaboration is rather difficult.
2. I prefer to work with C2.2 (trunk) because it's simpler than 2.1 and
it's much easier to develop/test anything there. Any chance that you will
switch your work to trunk?

>> There is even bug report about this issue:
>> https://issues.apache.org/jira/browse/COCOON-1917
>>
>> Another interesting option would be to replace our own handling of 
>> multipart requests with commons-upload code, see:
>> https://issues.apache.org/jira/browse/COCOON-1325
>>
>> What do you think about the last proposal?
> 
> I need a bit of time to dig into this .....
> 
>> Now I'm going to test fix proposed by you...

I've tested it (combined with the fix from COCOON-1917) and on the server
side everything looks correct now. The only problem is that the browser
sometimes does not behave correctly.

I noticed that sometimes when I enter non-Latin characters into the text
field they get escaped by the browser.

So when I enter something like:
światło

the browser posts the following value to the server:
&#347;wiat&#322;o

(additionally, there is a parameter: dojo.transport=xmlhttp)

Since I don't know how these things are handled on the client side, I'm not
sure how to fix it.

Any ideas?
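
One possible explanation, offered as an assumption rather than a diagnosis:
browsers are known to replace characters that the form's submission charset
cannot represent with numeric character references, so a request sent as
ISO-8859-1 instead of UTF-8 would produce exactly this kind of value. The
sketch below is Python, not Cocoon code, and both function names are
hypothetical. It imitates that escaping and shows how a server could undo
it.

```python
import re

# Matches decimal numeric character references such as "&#347;".
_NCR = re.compile(r"&#(\d+);")


def escape_unrepresentable(text, charset):
    """Imitate the browser: replace characters missing from charset with &#NNN;."""
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)


def unescape_ncrs(value):
    """Turn decimal numeric character references back into characters."""
    return _NCR.sub(lambda m: chr(int(m.group(1))), value)


# The Polish word "światło", written with escapes so the source stays ASCII.
word = "\u015bwiat\u0142o"

# ISO-8859-1 has neither U+015B nor U+0142, so both get escaped.
posted = escape_unrepresentable(word, "iso-8859-1")
print(posted)

# The server-side workaround restores the original text.
print(unescape_ncrs(posted) == word)

# With UTF-8 as the submission charset nothing needs escaping at all.
print(escape_unrepresentable(word, "utf-8") == word)
```

Unescaping on the server is only a workaround: a user who literally types
&#347; into the field can no longer be told apart from one who typed the
character itself. If the assumption holds, making sure the form is actually
submitted as UTF-8 would avoid the escaping in the first place.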

> Many thanks!

You're welcome!

-- 
Grzegorz Kossakowski
