cocoon-dev mailing list archives

From Jeremy Quinn <>
Subject Re: request encoding conundrum
Date Sat, 26 Jul 2008 12:24:14 GMT

On 25 Jul 2008, at 13:54, Grzegorz Kossakowski wrote:

> Jeremy Quinn wrote:
>>>> I am trying to solve a nasty request transcoding bug, that I  
>>>> found while working on CForms.
>>> Join the club! Discovered character encoding problems two days ago  
>>> in a project based on Cocoon 2.1.x. Tried to fight it yesterday  
>>> and gave up.
>> You work with 2.1 ?? I am shocked :)
> Stay cool, it's only because this project is going to be migrated to  
> 2.2. Actually Mavenization and migration to 2.2 is my main job here.


> What about you? Have you been won over to Cocoon 2.2 yet?
> Have you got it running and can you develop on top of it?

I still have all of the notes and the builds we did (thanks!).
But I am still doing the work in 2.1, as (if I remember properly) we  
did not manage to make a build that would edit live at the level of  
the cforms block itself.
Correct me if I am wrong, but it seems easier to set up 2.1 so that
edits made to the built-in resources of the block are immediately live
without re-building.

>> A change like this, while simplifying our codebase, could cause
>> utter havoc for users ..... I don't know if unicode really is a
>> practical superset of every other possible encoding.
>> Sorry, I do not think I know enough about this either.
> Ok. Anyway just for record what wikipedia says[1] about UTF-8:
> UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length  
> character encoding for Unicode. It is able to represent any  
> character in the Unicode standard, yet the initial encoding of byte  
> codes and character assignments for UTF-8 is backwards compatible  
> with ASCII. For these reasons, it is steadily becoming the preferred  
> encoding for e-mail, web pages, and other places where characters  
> are stored or streamed.
> So it can represent anything from Unicode. Let's have a look at
> Unicode[2] itself:
> In computing, Unicode is an industry standard allowing computers to  
> consistently represent and manipulate text expressed in most of the  
> world's writing systems. Developed in tandem with the Universal  
> Character Set standard and published in book form as The Unicode  
> Standard, Unicode consists of a repertoire of more than 100,000  
> characters [...]
> If Unicode can handle more than 100,000 characters, then I guess anyone
> will have a hard time finding a character that Unicode cannot encode
> correctly.

Yes, I know that is the 'party line' about unicode :)
But TBH, I don't know if it really covers every possible obscure case.

>> Yes, I was expecting that.
>> Upgrading CForms upload widget is on my long list ..... I guess you  
>> just bumped it forward a few places :)
> Nice. :)

... but I am still bogged down with subtle differences in format  
interpretation between Java and Dojo, with validating number fields,  
it's a minefield ...... blog entry half written ;)

> Still, I'm interested in your work on CForms, especially when it comes
> to the /server/ side where I feel quite comfortable. Even though I'm
> busy with my work here at my company and I have some other Cocoon stuff
> to do, I would like to support you in your effort.

Great !

> I see only two small obstacles:
> 1. As I saw at ApacheCon, you have some nice work on your computer.
> The problem is that if you keep it on your computer then nobody can
> test it or eventually help you with this stuff. Is there any reason
> not to commit the work you already have to some public place?

There are a few problems that have stopped me doing this so far :
1) too lazy (so far) to set up and maintain some kind of
branch/sandbox ;)
2) I cannot commit anything to head yet, because lots of stuff is  
still completely broken and/or still has to be re-written to the new  
APIs. The work has already taken me several months, and there are  
several more to go ..... it is unpredictable how much longer this will  
take, I'd mess up Cocoon's release cycles .....

> Otherwise any collaboration is rather difficult.

What would you propose?
The work involves having two or three custom blocks, forms and ajax  
(atm, I have dojotoolkit as a block).
If you are serious about getting involved, I'd be prepared to make the  
extra effort to collaborate.

> 2. I prefer to work with C2.2 (trunk) because it's simpler than 2.1  
> and it's much easier to develop/test anything here. Any chances that  
> you will switch with your work to trunk?

You find 2.2 simpler, I find 2.1 simpler :)
If we could find the right way to collaborate, you can work on
2.2-specific issues, and I can work on 2.1.

One of the major problems with 2.2 is the loss of the 'system
pipelines' that in 2.1 provide a set of static URIs for loading cforms
and dojo resources; coupled with the fact that /someone/,
misunderstanding the dojo APIs, thought it necessary to introduce a
resource-path for use by cforms widgets client-side.

I can hopefully help you overcome these problems.

This is the current JS Loader for 2.1.12-dev :
<script src="/_cocoon/resources/dojotoolkit/dojo/dojo.js" type="text/javascript"
djConfig="isDebug: true, locale: 'en_GB', parseOnLoad:
<script type="text/javascript">
dojo.registerModulePath("cocoon.forms", "../../forms/js");  
dojo.registerModulePath("cocoon.ajax", "../../ajax/js");  

(ignoring paths to css for now ....)

We have a system pipeline "/_cocoon/resources/ .... " which is used as  
a prefix to load dojo from the dojotoolkit block.

Then we register two modules, forms and ajax, using a path that is  
relative to where dojo was loaded from.
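
To make the relative-path idea concrete, here is a minimal sketch of how such a registration could resolve against the directory dojo.js was loaded from. It is a simplified stand-in and not Dojo's own implementation: the helpers normalize, registerPath and resourceUrl are hypothetical names, and the base directory is assumed from the loader above.

```javascript
// Simplified stand-in for module path registration; NOT Dojo's implementation.
// Assumption: module paths resolve against the directory that holds dojo.js.
const dojoBase = "/_cocoon/resources/dojotoolkit/dojo";

// Collapse "." and ".." segments in a slash-separated absolute path.
function normalize(path) {
  const out = [];
  for (const part of path.split("/")) {
    if (part === "..") {
      out.pop();
    } else if (part !== "." && part !== "") {
      out.push(part);
    }
  }
  return "/" + out.join("/");
}

// Hypothetical registry: module name -> resolved absolute prefix.
const modules = {};

function registerPath(name, relativePath) {
  modules[name] = normalize(dojoBase + "/" + relativePath);
}

// Build the URL of a resource that lives under a registered module.
function resourceUrl(name, resource) {
  return modules[name] + "/" + resource;
}

registerPath("cocoon.forms", "../../forms/js");
registerPath("cocoon.ajax", "../../ajax/js");

const formsPrefix = modules["cocoon.forms"];
const imgSrc = resourceUrl("cocoon.forms", "images/blah.png");
console.log(formsPrefix, imgSrc);
```

Under these assumptions, the forms module ends up two levels above the dojo directory, beside the dojotoolkit block, so resources under it can be addressed without the client being handed a separate prefix.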

One point that was missed by the /someone/ above, was that once a  
module is registered, you can get a url to it like this :

var imgSrc = dojo.moduleUrl("cocoon.forms","images/blah.png");

i.e. it is not necessary to provide it specifically to the client as  
it is currently done : cocoon.resourcesUri = "<xsl:value-of  

But TBH, except for a few exceptions like custom data-source urls  
(dynamic selectionlists etc.) there should be no need to reference  
anything like this ..... templates should be embedded in widgets,  
images used in widgets should be loaded via css (where relative  
references work internally) etc. etc.

So, the system path is not available in 2.2. The dojotoolkit, forms and
ajax blocks could have any URI. So we need a standard way for an
application block to tell its form-rendering pipeline the paths to
these blocks. Presumably this should be the responsibility of the
application's sitemap.

It should not be necessary to re-write any URIs (!!).

Furthermore, this provision of paths to blocks needs to take into
account the fact that in production people will most likely want to do
stuff like :
1) acquire dojo from CDNs like AOL, Google etc.
2) build custom minimised JS libs to support their apps
3) load their own custom modules, override css etc.
4) lots of stuff we have not thought of yet ;)

ATM, while I am developing cforms, my dojotoolkit block is a special  
build, everything uncompressed, unpackaged, etc. with like 180 sets of  
locales etc. etc. Some complex forms are loading over 100 separate  

The modularity of dojo (and the use of dojo.require) means that only
what is needed by a page is loaded, which is great. But in production,
you will want to heavily reduce the number of files ..... especially
the 404s you get 'hunting' the locale tree. It is a bit of a
contradiction .....

I have not really begun to think seriously about how this should be  
done yet.

If we could collaborate on a way to cleanly solve this, so that  
ideally the basic technique is the same for both 2.1 and 2.2, that  
would be really useful for me :)

>>> There is even a bug report about this issue:
>>> Another interesting option would be to replace our own handling of  
>>> multipart requests with commons-upload code, see:
>>> What do you think about the last proposal?
>> I need a bit of time to dig into this .....
>>> Now I'm going to test fix proposed by you...
> I've tested it (combined with fix from COCOON-1917) and on the  
> server side everything looks correct now.

Great !!!

> The only problem is that the browser sometimes does not behave correctly.
> I noticed that sometimes when I enter non-Latin characters into the
> text field they get escaped by the browser.
> So when I enter something like:
> światło
> the browser posts this value to the server:
> &#347;wiat&#322;o

Yes, I see this a lot.
I also see UTF-8 encoding like this : %E2%82%AC (which is the 3 byte  
encoding for the Euro symbol).
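
For the record, the two escaped forms discussed here can be reproduced in plain JavaScript. This is only an illustrative sketch: decodeNumericRefs is a hypothetical helper written for this message, not something that exists in CForms or Dojo, and it handles decimal references only.

```javascript
// Percent-encoding works on the UTF-8 bytes of each character.
const euro = "\u20AC"; // the Euro sign
const encoded = encodeURIComponent(euro);
const decoded = decodeURIComponent("%E2%82%AC");

// Hypothetical helper: turn decimal numeric character references
// (such as &#347;) back into the characters they stand for.
function decodeNumericRefs(text) {
  return text.replace(/&#(\d+);/g, (match, code) =>
    String.fromCodePoint(Number(code))
  );
}

const word = decodeNumericRefs("&#347;wiat&#322;o");
console.log(encoded, decoded, word);
```

The percent-encoded form round-trips through the standard functions, whereas the numeric character references arrive as literal text in the parameter value and would need an explicit decoding step like the one sketched above.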

I have not found this encoding to be a problem.
What problem does this cause you?

> (additionally there is parameter: dojo.transport=xmlhttp)

This is one of the standard parameters that CForms has to add to form  

CForms uses 3 different transports, depending on context:

1) ajax-off : normal whole page submit
2) ajax-on  : xmlhttp
3) ajax-on + form contains a 'file' field : iframe-transport
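
The selection rule in the list above can be sketched as a small function. The function name and the "full-page" and "iframe" labels are made up for illustration; only "xmlhttp" is taken from the dojo.transport parameter quoted above, and this is not the actual CForms code.

```javascript
// Hypothetical sketch of the transport choice described in the list above.
function chooseTransport(ajaxEnabled, hasFileField) {
  if (!ajaxEnabled) {
    return "full-page"; // 1) ajax-off: normal whole page submit
  }
  // 2) ajax-on uses xmlhttp, unless 3) the form contains a 'file' field
  return hasFileField ? "iframe" : "xmlhttp";
}

console.log(chooseTransport(false, true));  // file field, but ajax is off
console.log(chooseTransport(true, false));  // ajax on, no file field
console.log(chooseTransport(true, true));   // ajax on, with a file field
```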

Unfortunately, the response to each of these needs to be serialized
differently, hence the need for a very complicated sitemap for cforms
and this special parameter.

> Since I don't know how these things are handled on the client side  
> I'm not sure how to fix it.
> Any ideas?

I need more details of what problem it causes ....


regards Jeremy
