cocoon-dev mailing list archives

From Marc van Kempen
Subject Re: HTML editor widget (was Re: [proposal] Doco)
Date Fri, 31 Oct 2003 21:54:58 GMT
Bruno Dumon wrote:

>* different users of the widget (like the doco project vs the project
>where we need it) will likely require different subsets of HTML to be
>* support for both Mozilla and IE is important. Other browsers should
>fall back to a textarea with raw HTML in it.
>* the HTML produced by the editor should be cleaned (i.e. not supported
>tags & attributes removed) and normalized (formatted). The goal of this
>is to deliver a nice XHTML-subset-doc for storage, and to show nice HTML
>to people editing it manually. Hopefully this will also make it possible
>to do meaningful text-based diffs.
I have done some work on this. I first wrote a js html editor for
IE (>5.5) to be used in an XML content management system. For this we
needed to clean the html and convert it to xhtml so that it could be
processed with xslt when displaying pages.

One approach that I've tried is to generate the xhtml from the browser 
dom page with javascript, i.e. walk the tree and recursively generate 
<TAG> ... </TAG> entries, while surrounding all attributes with quotes. 
This could then be postprocessed on the server by parsing it with an XML 
parser and manipulating the DOM tree. This however proved to be a slight 
nightmare due to js/dom bugs in IE 5.5. If you'd be willing to drop 5.5
support it would be easier, but it might also be possible to do this 
using more specific IE js constructions with which I'm not particularly
familiar.
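
The tree walk described above can be sketched roughly as follows. This
is only an illustration in Python rather than browser javascript: the
Node class is a hypothetical stand-in for a DOM node, and the list of
void tags is an assumption, not what the editor actually used.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a browser DOM node."""
    tag: Optional[str] = None  # None marks a text node
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    text: str = ""


# Elements without content; written as <br /> so the result is well-formed.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def to_xhtml(node: Node) -> str:
    """Recursively serialize a node tree as well-formed XHTML."""
    if node.tag is None:
        return escape(node.text, quote=False)
    tag = node.tag.lower()  # HTML DOMs report tag names in upper case
    attrs = "".join(
        f' {name.lower()}="{escape(value, quote=True)}"'
        for name, value in node.attrs.items()
    )
    if tag in VOID_TAGS:
        return f"<{tag}{attrs} />"
    inner = "".join(to_xhtml(child) for child in node.children)
    return f"<{tag}{attrs}>{inner}</{tag}>"
```

Every attribute value is quoted and escaped, and every non-void element
gets an explicit close tag, so an XML parser on the server can read the
output.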

Eventually we ended up doing this completely server side. I wrote one
component to fix the html to be xhtml, and after that I use an XML parser
to remove all unwanted attributes and tags.
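
A minimal sketch of such a two-step cleanup, using only the Python
standard library. It is not the closed source component mentioned in
this mail: the class and function names and the ALLOWED whitelist are
made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}
# Hypothetical whitelist: allowed tags and the attributes each may keep.
ALLOWED = {"p": set(), "b": set(), "i": set(), "br": set(),
           "a": {"href"}, "img": {"src", "alt"}}


class XhtmlFixer(HTMLParser):
    """Step 1: re-emit tag soup as well-formed xhtml."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attributes; bare attributes get their name.
        rendered = "".join(
            f' {name}="{escape(name if value is None else value, quote=True)}"'
            for name, value in dict(attrs).items()
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{rendered} />")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag: drop it
        while self.open_tags:  # also close tags left open inside this one
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "<div>" + "".join(self.out) + "</div>"


def _add_text_before(parent, index, text):
    """Attach text just before the child at position index."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, child):
    """Replace child by its own content, keeping the text in order."""
    index = list(parent).index(child)
    parent.remove(child)
    grandchildren = list(child)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(index + offset, grandchild)
    if grandchildren:
        _add_text_before(parent, index, child.text or "")
        last = grandchildren[-1]
        last.tail = (last.tail or "") + (child.tail or "")
    else:
        _add_text_before(parent, index, (child.text or "") + (child.tail or ""))


def strip_unwanted(elem):
    """Step 2: drop attributes and tags that are not whitelisted."""
    for child in list(elem):
        strip_unwanted(child)
        if child.tag in ALLOWED:
            for name in list(child.attrib):
                if name not in ALLOWED[child.tag]:
                    del child.attrib[name]
        else:
            unwrap(elem, child)


def clean(html_text):
    fixer = XhtmlFixer()
    fixer.feed(html_text)
    root = ET.fromstring(fixer.result())
    strip_unwanted(root)
    return ET.tostring(root, encoding="unicode")
```

The sketch keeps the text of a removed tag and only drops the tag
itself. It does not handle namespaced tags such as Word's <o:p>, which
the XML parser in step 2 would reject, and it does not repair invalid
nesting such as a <p> inside a <p>.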

The biggest problem while handling the html is that you also have to 
parse Word html that is pasted into the editor, and the html that Word 
produces is truly gruesome!
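
To give an idea of what has to go, here is a crude sketch that strips
some well-known Word-specific markup before any further cleanup. The
patterns and the function name are illustrative assumptions and far
from complete; regular expressions are used only to keep it short.

```python
import re

# <!--[if gte mso 9]> ... <![endif]--> blocks
CONDITIONAL_COMMENT = re.compile(r"<!--\[if.*?<!\[endif\]-->", re.S | re.I)
# Namespaced tags such as <o:p> and </o:p>
NAMESPACED_TAG = re.compile(r"</?\w+:\w+[^>]*>")
# class=MsoNormal and similar, quoted or not
MSO_CLASS = re.compile(r"""\s+class=("Mso[^"]*"|'Mso[^']*'|Mso\w*)""", re.I)
# mso-* declarations inside style attributes
MSO_STYLE_DECL = re.compile(r"mso-[\w-]+\s*:[^;\"']*;?\s*", re.I)


def strip_word_markup(html_text):
    """Remove a few kinds of Word-only markup from pasted html."""
    for pattern in (CONDITIONAL_COMMENT, NAMESPACED_TAG,
                    MSO_CLASS, MSO_STYLE_DECL):
        html_text = pattern.sub("", html_text)
    return html_text
```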

While the server side solution works well for all the html garbage
that I have encountered until now, it is not completely satisfactory:
when you paste html into the editor you're looking at the unprocessed
html, and once it has been processed by the server a lot will have been
removed and it can look rather different. One could try to explain this
to the user, but it's better to filter the html directly after pasting
it, so the user will not get confused.

I'm now in the process of writing an editor component that can handle IE
and Mozilla. It is in a working state, but the code needs to be cleaned
up and some stuff still needs to be written (a table editor, a url
editor, etc.). It is however for a closed source system. I could discuss
it to see whether we would be willing to release it as open source.

>My first thought was to do this cleanup stuff serverside (could be as
>simple as an XSL, which would make it easily customisable too). However
>it seems like you want to do all that on the client side?
This won't work: you need valid xml to use xsl, and the IE html in
particular can be very troublesome to fix.
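
A small illustration of the problem, with Python's XML parser standing
in for whatever parser sits in front of the xsl. The sample strings are
made up, but upper case tags, unquoted attributes and unclosed elements
are typical of what comes out of the editor.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Unquoted attribute value, <BR> and <P> never closed: not XML.
editor_output = "<P align=center>first line<BR>second line"
# The same content as xhtml.
xhtml = '<p align="center">first line<br />second line</p>'

print(is_well_formed(editor_output), is_well_formed(xhtml))
```

Only the second string parses, so the first one has to be fixed up
before any stylesheet can be applied to it.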

>* Currently in e.g. Linotype the source for the editor (thus of the
>iframe) is fetched separately from the main page. This is harder to do
>with cforms since then the pipeline from which the content is fetched
>should also have access to the cforms Form which is stored somewhere in
>a variable in a flowscript. For the cforms widget it would be easier I
>think to embed the HTML directly in the page (e.g. as a Javascript
>variable). This also makes it possible to assign the content either to
>the html editor or the textarea depending on what the client supports.
>* Automatic image upload: still need to think more about this. After
>pressing the submit button (and afterwards possibly showing the form
>again), the images will need to become available in the URL space. How
>that's done will probably differ from application to application so we
>could put that behaviour behind an interface.
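
Embedding the HTML as a Javascript variable, as suggested in the first
point quoted above, needs some care with escaping. A minimal sketch,
assuming the page is generated server side; the function names and the
variable name editorContent are hypothetical.

```python
import json


def js_string_literal(html_text):
    """Encode text as a Javascript string literal safe inside <script>."""
    literal = json.dumps(html_text)  # handles quotes, backslashes, newlines
    # "</script>" or "<!--" inside the literal would confuse the html
    # parser, so break those sequences up with harmless escapes.
    return literal.replace("</", "<\\/").replace("<!--", "<\\!--")


def editor_script(html_text, var_name="editorContent"):
    """Build the script element that carries the editor content."""
    return f"<script>var {var_name} = {js_string_literal(html_text)};</script>"
```

Client-side script can then assign the variable either to the html
editor or to the textarea, depending on what the browser supports.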

This is an interesting problem. Stefano talked about embedding it into
the document; how would you want to do this? That would be the best
solution for an embeddable component!

>* wiki syntax support: we have no need for this, so don't expect any
>effort from me on that.

