cocoon-dev mailing list archives

From Sylvain Wallez <sylvain.wal...@anyware-tech.com>
Subject [RT] Comparing Woody & XMLForm : towards a unified form handling (long)
Date Mon, 21 Jul 2003 18:44:01 GMT
Hi all,

Lately, I've been thinking a lot about form handling in Cocoon. The 
reason for this is that I will very soon start a project which is 
basically a large set of forms (about 40 different screens used to fill 
an XML document containing collections having up to 1000 or 2000 items). 
As part of our proposal for the project, I did some prototyping with 
XMLForm (+flowscript) and liked its lightweight markup and the strong 
separation it enforces between form definition and form layout. But I 
disliked its poor syntactical validation facilities. On the other side,
we have Woody, which is very good at validating data but which I find
heavy to use and which defines its own schema language. So this RT is my
attempt to make a synthesis of the good and bad points of both 
frameworks, augmented with my own ideas, so that we can move towards a 
single unified form handling package in Cocoon.

Disclaimer : I don't want to start a war between Woody and XMLForm, but 
just try to analyze what we have today and lay out what I (hence it's
subjective) consider good. Discussion is of course welcome. Also, I
may have missed some features of one or the other framework. In that 
case, please don't shoot at me, but be kind enough to explain what I 
missed !

Also, I'll speak about XMLForm, even if it's somewhat dead and replaced 
by JXForms (essentially a cleaner rewriting of the XMLFormTransformer 
and an update of the markup to the latest XForms draft), because all 
criticisms about XMLForm below come from the original XMLForm and not 
the JXForms work.

                           ---oOo---

General overview
----------------

Both Woody and XMLForm use the same basic principles :

1/ Content production : a form template is "instantiated", i.e. it is
filled with values coming from a data model, and the instantiated form
is transformed to the target language (e.g. HTML) using generic and/or 
custom stylesheets that know how to render the various widgets.

2/ Form validation : upon form submission, values are validated and 
stored into a data model, and violations are produced if some validation 
error occurs (validations involving several fields are also possible). 
In case of error, the form can be redisplayed with the violations.

But, as we will see below, the notions of form template, data model and 
validation are very different in Woody and in XMLForm.

                           ---oOo---


Form definition
---------------

Woody separates form definition, form template and form instance (3 
different namespaces). The form definition is a kind of schema language 
that defines every widget in the form with its label, datatype and 
validation constraints. The template contains references to form fields 
mixed with foreign markup (such as HTML). It is instantiated using the
WoodyTransformer : every field present in the template is replaced by
the corresponding instance according to the form definition.

Woody has no notion of application model, as it stores field values in
its own data structure, which must be read from and written to the
application model. Work is underway in this area with a JXPath based
binding.

XMLForm has only one markup, inspired by the W3C's XForms specification. 
This markup is more or less equivalent to the Woody template (it accepts 
foreign markup), which is instantiated ("augmented" would be better)
with either the XMLFormTransformer/JXFormsTransformer or the 
JXFormsGenerator. Form fields contain XPath references to the data 
model, which can therefore have an arbitrary complexity.

<my-opinion>
XMLForm is way easier to set up to produce forms : a single file, a data
model containing any mixture of objects handled by JXPath (JavaBeans, 
DOM elements, etc), XPath expressions everywhere, and you're done. But 
as soon as there's a need for data whose formatting is more than 
toString(), such as dates and float values, and even more in an I18Nized 
environment, XMLForm shows strong limitations, mainly related to lack of 
proper formatting functions in XPath.

As JXPath supports extension functions, building a library of formatting 
functions can be a solution to circumvent XPath's reduced function set. 
But we'll see below that there's still a problem with parsing submitted 
form data.

Woody, on the other hand, is more complicated to set up, as two files 
are needed (form definition and form template), with many 
cross-references (field IDs). But Woody shines for complicated 
formatting (see <convertor> directives) and I18N.

IMO, Woody's separation of concerns between form definition and template 
is not that good. Woody would be easier to use if the definition file 
was only a schema defining datatypes and if fields were defined only in 
the template. Although there is a great probability that datatypes can 
be reused for different fields and even different forms, I'm not sure 
using the same fields within different templates really makes sense. For
example, HTML and WML browsers have such different screen sizes and
interaction constraints that a single form definition can hardly be used
for both.

Reusing datatypes for different fields would also increase the overall 
application consistency : as of today, if two fields have the same 
datatype and constraints, these must be duplicated. This could also open 
the door to other schema languages (WXS, RNG, etc).
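
To make the datatype-reuse idea concrete, here is a minimal Java
sketch. It is not Woody code: the Datatype and Field classes and the
zip code constraint are hypothetical and only illustrate two fields
sharing one constraint definition instead of duplicating it.

```java
import java.util.function.Predicate;

// Hypothetical sketch: neither class exists in Woody or XMLForm.
class SharedDatatypes {

    /** A named datatype whose constraint is defined exactly once. */
    static final class Datatype {
        final String name;
        final Predicate<String> constraint;

        Datatype(String name, Predicate<String> constraint) {
            this.name = name;
            this.constraint = constraint;
        }
    }

    /** A form field that references a datatype instead of redefining it. */
    static final class Field {
        final String id;
        final Datatype type;

        Field(String id, Datatype type) {
            this.id = id;
            this.type = type;
        }

        /** Returns true if the submitted string satisfies the shared constraint. */
        boolean accepts(String submitted) {
            return submitted != null && type.constraint.test(submitted);
        }
    }
}
```

A "billingZip" and a "shippingZip" field built on the same Datatype
instance then necessarily apply the same rule, and changing the rule in
one place changes it for both.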
</my-opinion>

                           ---oOo---


Population and validation
-------------------------

"Population" is the term used to designate the action of "filling" the 
data model with form-submitted data. "Validation" is the action of
checking that submitted data is valid, i.e. that it satisfies some
syntactic and semantic constraints.

Upon form submission, XMLForm traverses all request parameters and tries
to set their values on the data model using JXPath. A filter can exclude
request parameters that are not part of the data model. If the data
model was filled correctly, a validation is performed using Schematron.
This allows finer-grained or inter-field checks, again using XPath
expressions. Each of these two phases can produce violations, which are
recorded in the Form object.

Upon form submission, Woody traverses the form's widget tree, and each
widget is responsible for parsing the corresponding request parameter
and validating its value. Non-visual widgets are also provided to
perform inter-field checks.

<my-opinion>
Here again, XMLForm is very easy to use but shows some strong 
limitations : because it's designed after XForms, XMLForm has no feature 
to specify how to parse form parameters (strings) into strongly typed 
data. So even basic parsing of e.g. dates is not possible, and 
locale-dependent parsing is clearly not possible.
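
As an illustration of what such a parsing step involves, here is a
minimal Java sketch using the JDK's java.text.NumberFormat. The
LocaleAwareParsing class and its parseNumber method are hypothetical
names and belong to neither framework; the point is only that the same
request parameter string must be interpreted according to the user's
locale before it can be stored as typed data.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper, not part of XMLForm or Woody.
class LocaleAwareParsing {

    /**
     * Parses a submitted string into a double using the number format of
     * the given locale, rejecting input that is not consumed entirely.
     */
    static double parseNumber(String submitted, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(submitted, position);
        // parse() returns null on failure and may stop early on trailing
        // garbage, so both conditions have to be checked.
        if (parsed == null || position.getIndex() != submitted.length()) {
            throw new IllegalArgumentException(
                "Not a number in locale " + locale + ": " + submitted);
        }
        return parsed.doubleValue();
    }
}
```

With this helper, "1,5" submitted by a user with a French locale and
"1.5" submitted by a user with a US locale both end up as the same
double in the data model.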

The Schematron validation has fewer restrictions since it deals with the
populated data model, and thus with strongly typed data, provided they
could be parsed in the population phase.

XMLForm also has what I consider a strong security weakness : the 
default request parameter filter rejects only special parameters such as
"cocoon-action-*", which means that a request can be forged to modify
a part of the data model that wasn't available as a form field.
Considering that programmers are lazy (as I am), the form model will 
often be the actual business object. The consequences of providing a 
form to a user to update her location information can be catastrophic if 
the User class contains "address", "phoneNumber", but also "accessRights"...
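
One way to avoid this kind of hole is to accept only the parameters
that correspond to fields actually rendered in the form. The Java
sketch below illustrates that idea under stated assumptions: the
ParameterFilter class and keepRenderedFields method are hypothetical,
and the set of rendered field names is assumed to be known on the
server side. It is not the filter API of XMLForm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a whitelist filter; not XMLForm's actual filter.
class ParameterFilter {

    /**
     * Returns only the request parameters whose names were rendered as
     * form fields. Everything else (e.g. a forged "accessRights"
     * parameter) is silently dropped before population.
     */
    static Map<String, String> keepRenderedFields(
            Map<String, String> requestParameters,
            Set<String> renderedFields) {
        Map<String, String> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requestParameters.entrySet()) {
            if (renderedFields.contains(entry.getKey())) {
                accepted.put(entry.getKey(), entry.getValue());
            }
        }
        return accepted;
    }
}
```

The design choice here is a whitelist rather than a blacklist: a
parameter is rejected unless the form declared it, instead of accepted
unless it matches a known special name.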

W3C XForms, which inspired XMLForm, is a client-side specification 
targeted at producing XML documents validated by a WXS (W3C XML Schema). 
But XMLForm is server-side, and doesn't enforce any particular schema 
language. This means that few XForms features besides the form markup
are actually used, and that everything else has to be invented to
produce a full-featured server-side form framework, particularly in this
population & validation phase.

Woody, by traversing the widget tree that was used to produce the form, 
doesn't have the security weakness of XMLForm since only parameters 
present in the produced form are considered. Also, its strong parsing
and I18N features make custom formatting really easy.

But, being limited to the form's data model, complex validations 
involving form data and application data can be difficult to do with 
Woody and will need custom Java code.

Finally, Woody uses its own expression language, which IMO is not a good
choice if we consider that "standard" expression languages such as Jexl 
exist and are already used in other Cocoon blocks.
</my-opinion>

                           ---oOo---


Mapping to the application data model
-------------------------------------

A form is useless if its content cannot be mapped in some way to the 
application data model.

XMLForm has no special provision for mapping form data to application 
data, but using JXPath makes it easy to fill any JavaBean or any DOM 
structure. Post-validation application behaviour can be added to either 
a subclass of AbstractXMLFormAction or in a flowscript.

Woody currently does not provide anything to map form data to 
application data and all this must be coded either in a subclass of 
AbstractWoodyAction or in a flowscript. But there's work underway to add 
binding features to Woody, the first incarnation being based on JXPath.

<my-opinion>
XMLForm makes it easy (as pointed out above) for the lazy programmer to 
set the application data as the form model : mapping is then immediate 
and totally transparent. But along with the security problem mentioned 
above, this also means that when form population & validation fails,
it is very likely that some fields have already been modified,
potentially leaving the data model in an inconsistent state.

So the secure and clean solution is to use a form-specific data model (a 
JavaBean, DynaBean or XML DOM), but this then requires custom code to
copy form data to the application data model, thus losing the
simplicity provided by JXPath.
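
A minimal Java sketch of that validate-then-copy step follows. It
assumes a hypothetical User bean and two hand-written checks; the
StagedForm class and its submit method are made up for illustration and
are not part of either framework. The point it shows is that the
application object is touched only after every submitted value has
passed validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; User and the two checks are illustrative only.
class StagedForm {

    /** Stand-in for an application (business) object. */
    static class User {
        String address = "";
        String phoneNumber = "";
    }

    /**
     * Validates all submitted values first and copies them to the user
     * only if no violation was found. Returns the list of violations,
     * which is empty on success.
     */
    static List<String> submit(Map<String, String> submitted, User user) {
        List<String> violations = new ArrayList<>();
        String address = submitted.get("address");
        String phone = submitted.get("phoneNumber");

        if (address == null || address.trim().isEmpty()) {
            violations.add("address is required");
        }
        if (phone == null || !phone.matches("[0-9 +]+")) {
            violations.add("phoneNumber is malformed");
        }

        // Copy to the application model only when everything is valid,
        // so a failed submission never leaves it half-updated.
        if (violations.isEmpty()) {
            user.address = address.trim();
            user.phoneNumber = phone;
        }
        return violations;
    }
}
```

If the address is valid but the phone number is not, the user keeps
its previous address: nothing is written until the whole form passes.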

The ongoing work on Woody binding potentially allows a great range of 
target data models : the current JXPath binding will make it easy to map 
form data to an arbitrary data structure, without XMLForm's limitations
since parsed and strongly typed data will be stored in the application
model. But we can also imagine other declarative bindings targeted at
e.g. relational databases (no intermediate bean), EJBs, etc.
</my-opinion>

                           ---oOo---

I18N
----

I18N features should be separated into two main areas :
- I18Nization of form labels and item values (e.g. combobox labels)
- I18Nization of textbox inputs, such as floating point numbers, dates, etc.

For the first item, both XMLForm and Woody accept any foreign markup in 
widget labels, including <i18n:*> tags for use with the I18NTransformer. 
Woody lacks the equivalent to <xf:help> but this was recently discussed 
and should be added soon. XMLForm also allows labels and similar items 
to have their content fetched from the form model using a "ref" 
attribute. In that case, however, only characters are produced, and not 
mixed content.

For the second item (i18nization of inputs), XMLForm has no support, as 
it hardly supports custom formats, as explained previously. Woody, on 
the other hand, has strong support for i18nization of inputs through its 
<convertor> tag that supports locale-specific patterns for formatting 
and parsing.

<my-opinion>
XMLForm's strong limitations for value formatting also apply to the
i18n domain, whereas Woody not only provides strong support for value
formatting, but also strong support for locale-dependent formatting.

XMLForm's "ref" attribute on form labels allows messages to be part of 
the form model, and thus be dynamic, but I'm not sure this is of real 
use. And if it is, Woody may be able to provide an equivalent through 
nested tags in the <wd:label> element.
</my-opinion>

                           ---oOo---

Conclusion
----------

XMLForm has a lot of success because it has filled a giant need in 
Cocoon applications to handle forms. Moreover, it fits nicely with 
flowscript, and this combination builds an easy to use solution for form 
handling. But using it in more and more complex use cases shows some
strong limitations that are largely related to its desire to mimic 
XForms. And I'm not sure these limitations can be removed without 
diverging largely from the XForms approach.

These limitations were obviously taken into account early in Woody's 
design, which makes it stronger at handling data formatting and
enforcing semantic constraints. But Woody, by over-separating concerns,
is heavier to use.

Considering all the pros and cons, I think Woody, which is still in its 
infancy, is more promising in the long term and should be promoted, once
featured enough, as the preferred form handling package in Cocoon.

                           ---oOo---

Proposals
---------

We've seen that Woody requires separating the form definition from the
form template. I think (Bruno, correct me if I'm wrong) this constraint
comes from the fact that the form _is_ the model, and thus must be filled
with data _before_ being processed by the form template.

The ongoing work on form binding considers binding as a process 
surrounding form population and validation : the application->form 
binding fills an existing form, and the form->application binding 
transfers form data to the application model once the form is correctly 
validated.

Now we can imagine having a "live" application->form binding occurring
at form definition time, which could allow simultaneous building of the
form definition and population of form data from the binding. This 
feature could remove the need for a separate form definition and could 
be implemented by a WoodyTemplateGenerator taking as input a template 
file containing field definitions. A kind of "definition by example" 
(like the QBE that exists in Excel and various database systems).

This "defining-template" would only define fields and not datatypes. 
These datatypes could be either inferred from the application model 
through the binding or fetched from a separate schema file (the current
form definition, with only datatype definitions).

On the other hand, form->application binding cannot be live, since we 
must ensure that all submitted values are valid before modifying the
application data.

                           ---oOo---


Thanks for reading this far. As I expect this post to generate lots of
discussion, I suggest creating separate threads for particular
subjects (particularly the final "proposals" chapter) in order to keep
the discussion focused.

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com


