cocoon-dev mailing list archives

From Steven Noels <stev...@outerthought.org>
Subject Re: validation of config during build (Was: Re: sitemap validation is broken)
Date Fri, 07 Mar 2003 12:22:58 GMT
Stefano Mazzocchi wrote:

> I'm more and more considering sitemap validation harmful.
> 
> why:
> 
> 1) the sitemap logic is too hard to be validated from any validation 
> language (it requires java runtime capabilities)
> 
> 2) it reduces the effort of clean and meaningful error messages in the 
> treeprocessor

'Interesting' perspective, to say the least.

Some thoughts:

1) http://outerthought.net/downloads/sitemap.pdf and 
http://outerthought.net/downloads/sitemap_a4_poster.pdf

cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825 
downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months 
preceding that period, and another 2500 for the poster version, and that 
brings us to 6825 downloads in 7 months, or 975 downloads / month for 
Bruno's sitemap poster.

... which means there's a _vested_ interest in trying to understand 
the sitemap, and people are even willing to look at some graphical 
depiction of it in order to understand.

2) In our experience, when we confront people with the sitemap, they are 
bewildered until we give them a copy of Pollo with the sitemap grammar 
loaded into it and some very basic customization 
(http://pollo.sourceforge.net/sitemap1.png). I assume the same happens 
when people see Sunbow. Needless to say, having 3 different grammars for 
the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA and 
troublesome at best, so some rationalization is more than appropriate.

3) Some days ago when investigating 
http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered 
some way to 'address' a matched group of a matcher pattern when nesting 
matchers which I had never heard of, and have already forgotten ATM. :-( I 
can say for myself that I make a reasonable effort to keep up with 
new-things-Cocoon, but it was something I clearly missed. I'm pretty 
sure it is only 'documented in code' or on the mailing list somewhere.

> Example, try
> 
>  <generate uri="..."/>
> 
> where the uri attribute is not allowed in generate (should be 'src'), 
> the treeprocessor totally ignores this and sends the empty string to the 
> parser, resulting in the error
> 
>  System ID not found!
> 
> Sitemap validation has stopped us from fixing the error messaging 
> capabilities on mistakes.

I don't parse this: in what way does the sitemap validation relieve 
somebody of the task of properly handling exceptions on the code level?
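
To make that concrete, here is a minimal sketch of the kind of check the 
code itself could do, independent of any grammar. It is not the 
treeprocessor's actual logic: the ALLOWED table and the check_attributes 
function are made up for illustration, the attribute lists are guesses, 
and namespaces are ignored altogether.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of allowed attributes per element. The names and
# attribute sets are illustrative only, not Cocoon's actual lists.
ALLOWED = {
    "match": {"pattern", "type"},
    "generate": {"src", "type"},
    "transform": {"src", "type"},
    "serialize": {"type"},
}


def check_attributes(xml_text):
    """Return a list of readable messages for attributes that are not allowed."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        allowed = ALLOWED.get(elem.tag)
        if allowed is None:
            continue  # elements missing from the table are out of scope here
        for name in elem.attrib:
            if name not in allowed:
                problems.append(
                    f"<{elem.tag}>: unknown attribute '{name}' "
                    f"(allowed: {', '.join(sorted(allowed))})"
                )
    return problems
```

Fed the <generate uri="..."/> example above, this reports the stray 
'uri' attribute by name instead of passing an empty string on to the 
parser.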

> I propose to blast the sitemap validation altogether.

OK. I know I'm sounding harsh and I don't mean to: it's just one of 
those discussions I have had so many times already in my own little company, 
being the only XML-head with two (much smarter) Java-heads. We had the 
same thing with the xReporter report grammar, which admittedly is only 
really handled and interpreted in Java code, yet our initial customer 
wanted to have a proper XML grammar for it.

Why that? For editing purposes. People want to use XML editors for 
editing the sitemap, and these tools _can_ provide proper guidance when 
configured with a grammar. I know we are heading towards your pet peeve 
discussion (*) of pre/post validation Infosets and the various ways each 
of the available grammars suck at grasping these concepts, but still I 
very much believe people will be grateful for anything (apart from 
Java(doc/code)) that guides them during the creation of an XML document, 
or at the least offers them some validation prior to loading the thing 
into Cocoon and seeing what Cocoon makes of it.

(*) I must admit this discussion is one of my favorite pet peeves, too ;-)

I agree there is a significant amount of overlap and various levels of 
underspecification for-the-sake-of-simplicity when having both some XML 
grammar and executable code which interprets XML orthogonally to this 
grammar, but still I'm very much +1 for some reasonable quality XML 
grammar, if only to help out our users.

If not, why don't we just specify the sitemap in some own-cooked grammar 
like:

match pattern="news/**"
   match pattern="news/1999/**"
     generate src="oldcontent/news/{1}.html" type="html"
     transform src="styles/old2new.xsl"
   match pattern="news/20*/**"
     generate src="docs/news/20{1}/{2}.xml"
   transform src="news2html.xsl"
   serialize

Gee - I must have been reading too much Python code lately ;-)
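
For what it's worth, such a notation maps back to XML quite mechanically. 
A rough sketch, assuming indentation is done with spaces and every 
attribute is written as name="value"; the indented_to_xml function and 
the "pipeline" root element are invented for this example, and the map: 
namespace prefix of a real sitemap is left out:

```python
import shlex
import xml.etree.ElementTree as ET

SAMPLE = '''\
match pattern="news/**"
   match pattern="news/1999/**"
     generate src="oldcontent/news/{1}.html" type="html"
     transform src="styles/old2new.xsl"
   match pattern="news/20*/**"
     generate src="docs/news/20{1}/{2}.xml"
   transform src="news2html.xsl"
   serialize
'''


def indented_to_xml(text, root_tag="pipeline"):
    """Turn the indentation-based notation into an ElementTree element."""
    root = ET.Element(root_tag)  # hypothetical wrapper element
    stack = [(-1, root)]  # (indent width, element) of the currently open blocks
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # shlex strips the quotes, so pattern="news/**" becomes pattern=news/**
        name, *attrs = shlex.split(line)
        # Close every block that is indented at least as deep as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        elem = ET.SubElement(stack[-1][1], name)
        for attr in attrs:
            key, _, value = attr.partition("=")
            elem.set(key, value)
        stack.append((indent, elem))
    return root


if __name__ == "__main__":
    print(ET.tostring(indented_to_xml(SAMPLE), encoding="unicode"))
```

The outer match ends up with four children: the two nested matchers, 
then the shared transform and serialize steps.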

Sorry if I sound offensive, I really don't mean to - but it's a personal 
pet peeve ;-)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org

