cocoon-dev mailing list archives

From "Hunsberger, Peter" <Peter.Hunsber...@stjude.org>
Subject RE: Sitemap validation
Date Mon, 10 Mar 2003 18:37:16 GMT
Noticed this on xml-dev today and thought it was somewhat relevant to the
discussion about sitemap validation. In essence: validate once, run many. If
done this way, you wouldn't even need to validate the sitemap each time
Cocoon started up, only when the digest changed...

Peter Hunsberger
 
-----Original Message-----
From: Niels Peter Strandberg [mailto:nielspeter@npstrandberg.com] 
Sent: Monday, March 10, 2003 12:15 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Let the publisher validate the xml and the make a msg
digest


Let the publisher validate the xml and then make a msg digest

When an xml document is authored, the author can attach an xml schema or 
dtd reference to it. The receiver of the xml document gets the document 
and validates it against the xml schema or dtd referenced in the 
document to verify that the document is valid.

The xml document might be used over and over again without any changes 
being made to it, and it might even be validated every time. This is a 
waste of time!

Let the author do the validation of the finished xml document. If the 
xml document has been successfully validated against the referenced xml 
schema or dtd, why should the receiver need to check the document again 
to see if it is valid, when the author has already tested it?

My suggestion is that after the document has been validated by the 
author, a message digest is created, similar to the ones used in 
cryptography, and the digest value is appended to the xml document.

All the receiver has to do is run the xml document through the same 
msg. digest algorithm and compare the two results. If they are equal, 
nothing in the document has changed since the author made the digest, 
so there is no need to validate.

So this gives you not only confirmation that the document is valid, 
but also that its content has not changed.
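
A minimal sketch of the receiver-side check described above, in Python. 
The function names are made up for illustration, and SHA-256 is an 
assumption (the message does not name an algorithm); the only 
requirement is that author and receiver use the same one.

```python
import hashlib


def digest_of(xml_bytes: bytes) -> str:
    """Return the hex digest of the document bytes (SHA-256 assumed)."""
    return hashlib.sha256(xml_bytes).hexdigest()


def needs_validation(xml_bytes: bytes, published_digest: str) -> bool:
    """True when the document no longer matches the author's digest.

    A match means the bytes are unchanged since the author validated
    them, so the receiver can skip validation; a mismatch means the
    receiver should fall back to normal validation.
    """
    return digest_of(xml_bytes) != published_digest
```

The receiver would call needs_validation() with the document bytes and 
the digest published by the author, and run the validator only when it 
returns True.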

This also allows dom builders (if they are changed) to skip the process 
of verifying that the data they receive from the sax reader consists of 
legal xml characters, is well-formed, etc., since that checking also 
brings a lot of overhead. Just look at jdom when it builds a jdom 
document.

Example:
           <?xml version="1.0"?>
           <Family>
                     <Person>
                               <Name>Fred Flintstone</Name>
                     </Person>
                     <Person>
                               <Name>Vilma Flintstone</Name>
                     </Person>
           </Family>

When I run this through openssl and make a message digest with the 
command:  "openssl dgst flintstone.xml"
it returns a digest: "b99060bb744edd6aac5193da6957afcb" (the problem 
with this digest is that white space is also included!)
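
One possible way around the white space problem, sketched in Python 
under some assumptions: digest a canonical form of the document instead 
of the raw bytes. This uses xml.etree.ElementTree.canonicalize (Python 
3.8 or later) with strip_text=True, and SHA-256 is again an assumption. 
Note that strip_text also trims leading and trailing white space inside 
real text content, which may not suit every document.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def canonical_digest(xml_text: str) -> str:
    """Digest the canonical (C14N 2.0) form so indentation does not matter."""
    # strip_text=True drops whitespace-only text between elements and
    # trims whitespace around text content before serializing.
    canonical = canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this, an indented copy and a single-line copy of the same document 
give the same digest, while a change to the actual content still gives a 
different one. The price is that the receiver has to parse the document 
to canonicalize it before it can check the digest.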

Then we can do something like this:

           <?xml version="1.0"?>
           <?digest="b99060bb744edd6aac5193da6957afcb"?> // or whatever!!!!
           <Family>
                     <Person>
                               <Name>Fred Flintstone</Name>
                     </Person>
                     <Person>
                               <Name>Vilma Flintstone</Name>
                     </Person>
           </Family>

The receiver can then read and remove the digest, and then verify it by 
running the same msg digest command shown before.
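
A rough sketch of that round trip in Python. The helper names are 
hypothetical, SHA-256 is an assumption, and the marker is treated as a 
plain text line written exactly like the "or whatever" form above rather 
than parsed as xml. The sketch also assumes the document starts with an 
xml declaration, and it inserts the marker without indentation so that 
removing it restores the original text exactly.

```python
import hashlib
import re

# Matches the marker line added by attach_digest (hypothetical format).
MARKER = re.compile(r'\n<\?digest="([0-9a-f]+)"\?>')


def attach_digest(xml_text: str) -> str:
    """Author side: insert the digest marker right after the xml declaration."""
    if not xml_text.startswith("<?xml"):
        raise ValueError("sketch assumes the document starts with an xml declaration")
    digest = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
    end = xml_text.index("?>") + 2  # position just past the declaration
    return xml_text[:end] + f'\n<?digest="{digest}"?>' + xml_text[end:]


def verify_digest(stamped_text: str) -> bool:
    """Receiver side: remove the marker, recompute the digest, compare."""
    match = MARKER.search(stamped_text)
    if match is None:
        return False  # no digest attached, so validate as usual
    original = stamped_text[:match.start()] + stamped_text[match.end():]
    recomputed = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return recomputed == match.group(1)
```

If verify_digest() returns False, the receiver simply falls back to 
validating the document the normal way.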

It could be interesting to do some benchmarking on this.

These are just some thoughts!


Regards, Niels Peter Strandberg


