commons-dev mailing list archives

From Oliver Zeigermann <oliver.zeigerm...@gmail.com>
Subject Re: [xmlio] comparison with Digester
Date Mon, 11 Oct 2004 07:00:51 GMT
On Mon, 11 Oct 2004 19:42:37 +1300, Simon Kitching
<simon@ecnetwork.co.nz> wrote:
> On Mon, 2004-10-11 at 19:12, Oliver Zeigermann wrote:
> > Hi Simon,
> >
> > I see you have put some energy into feedback! Thanks for that :)
> 
> Aargh! Our emails are crossing each other somewhere out in cyberspace
> :-).
> 
> Actually, probably somewhere near Delhi, being roughly midway between
> Germany and New Zealand...

Well, right. This will be my last post for a while until all loose
threads have been taken care of...

> > My understanding: xmlio just goes with the callback, Digester creates
> > objects. This is a difference in interface as well as in performance,
> > right?
> 
> Well, not really. Digester rules have callbacks. It's just that if you
> choose to have a prebuilt ObjectCreateRule handle the callback then an
> object gets created. But you can handle any callback yourself.
> 
> >
> > > (b)
> > > A complete path to the current element is passed to the "startElement"
> > > method.
> > >
> > > Digester has the "getMatch" method which can be called by any rule to
> > > get the path to the current element. Xmlio does provide a SimplePath
> > > instance instead of a plain string to represent this path (equivalent to
> > > the File class wrapping a filename). However in Digester you don't
> > > really need anything more complex than a string because you don't
> > > normally do computations on paths anyway - you leave that up to the
> > > "rule matcher" class.
> >
> > And a hierarchy of objects representing the XML structure, right?
> 
> Entirely optional. One item on my to-do list is to create a Digester
> example that processes an xml document representing a database, eg
>  <row>
>   <column name="name">Linus</column>
>   <column name="name">Linus</column>
>  </row>
> and fire off SQL insert statements without building a model of the
> input.
> 
> A digester-based application can handle sax events "as they happen".
> It's just that the common use is to get these events to trigger the
> creation of objects and setting of properties.

If it needs an example it is probably complicated, right ;)

> 
> >
> > > (c)
> > > The xmlio concept of having a callback method invoked at element end
> > > which passes both the element text and the element attributes is mildly
> > > useful (but calling this method "startElement" is rather confusing IMO).
> > > It would certainly be possible to add this feature to Digester/Digester2
> > > (though it does have a minor performance drawback). With the current
> > > digester code, you can clone the attrs and push them on a (named) stack
> > > in begin() and then fetch them back in body() to get the same effect.
> >
> > (1) Why do you think it is mildly useful only? My experience is stuff
> > similar to this occurs all the time
> 
> Sorry, I should have been clearer.
> 
> I agree the data is useful. I'm simply saying that the attribute
> information is already available via the startElement callback, and the
> character data available via the characters() callback; xmlio is just
> saving the user the effort of saving that info somewhere until the
> endElement callback. Nice, but not complex - that's all I meant.
> 

That really is the philosophy, as I guess I noted in the other post
(I am losing track of it all right now): keep all associated parts of
code and data in the same place.
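
A minimal sketch of that idea in plain SAX, for illustration only. It is
not the xmlio implementation; the names CompleteElementListener,
CompleteElementHandler and parse are made up for this example. The
handler buffers the attributes and character data of each open element
and hands both to a single callback when the end tag arrives:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical callback: receives one complete element per call. */
interface CompleteElementListener {
    void element(String path, Attributes attributes, String text);
}

/** Buffers attributes and text per open element and reports them at endElement. */
class CompleteElementHandler extends DefaultHandler {
    private final Deque<String> names = new ArrayDeque<>();
    private final Deque<Attributes> attrs = new ArrayDeque<>();
    private final Deque<StringBuilder> texts = new ArrayDeque<>();
    private final CompleteElementListener listener;

    CompleteElementHandler(CompleteElementListener listener) {
        this.listener = listener;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        names.addLast(qName);
        // Copy the attributes: the parser may reuse the object after this call.
        attrs.addLast(new AttributesImpl(a));
        texts.addLast(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!texts.isEmpty()) {
            texts.peekLast().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Build the path from the root down to the element being closed.
        String path = "/" + String.join("/", names);
        listener.element(path, attrs.removeLast(), texts.removeLast().toString().trim());
        names.removeLast();
    }

    /** Parses the XML and records one line per completed element. */
    static List<String> parse(String xml) {
        List<String> seen = new ArrayList<>();
        CompleteElementHandler handler = new CompleteElementHandler(
            (path, a, text) -> seen.add(path + " name=" + a.getValue("name") + " text=" + text));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

Feeding it the <parameter name="olli">xmlio</parameter> element quoted
just below produces a single callback carrying the path, the "name"
attribute and the text together. Inner elements are reported before
their parents, which is the ordering Simon asks about further down.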

> >
> > <parameter name="olli">xmlio</parameter>
> >
> > which you then get with a single callback. Besides, calling such a
> > method startElement might indeed be misleading. Better ideas?
> 
> How about "completeElement"? "elementCompleted"?
> 
> Even "endElement"..that is the SAX event that actually triggers the
> xmlio overloaded "startElement" call isn't it?

"element" might be the best name I guess, as a complete element is described...

> >
> > Anyway, the above fails only for mixed content, i.e. tags mixed
> > with text, which usually occurs only in flow text. Flow text
> > hardly needs detailed and special treatment by xmlio or Digester.
> > Do you have other examples where mixed content occurs and would
> > need a detailed treatment?
> 
> Well, if you have:
>  <site>
>   <article author="simon">
>     <priority>high</priority>
>     This is the article text
>   </article>
>  </site>
> then by delaying the call to "startElement" until the </article> tag, it
> is very difficult to deal with the <priority> tag. It really should
> operate on the article, but the article tag hasn't been "processed" yet.
> Presumably you'll get startElement callbacks with the following paths in
> the following order:
>  site/article/priority
>  site/article
>  site

This would be bad XML design as the DTD would have to look like

<!ELEMENT article (#PCDATA | priority)* >

which would allow something like this as well

<article author="simon">
    <priority>high</priority>
    This is the article text
    <priority>low</priority>
    What is this?
</article>

Which certainly is not what you wanted. So this should rarely happen.
I guess it should rather look like

<article author="simon">
    <priority>high</priority>
    <text>This is the article text</text>
</article>

with this DTD

<!ELEMENT article (priority, text) >

Which then would be no problem for such a callback. That's why it is there...

> >
> > (2) xmlio was built for simplicity and transparent use. No funky
> > details in the background, no surprises, all obvious. I am more than
> > convinced all this can be done with Digester as well, but maybe not
> > this simple and obvious and easy to learn and do. E.g. you will rarely
> > need to maintain any additional stacks in xmlio, at least not for
> > that.
> 
> Unless you're processing xml that is absolutely "flat" then how do you
> write the SimpleImportHandler methods?
> 
> Doesn't the user of the library immediately have to declare their own
> stack objects to represent the innate nested structure of xml input?
> 
> I'm truly curious about what useful content-handler code can be written
> without using stacks...

The path is sort of a stack. If it is /root/element1 you know you are
processing element1 inside of the root element. No need for a stack
here. The path passed is not a string, but has methods that allow for
the appropriate checks.
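
To make that concrete, here is a rough sketch of what such a path
object could look like. The class ElementPath and its methods are
invented for this example and are not the actual SimplePath API in
xmlio; the point is only that the checks a handler would otherwise do
with its own stack become simple questions asked of the path:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical element path; not the actual xmlio SimplePath API. */
final class ElementPath {
    private final List<String> segments;

    private ElementPath(List<String> segments) {
        this.segments = segments;
    }

    /** Splits a path such as "/root/element1" into its element names. */
    static ElementPath parse(String path) {
        List<String> parts = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                parts.add(s);
            }
        }
        return new ElementPath(parts);
    }

    /** True if this path equals the given one, segment by segment. */
    boolean matches(String other) {
        return segments.equals(parse(other).segments);
    }

    /** True if the path ends with the given trailing segments, e.g. "article/priority". */
    boolean endsWith(String suffix) {
        List<String> tail = parse(suffix).segments;
        int offset = segments.size() - tail.size();
        return offset >= 0 && segments.subList(offset, segments.size()).equals(tail);
    }

    /** Name of the enclosing element, or null for the root element. */
    String parentName() {
        return segments.size() < 2 ? null : segments.get(segments.size() - 2);
    }

    /** Nesting depth; the root element has depth 1. */
    int depth() {
        return segments.size();
    }
}
```

With a path of /root/element1, a handler can ask
path.matches("/root/element1") or path.parentName() and learn where it
is in the document without keeping any state of its own.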

> >
> > >
> > >
> > > Regarding the "out" part of the xmlio libs: this is basically a
> > > collection of static functions doing simple but useful xml string
> > > encoding etc., and a stream class that does auto-indenting. Digester
> > > certainly doesn't have anything like this. This code does feel like it
> > > might be at home in "lang" or "codec"...
> >
> > Plus pushing XML into byte streams. Besides, there are quite a few
> > pieces of code lying around in Jakarta doing similar stuff. Maybe we
> > could take care of these as well...
> >
> > > Oliver, if there was a "digester2" project which provided a "basic" jar
> > > that was pretty light-weight and had only optional dependencies on
> > > commons-beanutils and on commons-logging, might you consider using that
> > > in i18n (or even Slide) instead of the xmlio code? (And would you be
> > > interested in helping to create digester2??).
> >
> > Can't speak for i18n, but if what you have then is fine, why not use it...
> 
> Yep, I can understand that. I'm certainly not urging that Slide
> immediately convert to using Digester!
> 
> The sandbox is for playing with ideas, and I'm very glad I got the
> chance to see xmlio and learn about Slide's use of it.

Again, thanks for providing your input and sharing experience.

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

