cocoon-dev mailing list archives

From "Marc Portier" <>
Subject RE: A Transformer in progress....
Date Fri, 10 May 2002 07:42:25 GMT
> > I'm working on a Transformer that processes specifically text nodes and,
> > using regular expressions, wraps matched portions of a node in a tag. I'm
> > really just getting started on it - I have the basics working, but still
> > need to be able to specify the rules in an external file, etc. It has been
> > an interesting exercise so far, and the intent is to be able to detect
> > things like dates, currency amounts, and units of measure in a text node,
> > and mark them for later processing.
> >
> > I am planning on allowing rules to be specified in an external file
> > identified at component configuration, or directly in the component
> > configuration. I am also planning on allowing the "replacement" to be a
> > complete fragment with groups from the matched expression referenceable
> > (and replaced) in either attribute values or text nodes. (currently I
> > merely enclose the match in a tag).
> >
> > Before I move on, has anyone else already done something like this? Does
> > anyone (other than me) think it would be useful?
> >

did something similar around january, after that the interest/needs kinda
so I didn't continue on it since; I still plan on taking it up around the
summer or so
(if you want I can make my current stuff available, and join in some

the biggest difference however is that you're assuming input has good but
too little markup
(so you go for a transformer)
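
For readers following along, here is a rough sketch of the transformer idea from the quoted message: wrap regex matches found in text nodes in a tag. It is not the original poster's code. It uses a plain SAX XMLFilterImpl instead of Cocoon's own transformer interfaces, and the class name RegexWrapFilter, the helper apply() and the "date" tag in the usage note are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: wraps every regex match inside a text node in <tag>...</tag>.
class RegexWrapFilter extends XMLFilterImpl {
    private final Pattern pattern;
    private final String tag;
    // SAX may deliver one text node in several characters() calls, so buffer first.
    private final StringBuilder text = new StringBuilder();

    RegexWrapFilter(XMLReader parent, Pattern pattern, String tag) {
        super(parent);
        this.pattern = pattern;
        this.tag = tag;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        flush();
        super.endElement(uri, local, qName);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    // Emit the buffered text, wrapping each non-empty match in the configured tag.
    private void flush() throws SAXException {
        if (text.length() == 0) return;
        String s = text.toString();
        text.setLength(0);
        Matcher m = pattern.matcher(s);
        int pos = 0;
        while (m.find()) {
            if (m.end() == m.start()) continue; // ignore empty matches
            emit(s, pos, m.start());
            super.startElement("", tag, tag, new AttributesImpl());
            emit(s, m.start(), m.end());
            super.endElement("", tag, tag);
            pos = m.end();
        }
        emit(s, pos, s.length());
    }

    private void emit(String s, int from, int to) throws SAXException {
        if (to > from) super.characters(s.toCharArray(), from, to - from);
    }

    // Convenience for trying it out: parse a string, filter it, serialize the result.
    static String apply(String xml, String regex, String tag) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            XMLReader reader = f.newSAXParser().getXMLReader();
            RegexWrapFilter filter = new RegexWrapFilter(reader, Pattern.compile(regex), tag);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Something like RegexWrapFilter.apply("<p>Due 10/05/2002 ok</p>", "\\d{2}/\\d{2}/\\d{4}", "date") should then mark the date inside the paragraph. Replacement fragments with group references, as planned in the quoted message, are left out of this sketch.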

while we were scratching the itch of pure text input and/or bad markup
(like HTML that jTidy can't handle, or even now when there is nekoHTML:
whenever the regex approach is easier than the XSLT afterwards because of
the mess in the
(so we chose a generator as lifeform)

at the time we started we collided with some joint thinking activity about
regex kind of support inside XSLT2.0, see for some of those discussions:

must say, I didn't follow the further development of xslt2.0 since, so maybe
someone else could comment on any future for this kind of stuff inside the
spec (and thus impls like xalan or saxon)

> > to my real question...
> >
> > Does anyone know of existing code that I can use to track and identify if
> > the current point in the SAX stream matches a simplified XPath expression?
> > I would really like to apply expression rule set based on an XPath subset.

Alas, don't know of such a beast (would be nice though)

So you got me thinking about this :-)
Can we define 'simplified' xpath?
SAX gives you a timely snapshot of the current position in the XML file,
so you would easily be able to match a simple hierarchy of elements
(it wouldn't even need to be root based I guess; sliding in some
attribute tests should be possible too, and also position evaluation
as long as it's not =last()...)

SAX just doesn't allow you to look into the future, so a lot of xpath you
will not be able to do.
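
To make 'simplified' a bit more concrete, here is a minimal sketch of what such a matcher could look like, assuming the subset is nothing more than element names separated by "/", with "*" as a wildcard step and an optional leading "/" to anchor at the root. Attribute and position tests are left out. The class SimplePathTracker and its helper textMatching() are invented names, not existing Cocoon or SAX code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: tracks the open-element path and tests it against
// a tiny path subset such as "section/p", "/doc/p" or "section/*/p".
class SimplePathTracker extends DefaultHandler {
    private final boolean anchored;   // true when the expression starts with "/"
    private final String[] steps;     // element names, "*" matches any name
    private final List<String> path = new ArrayList<>();
    final StringBuilder matched = new StringBuilder();

    SimplePathTracker(String expr) {
        anchored = expr.startsWith("/");
        steps = (anchored ? expr.substring(1) : expr).split("/");
    }

    // True when the innermost open elements line up with the expression steps.
    boolean matchesCurrent() {
        int offset = path.size() - steps.length;
        if (offset < 0 || (anchored && offset != 0)) return false;
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(path.get(offset + i))) {
                return false;
            }
        }
        return true;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.add(qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        path.remove(path.size() - 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only text directly inside a matching element is collected.
        if (matchesCurrent()) matched.append(ch, start, length);
    }

    // Convenience for trying it out: returns all text found at matching positions.
    static String textMatching(String xml, String expr) {
        try {
            SimplePathTracker t = new SimplePathTracker(expr);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), t);
            return t.matched.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Everything here only looks at what SAX has already reported, which is why a test like position()=last() cannot be added to it: whether the current element is the last one is only known once the parent closes.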

> >
> > Any comments or suggestions?

You'll have to take a trade-off between how far you can cripple xpath to
support your needs, yet still can beat the maybe awkward but not totally
approach of building a temp DOM tree internally in your cocoon transformer to
allow real xpath stuff upon? (after all something similar happens to some
inside the xslt process he)
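
For comparison, the temp DOM route could look roughly like the sketch below. It assumes a JDK that ships the javax.xml.xpath API (which arrived in Java well after this message was written), and it buffers the whole input rather than a subtree to keep things short. DomXPathFallback and select() are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: buffer the SAX stream into a DOM, then run full XPath on it.
class DomXPathFallback {
    static List<String> select(String xml, String xpath) {
        try {
            // An identity transform turns the SAX events into a temporary DOM tree.
            DOMResult dom = new DOMResult();
            TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(new InputSource(new StringReader(xml))), dom);

            // With the whole tree in memory, look-ahead tests such as last() work.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, dom.getNode(), XPathConstants.NODESET);

            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The price is memory and latency: nothing can be passed down the pipeline until the buffered part has been read completely, which is exactly the trade-off against the crippled but streaming matcher.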

> >
> > Thanks!

please keep us posted of your findings and progress
