camel-dev mailing list archives

From "James Strachan" <james.strac...@gmail.com>
Subject Re: TagSoup as a dataFormat and as a Component
Date Wed, 10 Dec 2008 10:12:33 GMT
2008/12/10 Ramon Buckland <ramon@thebuckland.com>:
> Hi Peoples,
>
> I have just about finished the proof of concept of using TagSoup as a
> DataFormat and as a component.
>
> For those not familiar with TagSoup, it is a Java library (Apache 2.0
> License) which converts poorly formatted HTML
>
> <html> <p> something
>
> into well-formed (XML) HTML (not XHTML).
>
> ie:
>
> <html>
>    <body>
>            <p>something</p>
>    </body>
> </html>
>
> This is very helpful for the following reason.
>
>  <camelContext xmlns="http://activemq.apache.org/camel/schema/spring">
>  <route>
>    <from uri="direct:start"/>
>    <to uri="http://myserver.com/somequery?foo=1"/>
>    <unmarshal><wellFormedHtml/></unmarshal>
>    <to uri="xslt:file:///foo/bar.xsl"/>
>    <to .../>
>  </route>
> </camelContext>
>
>
> Questions:
>    Is this component helpful? (Should I finish it? I have not seen anything
> like it in the toolkit yet.)

Definitely! Being able to format HTML nicely as XML so you can do
XPath and whatnot is *very* useful!


>    If continuing is a good idea, what should the "dataFormat" be called?
> i.e. the <wellFormedHtml/>

Oooh that's a tricky one - naming is so hard! Maybe <tagSoup/> ? We
might one day have a few different mechanisms? (e.g. jtidy?).

Though maybe tagSoup is a bit vague :). How about tidyHtml or tidyMarkup?


>    Am I unmarshalling or marshalling? (We of course won't support going
> the other way, as good to bad HTML is just hard(er).)
>    I figured it is <unmarshalling>, as the <csv/> dataformat is similar: CSV
> --> List<..> is unmarshalling.

Yeah. What's the output btw - is it a DOM? Or can it be converted to a
Source so the endpoint could take DOM/SAX/StAX etc?
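
To make the Source idea concrete, here is a rough sketch using only
the standard JAXP APIs. It assumes the data format can hand back a
SAXSource wrapping whatever XMLReader does the parsing; the class and
method names are made up for illustration, and the JDK's default
reader stands in for TagSoup (as far as I know TagSoup's Parser
implements XMLReader, so it could probably be dropped in, but I have
not tried that here). A consumer that needs a DOM can then build one
with the identity transform:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch: none of these names come from Camel or TagSoup.
public class SourceToDomSketch {

    // Wrap a SAX XMLReader and its input as a generic JAXP Source.
    public static Source asSource(XMLReader reader, String markup) {
        return new SAXSource(reader, new InputSource(new StringReader(markup)));
    }

    // Materialise any Source as a DOM Document via the identity transform.
    public static Document toDom(Source source) throws Exception {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    // Stand-in reader: the JDK's own SAX parser, which only accepts
    // well-formed input. A tidying reader would replace this (assumption).
    public static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><body><p>something</p></body></html>";
        Document doc = toDom(asSource(defaultReader(), markup));
        // Print the root element name of the resulting DOM.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Returning a Source rather than a DOM would leave it to the next
endpoint (e.g. xslt) to decide whether it wants to stream or build a
tree.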


-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/
