camel-dev mailing list archives

From "Claus Ibsen (JIRA)" <>
Subject [jira] Commented: (CAMEL-1184) Add a new Dataformat - tidyMarkup - which allows us to unmarshal bad HTML to good (XML) Html.
Date Thu, 11 Dec 2008 16:27:05 GMT


Claus Ibsen commented on CAMEL-1184:

Ramon, thanks a lot for this great component. HTML is a mess to parse, so this comes in handy.

A few review comments from me:
- Consider using IllegalArgumentException for invalid configuration instead of CamelException
(which is not normally used for this).
- ObjectHelper.notNull(dataObjectType, "dataObjectType", this) should be added to unmarshal
so we know it is set.
- The javadoc states that an Exception is thrown, but it is not in the throws list.
- inputStream.close(): we normally ignore a failure here and just log it at debug/warn level.
- asNodeTidyMarkup: when the new exception is thrown, the original (causing) exception is
missing; it should be passed as the 2nd parameter.
- The javadoc in DataFormatClause for the fluent builder methods is misleading for the one
that takes a String as the type.
- TidyMarkupDataFormat uses Java assert. Please throw an IllegalArgumentException instead.
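
To make three of the points above concrete (argument checks instead of assert, keeping the cause when rethrowing, and ignoring close() failures with a log line), here is a rough sketch in plain JDK code. It is not taken from the attached patch: the class ReviewSketch and the methods requireDataObjectType, readMarkup and closeQuietly are made-up names for illustration, and RuntimeException only stands in for whatever exception type the data format really throws. It assumes Java 9+ for InputStream.readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the data format under review; not the actual patch source.
class ReviewSketch {
    private static final Logger LOG = Logger.getLogger(ReviewSketch.class.getName());

    // Invalid configuration is reported with IllegalArgumentException,
    // which is always active (a Java assert is skipped unless -ea is set).
    static Class<?> requireDataObjectType(Class<?> dataObjectType) {
        if (dataObjectType == null) {
            throw new IllegalArgumentException("dataObjectType must be specified");
        }
        return dataObjectType;
    }

    // Reads the whole stream as UTF-8 text. On failure the original
    // exception is passed as the 2nd constructor argument so the cause is kept.
    static String readMarkup(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException("Unable to read markup from stream", e);
        } finally {
            closeQuietly(in);
        }
    }

    // A failure to close is not fatal: log it at a low level and carry on.
    static void closeQuietly(InputStream in) {
        try {
            in.close();
        } catch (IOException e) {
            LOG.log(Level.FINE, "Ignoring exception while closing stream", e);
        }
    }
}
```

With this shape a caller that sees the RuntimeException can still reach the underlying IOException through getCause().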

> Add a new Dataformat - tidyMarkup - which allows us to unmarshal bad HTML to good (XML)
> ---------------------------------------------------------------------------------------------
>                 Key: CAMEL-1184
>                 URL:
>             Project: Apache Camel
>          Issue Type: New Feature
>            Reporter: Ramon Buckland
>            Assignee: Ramon Buckland
>            Priority: Minor
>             Fix For: 2.0.0
>         Attachments: tidyMarkup-sourcefiles.tgz
>   Original Estimate: 4 hours
>  Remaining Estimate: 4 hours
> Using TagSoup, a competent converter from 'bad HTML' to well-formed (XML) HTML, we can create a
> new dataformat such that:
>    from("direct:fromSomeHttpSite")
>        .unmarshal().tidyMarkup()
>        .setBody().xpath("//table/tr/td[1]")
>        .to("direct:foo");
> We get to turn the nasty HTML into good HTML, which can go through XSLT components and
> be XPathed, with all the goodness we love.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
