cocoon-users mailing list archives

From Nicola Ken Barozzi <>
Subject Re: Handling lousy HTML
Date Fri, 06 Sep 2002 14:40:15 GMT

Ola Berg wrote:
> From: "Nicola Ken Barozzi" <>
>>HTMLGenerator uses JTidy directly, without making assumptions itself.
>>If you can use JTidy to work for you, it should work - or can be easily 
>>made to work - with HTMLGenerator too.
> What do you mean? I can use JTidy on my system; whether Cocoon utilizes it or not was my
> question to you, dear community ;-)

I meant that if you can make it work from the command line and generate the 
result you want, then Cocoon can do it too.

> Therefore I provided both the sitemap snippet and the test bhtml document.
> I use the binary distribution of Cocoon 2.0.2 (where the documentation says that this feature
> is enabled by default). If it is not enabled by default, I haven't been able to find out
> how to enable it. 
> Question restated: given my configuration and the bhtml document that fails, is it safe
> to believe that HTMLGenerator utilizes JTidy and that JTidy fails, or is it safe to believe
> that HTMLGenerator fails because it fails to utilize JTidy? 

I don't know; that's why I asked you that question.
Use JTidy outside of Cocoon to see if it works.
If it does, tell us how you did it, and we will patch the Cocoon 
HTMLGenerator to play nice.

> And if the latter is true, how could I tweak it so that JTidy will be utilized by HTMLGenerator?

This is what HTMLGenerator does:

             // Set up an instance of Tidy.
             Tidy tidy = new Tidy();
             // Set JTidy warnings on/off
             // Set JTidy final result summary on/off
             // Collect JTidy messages in a String (to be logged)
             StringWriter stringWriter = new StringWriter();
             PrintWriter errorWriter = new PrintWriter(stringWriter);

             // Extract the document using JTidy and stream it.
             org.w3c.dom.Document doc = tidy.parseDOM(
                     new BufferedInputStream(this.inputSource.getInputStream()), null);

If you know how to make JTidy produce the output you need, tell us and we will 
patch the HTMLGenerator.
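JTidy is not part of the JDK, so as a hedged, JDK-only stand-in the sketch below reproduces the shape of the excerpt above with `javax.xml.parsers.DocumentBuilder` in place of `Tidy`: it parses a `BufferedInputStream` into an `org.w3c.dom.Document` while an `ErrorHandler` writes parser messages to a `StringWriter` so they can be logged instead of printed. Because this is a strict XML parser rather than a cleaner, it rejects the misnested `<b>`/`<i>` snippet outright, which is the gap a tidying step in front of the pipeline has to fill. The class and method names are made up for illustration; this is not Cocoon code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class StrictParseSketch {

    /** The parsed DOM (null if parsing failed) plus the captured parser messages. */
    static final class Result {
        final Document doc;
        final String messages;
        Result(Document doc, String messages) { this.doc = doc; this.messages = messages; }
    }

    /** Parses a stream into a DOM, logging parser messages to a String, not stderr. */
    static Result parse(InputStream in) throws Exception {
        StringWriter stringWriter = new StringWriter();
        PrintWriter errorWriter = new PrintWriter(stringWriter);

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { report("warning", e); }
            @Override public void error(SAXParseException e) { report("error", e); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                report("fatal", e);
                throw e; // malformed input: stop parsing
            }
            private void report(String level, SAXParseException e) {
                errorWriter.println(level + " (line " + e.getLineNumber() + "): " + e.getMessage());
            }
        });

        Document doc = null;
        try {
            doc = builder.parse(new BufferedInputStream(in));
        } catch (SAXException e) {
            // Fatal errors were already recorded by the error handler above.
        }
        errorWriter.flush();
        return new Result(doc, stringWriter.toString());
    }

    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        Result clean = parse(bytes("<p>foo <b>bar <i>baz</i></b> <i>garply</i></p>"));
        Result lousy = parse(bytes("<p>foo <b>bar <i>baz</b> garply</i></p>"));
        System.out.println("clean input parsed: " + (clean.doc != null));
        System.out.println("lousy input parsed: " + (lousy.doc != null));
        System.out.print(lousy.messages);
    }
}
```

The design point carried over from the excerpt is the `StringWriter`/`PrintWriter` pair: diagnostics end up in one String that a generator can hand to its logger.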

> If the first is true ("HTMLGenerator can't handle the bhtml snippet no matter what"),
> I really need to investigate another solution, such as:
>>Look here, maybe it's the right time to ditch tidy entirely
> ...sounds promising. I'll try to download and investigate it. Hopefully I can provide a
> CleaningHtmlGenerator soon, if it is needed.

Cool :-)

>>>BTW: the example I provided is actually cleaner than much of the code I need Cocoon
>>>to deal with.
> I could provide a list of test snippets that the tidying step should handle, e.g.:
> ---
> <h1>Hello <p>How do you do 
> <table border="2 >thing1<td>thing2</table>
> Wondering<p>foo <b>bar <i>baz</b> garply</i>"
> --- should become something like ---
> <html>
> <head>
> </head>
> <body>
> <h1>Hello</h1>
> <p>How do you do
> </p>
> <table border="2">
> <tr><td>thing1</td><td>thing2</td></tr>
> </table>
> <p>Wondering
> </p>
> <p>foo <b>bar <i>baz</i></b> <i>garply</i>
> </p>
> </body>
> </html>
> ---
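The last line of that list (`<b>bar <i>baz</b> garply</i>`) is the classic misnested-inline case. A minimal sketch of one way to get the expected `<b>bar <i>baz</i></b> <i>garply</i>`, assuming only inline formatting tags are involved: keep a stack of open elements; when an end tag closes an element that still has children open, close those children as well and reopen them just before the next non-whitespace text. This is a much-simplified form of the kind of repair browsers apply, not what JTidy or HTMLGenerator actually do; the class name is hypothetical, and block structure, void elements such as `<br>`, and attribute errors are deliberately ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineTagBalancer {

    // Matches start and end tags such as <b>, </i>, <font size="2">.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();   // open elements, innermost first
        List<String> pending = new ArrayList<>(); // closed early, to reopen (outermost first)
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            appendText(html.substring(pos, m.start()), out, open, pending);
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {                    // start tag
                reopen(out, open, pending);
                out.append(m.group());
                open.push(name);
            } else if (pending.contains(name)) {           // would be reopened empty,
                pending.remove(pending.lastIndexOf(name)); // so just drop it
            } else if (open.contains(name)) {
                List<String> reopened = new ArrayList<>();
                while (!open.peek().equals(name)) {        // close misnested children
                    String inner = open.pop();
                    out.append("</").append(inner).append('>');
                    reopened.add(0, inner);
                }
                out.append("</").append(open.pop()).append('>');
                pending.addAll(0, reopened);
            }                                              // else: stray end tag, dropped
            pos = m.end();
        }
        appendText(html.substring(pos), out, open, pending);
        while (!open.isEmpty()) {                          // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    // Emits text, reopening pending elements after any leading whitespace,
    // so "</b> garply" becomes "</b> <i>garply" rather than "</b><i> garply".
    private static void appendText(String text, StringBuilder out,
                                   Deque<String> open, List<String> pending) {
        int k = 0;
        while (k < text.length() && Character.isWhitespace(text.charAt(k))) {
            k++;
        }
        out.append(text, 0, k);
        if (k < text.length()) {
            reopen(out, open, pending);
            out.append(text, k, text.length());
        }
    }

    private static void reopen(StringBuilder out, Deque<String> open, List<String> pending) {
        for (String name : pending) {
            out.append('<').append(name).append('>');
            open.push(name);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        // prints foo <b>bar <i>baz</i></b> <i>garply</i>
        System.out.println(balance("foo <b>bar <i>baz</b> garply</i>"));
    }
}
```

Stray end tags are dropped and anything left open is closed at the end, so `<b>unclosed` comes out as `<b>unclosed</b>`.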

I tried it with the C version of Tidy; this is what I got:

<p>How do you do
<table border="2 &gt;thing1&lt;td&gt;thing2&lt;/table&gt; 
Wondering&lt;p&gt;foo &lt;b&gt;bar &lt;i&gt;baz&lt;/b&gt;

Maybe changing the rules would help...
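In the output above, the unterminated quote in `border="2` appears to have swallowed the markup that follows (the test input has a stray `"` at the end of its last line), with the swallowed `<` and `>` escaped as part of the attribute value. One possible rule change, sketched here purely as a hypothetical heuristic (it is not a JTidy or Tidy option): accept a closing quote only if no `>` or line break comes before it; otherwise treat the quote as unterminated and end the value at the first whitespace or `>`, which yields `border="2"` as expected in the list above. The class and method names are illustrative only.

```java
public class QuotedValueRecovery {

    /** A recovered attribute value and the index just past it. */
    static final class Value {
        final String text;
        final int end;
        Value(String text, int end) { this.text = text; this.end = end; }
    }

    /**
     * Reads an attribute value whose opening quote is at s.charAt(q).
     * Hypothetical rule: trust the closing quote only if no '>' or line
     * break occurs before it; otherwise end the value at the first
     * whitespace or '>'.
     */
    static Value readQuoted(String s, int q) {
        char quote = s.charAt(q);
        int close = s.indexOf(quote, q + 1);
        String candidate = close < 0 ? s.substring(q + 1) : s.substring(q + 1, close);
        boolean suspicious = close < 0
                || candidate.indexOf('>') >= 0
                || candidate.indexOf('\n') >= 0;
        if (!suspicious) {
            return new Value(candidate, close + 1);   // normal, well-quoted value
        }
        int i = q + 1;
        while (i < s.length() && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '>') {
            i++;
        }
        return new Value(s.substring(q + 1, i), i);   // quote treated as unterminated
    }

    public static void main(String[] args) {
        String tag = "<table border=\"2 >thing1<td>thing2</table>\nWondering\"";
        Value v = readQuoted(tag, tag.indexOf('"'));
        System.out.println(v.text);                   // prints 2
    }
}
```

Values that legitimately contain spaces, such as `alt="a b"`, are still read normally because no `>` or line break precedes their closing quote.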

Nicola Ken Barozzi         
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)

