forrest-user mailing list archives

From David Crossley <>
Subject using ihtml and xhtml as input (Was: Using <!ENTITY> or XIncludes)
Date Sat, 07 Aug 2004 05:54:16 GMT
Charles Palmer wrote:
> (Note - we are now discussing the treatment of HTML, rather than
> XIncludes...)

The mail subject is now changed to reflect that. We can go
back to the discussion of using entities and/or XInclude
later.

Using the full capabilities of XHTML as one of the default
Forrest input types is scheduled for the next version of
Forrest, i.e. not the upcoming 0.6 but the next.

Until then Forrest treats both HTML and XHTML as tag soup.
As the name implies, the .ihtml is interpreted html.

> I created the three types of XHTML file that XMLMinds allows ("Div",
> "Strict" and "Transitional") - added only trivial content, checked their
> validity with XMLMind's in-built checker,

Good. Treat validation as a separate concern. I mean that if we
know the test data is valid, then we can look for
other causes.

> saved them as the ".ihtml" file
> type and then processed them all with Forrest.

Yes that is correct and it does work.

> Forrest reported no errors, but the HTML that was created for
> each file read: "Error in conversion.
> Warning This file is not in a html format, please convert manually."

Yes, it is not strictly HTML, hence the error. The XML declaration
makes it an XML document and the parser was expecting HTML.

The new version forrest-0.6-dev is more lenient and lets
you past this stage.

> The simplest file is just this:
> <?xml version="1.0" encoding="UTF-8"?>
> <div><p>XHTML Example 2</p><p>XMLMind refers to this as "Div (part of a
> modular document)"</p></div>
> Another is this:
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "">
> <html>
>   <head>
>     <title>XHTML Example 1</title>
>   </head>
>   <body>
>     <p>XMLMind refers to this as "XHTML Page (DTD XHTML 1.0 Strict)"</p>
>   </body>
> </html>
> Removing the line starting <?xml version... removed the error messages in
> all three cases,

Well Forrest-0.5.1 is expecting to see HTML, so removing that
declaration makes it one step closer to that.

And I will note something else strange. Simply commenting out
that XML declaration line was not enough. It had to be
totally removed.

Not so in version 0.6-dev: you can even leave the XML declaration
in place (but the file is still intentionally treated as HTML,
i.e. anything goes).

> but gave me less content than I expected in the rendered
> HTML - only the <title> text, and not the <p> text. (The pdf file contains
> the <title> text and on another line "<!-- -->").  (I now see that my
> earlier VATReport example did not render as expected either).

Now we are getting into a different class of issues.
Forrest is accepting that as HTML now, but there are
some peculiarities.

The ihtml is being interpreted by Forrest and transformed to 
the intermediate Apache xdocs document structure. That stylesheet
cannot deal with every possibility in unstructured html, so it
tries to guess how to build <section> and such. It needs <h1> etc.
style headings in the source ihtml (and the page must start with
one of those). Patches are welcome to enhance that transformer.

I can get both your examples to work in forrest-0.5.1 by adding
an <h1> at the beginning.
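
As a rough illustration only, the "Div" example might then look
something like the following for forrest-0.5.1. The XML declaration
is removed entirely, and an <h1> comes first. The heading text and
its exact placement (first child of the <div>) are assumptions made
for this sketch, not details confirmed above.

```html
<div>
  <!-- Assumed heading: the ihtml transformer needs the page to start with an h1-style heading -->
  <h1>XHTML Example 2</h1>
  <p>XHTML Example 2</p>
  <p>XMLMind refers to this as "Div (part of a modular document)"</p>
</div>
```

The same idea should apply to the "Strict" example, with the <h1>
placed before the first <p>.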

> So I would be tempted to a preliminary conclusion that there are a few
> things broken with the rendering of html files.

As discussed, html files processed via .ihtml are fine.
For xhtml our version forrest-0.5.1 has some known issues.
In forrest-0.6-dev you can use either html or xhtml.

> Can I forward anyone my source files to see if you can
> reproduce my results?

I used the examples that you provided above to do testing
and confirm your issues.

Considering that you are new to Forrest, I strongly suggest
that you move to a snapshot release of the current development

David Crossley
