forrest-user mailing list archives

From Brian M Dube <brian.d...@gmail.com>
Subject Re: Conversion of "raw" html stops in mid-file w/o error message
Date Sun, 23 Oct 2005 04:20:03 GMT
On 10/22/05, Moshe Yudkowsky <msha4_05@bl.com> wrote:
> I've got a raw html file that is being auto-converted -- decorated --
> by forrest.
>
> Although the conversion goes well for the initial sections, at one point
> the conversion stops, and the rest of the file does not appear. There
> are no error messages or warnings.
>
> I have validated the document using the W3C validator, and it passes
> whether I use it as 4.01 loose or XHTML strict. (The meta tags have to
> be modified, depending on the format, but the rest of the document is
> unchanged.)
>
> Problem 1: no conversion of XHTML strict
>
> If the document is XHTML strict, then forrest does not convert any of
> the body text whatsoever!
>
>
> Problem 2: partial conversion of HTML 4.01 text.
>
> The initial paragraphs convert with no problem. They look like this:
>
> <h2>WORK EXPERIENCE</h2>
> <dl>
>   <dt>
>    DIALOGIC &amp; INTEL CORPORATION / 1996 - 2002<br/>
>    1996 - 2002: Speech Technology<br/>
>    Mission: Architect and Advocate for Speech Technologies.<br/>
>    <em>(Note: Dialogic was acquired by Intel in 1999.)</em><br/>
> </dt>
> <dd>
>   <ul>
>    <li>Guide technical development... </li>
>   </ul>
> </dd>
> </dl>
>
> etc.
>
> The paragraphs which do not convert look like this:
>
> <h2>SKILLS</h2>
>         <h4>Speech Recognition &amp; Speech Technology</h4>
>          <ul>
>           <li>Cross-industry knowledge...</li>
>          </ul>
>
> The only line that converts is the <h4> line, SKILLS, and the rest of
> the document is missing. I thought the "&amp;" in the <h4> might be
> throwing the system off, so I tried removing it, but that's not the problem.
>
> If anyone has any ideas on how to debug this, please let me know!

What if you try making the <h4> line <h3>? I could be off base (I'm
still getting to know Forrest), but there could be a problem with
parsing if you skip levels from <h2> to <h4>.
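To check a whole file for skipped heading levels before re-running Forrest, a small script can help. This is a minimal sketch assuming Python 3 and only the standard-library html.parser module; it is not part of Forrest, and it only tests the heading-level hypothesis above rather than confirming it is the cause. The names HeadingSkipFinder and find_heading_skips are hypothetical.

```python
import re
import sys
from html.parser import HTMLParser


class HeadingSkipFinder(HTMLParser):
    """Record <h1>..<h6> start tags and flag jumps deeper by more than one level."""

    def __init__(self):
        super().__init__()
        self.last_level = 0   # 0 means no heading has been seen yet
        self.skips = []       # list of (line number, previous level, new level)

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase, so <H4> arrives as "h4".
        match = re.fullmatch(r"h([1-6])", tag)
        if match is None:
            return
        level = int(match.group(1))
        # Going deeper by more than one level (e.g. h2 -> h4) is a skip;
        # moving back up to any shallower level is allowed.
        if self.last_level and level > self.last_level + 1:
            line, _offset = self.getpos()
            self.skips.append((line, self.last_level, level))
        self.last_level = level


def find_heading_skips(html_text):
    """Return every heading-level skip found in html_text."""
    finder = HeadingSkipFinder()
    finder.feed(html_text)
    finder.close()
    return finder.skips


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for line, prev, new in find_heading_skips(f.read()):
            print(f"line {line}: <h{prev}> followed by <h{new}>")
```

Run it as, for example, `python3 check_headings.py resume.html` (the script and file names are placeholders). On the SKILLS section quoted above it would report the <h2> to <h4> jump; if it reports nothing and the output is still truncated, heading levels are probably not the problem.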

Brian
