ant-dev mailing list archives

From Kev Jackson <kevin.jack...@it.fts-vn.com>
Subject Re: sandbox gendoc
Date Mon, 16 Jan 2006 04:09:18 GMT

>Java does regex just fine, albeit more verbose (when is Java not verbose ;-),
>but my main point is that you already have (Java) tools that allow you to
>have an XML view of the existing HTML manual (tagsoup, etc...). Leave
>the parsing to these tools, and concentrate on transforming the
>"loose" HTML schema into a more structured XML, probably using XSL as
>the language rather than scripting. By adding a little more structure
>to the HTML with <div>s, the XML view of the HTML could be complete
>enough for robust transformation to XML, and perhaps even robust
>enough so that the HTML remains as the official "source" document of
>the manual (but stripped of all formatting, which would be added later
>in the XML processing pipeline). The main advantage of this would be
>that editing HTML using an HTML editor for manual editing can be
>easier/nicer and kinda wysiwyg, compared to editing the transformed
>XML.
>
>  
>
I'm using libraries - I'm not writing my own html tokenizer :).


>>I'm aiming for a proof of concept script (for echo task) sometime in
>>the next week (if work doesn't get in the way too much).  After that
>>I'll see how easy a refactoring job will be for making it generic.
>>    
>>
>
>From above, you can see that I envision the possibility of the HTML
>manual to remain, so it's all the more important that the transform is
>robust.
>
>  
>
This suggests that the HTML manual is the "one true source" for the 
manual, and all other versions are derived from it through some processing.

>Talk of the tokenizer being too greedy makes me uneasy ;-) Leave
>the parsing to an existing parsing tool, and just manipulate the
>structure of the document once it's been "reformatted" to a SAX event
>stream. In this form, it feeds easily and naturally to an XSL
>transform pipeline.
>
>That's my view of the whole thing anyway ;-) --DD
>
>  
>
From this discussion, my understanding is:

1 - Better to use Java + libs - presumably so that an Ant task can be 
derived from it (Ant creating its own manual would be rather nice, I'd 
have to say)
2 - Conversion util must be robust
3 - Conversion util will be long-lived
4 - Modification of existing HTML to make it easier for conversion util 
would probably be a good idea
5 - Structure of XML is as yet undecided - DocBook with RELAX NG has now 
been suggested as an alternative to a bespoke XML
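
As a rough sketch of how points 1, 2 and 4 might fit together (not a decided design): the parser emits SAX events, and an XSL stylesheet turns the <div>-annotated HTML into structured XML. Everything named here is an assumption for illustration: the class name, the "description" div class, and the <task>/<description> output elements are hypothetical, since the target XML structure is still open. The JDK's own XML parser stands in for an HTML-tolerant one, so the input must already be well-formed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ManualTransformSketch {

    // Hypothetical stylesheet: pulls the task name from <h2> and the
    // description from <div class="description">. Output element names
    // are placeholders, not an agreed schema.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<task name='{normalize-space(//h2)}'>"
      + "<description>"
      + "<xsl:value-of select=\"normalize-space(//div[@class='description'])\"/>"
      + "</description>"
      + "</task>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String wellFormedHtml) throws Exception {
        // Stand-in parser: the JDK SAX parser. An HTML-tolerant XMLReader
        // (such as TagSoup's) could be swapped in here; the stylesheet
        // would then likely need adjusting to that parser's output.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // The SAX event stream from the reader feeds the XSL transform directly.
        SAXSource source =
            new SAXSource(reader, new InputSource(new StringReader(wellFormedHtml)));
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));

        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h2>Echo</h2>"
            + "<div class='description'>Echoes a message.</div>"
            + "</body></html>";
        System.out.println(transform(page));
    }
}
```

Keeping the parsing and the transform separate like this means the stylesheet is the only part that has to change once the XML structure is settled.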

Kev
