incubator-general mailing list archives

From Leo Simons <m...@leosimons.com>
Subject Re: [docs] whitespace and indenting of xml
Date Wed, 01 Feb 2006 09:00:28 GMT
How to clean up HTML...

Using shell with xmllint (yes, ugly shortcuts below):

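  export cmd="xmllint --html"
  # the redirect has to run inside sh -c; otherwise the calling shell
  # applies it once, to find itself, and writes a single file named {}.new
  find . -name '*.html' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
  find . -name '*.html' -exec sh -c 'cp "$1.new" "$1"' sh \{\} \;
  # careful: this removes every unversioned file, not only the .new ones
  svn status | egrep '^\?' | sed -e 's/^\? *//g' | xargs rm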

(--html is not strictly necessary if you have valid XML, e.g. you
can format xml as follows; --format is the option that actually
reindents the output:

  export cmd="xmllint --format"
  find . -name '*.xml' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
  find . -name '*.xml' -exec sh -c 'cp "$1.new" "$1"' sh \{\} \;
  svn status | egrep '^\?' | sed -e 's/^\? *//g' | xargs rm
)
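
If you would rather not lean on svn status to clean up afterwards, here is
a rough sketch of a more defensive variant. The function name
reformat_in_place is made up for illustration; it takes the formatter
command as its first argument and only replaces a file when the formatter
exits successfully, so a parse error leaves the original untouched.

```shell
# reformat_in_place CMD FILE...
# Hypothetical helper: run CMD on each FILE, writing to FILE.new first,
# and replace FILE only if CMD succeeded.
reformat_in_place() {
  formatter=$1
  shift
  for f in "$@"; do
    tmp="$f.new"
    # $formatter is left unquoted on purpose so "xmllint --format" splits
    # into the command and its option
    if $formatter "$f" > "$tmp"; then
      mv "$tmp" "$f"
    else
      rm -f "$tmp"
      echo "skipped $f (formatter failed)" >&2
    fi
  done
}

# Possible usage (assumes xmllint is installed):
#   reformat_in_place "xmllint --format" docs/*.xml
```

Note that mv gives the file the permissions of the freshly created copy,
which may differ from the original's.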

Using shell with tidy:

  export cmd="tidy -m -i -c -e"
  find . -name '*.html' -exec $cmd \{\} \;
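
Since -m rewrites files in place, it may be worth listing the files tidy
considers broken before touching anything. A minimal sketch, relying on
tidy's exit status (0 for clean, 1 for warnings, 2 for errors); the
function name report_errors is hypothetical, and the checker command is
passed in as the first argument:

```shell
# report_errors CMD FILE...
# Hypothetical helper: print the name of every FILE for which CMD exits
# with status 2 or higher.
report_errors() {
  checker=$1
  shift
  for f in "$@"; do
    status=0
    # $checker is unquoted on purpose so "tidy -q -e" splits into words
    $checker "$f" > /dev/null 2>&1 || status=$?
    if [ "$status" -ge 2 ]; then
      echo "$f"
    fi
  done
}

# Possible usage (assumes tidy is installed):
#   report_errors "tidy -q -e" *.html
```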

In Ant you would create a <fileset> and run much the same command
over it; <apply> is the variant of <exec> that accepts a fileset.

Both tools have some more interesting options.

- LSD

On Wed, Feb 01, 2006 at 09:29:56AM +1100, David Crossley wrote:
> Martin Sebor wrote:
> > 
> > I'm a little distressed to see the conversion process has messed
> > up the formatting of the original HTML that I manually maintained
> > for readability. Specifically, many of the terminating tags (such
> > as </p>) are not indented as they ought to be and instead are in
> > column 1. I don't suppose there is an easy way to regenerate the
> > page so as to preserve more of the original formatting, is there?
> 
> I tried my best to format stuff automatically
> as part of the Forrest output process. If it
> was raw xml serialiser output then it would have
> been even worse. No we cannot retain original
> formatting.
> 
> I know that it is not good enough.
> 
> Someone could run all documents through something
> like HTML Tidy or Henning's CodeWrestler or perhaps
> some XSL.
> 
> I would be pleased to see how they do this, because
> i want to add the ability to our future tools.
> 
> On many projects i have seen messy source documents
> cause grief with svn diffs - too much clutter and
> inconsistent whitespace.
> 
> -David
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org

