abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Parsing HTML
Date Thu, 04 Oct 2007 22:04:28 GMT
I have an HTML helper utility that parses HTML into an Abdera Div
object.  This can be used, for instance, to convert an HTML title to
XHTML.  I can check it in to Abdera as an extension module.  It is based
on a subset of the Validator.nu HTML Parser
(http://about.validator.nu/htmlparser/), which is released under the
Apache license.  To avoid external dependencies (e.g. XOM, ICU), I
created a subset of the Validator.nu parser that removes the parts we
are not using, and I would check that code directly into our svn.
Again, the code is released under the Apache license.

Example use:

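  // Needs java.io.StringReader, org.apache.abdera.Abdera and
  // org.apache.abdera.model.Div on the import list.
  Abdera abdera = new Abdera();

  // HTML input with an unclosed <p> and a named entity
  String s = "<p lang='en'>bob&trade;";
  StringReader sr = new StringReader(s);

  // parse(...) is the static entry point of the helper described above
  Div div = parse(abdera, sr);
  System.out.println(div);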

Outputs:

  <div xmlns="http://www.w3.org/1999/xhtml"><p lang="en">bob™</p></div>
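
To illustrate what the conversion buys us, here is a minimal sketch that uses only the JDK's built-in XML parser (JAXP), not Abdera or the helper itself. The class name XhtmlCheck and its methods are hypothetical names chosen for this example. It shows that the raw HTML input is rejected by an XML parser (unclosed <p>, undeclared &trade; entity), while the output above parses cleanly as a div in the XHTML namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this example only; not part of Abdera.
class XhtmlCheck {

    // Parse a string with a namespace-aware XML parser.
    static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns "{namespace}localName -> text", or null if the input
    // is not well-formed XML.
    static String describeRoot(String xml) {
        try {
            Element root = parseXml(xml).getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName()
                    + " -> " + root.getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    static boolean isWellFormed(String xml) {
        return describeRoot(xml) != null;
    }

    public static void main(String[] args) {
        String html = "<p lang='en'>bob&trade;";
        String xhtml = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p lang=\"en\">bob\u2122</p></div>";

        System.out.println(isWellFormed(html));
        System.out.println(describeRoot(xhtml));
    }
}
```

The trademark sign is written as \u2122 in the source so the example does not depend on the file encoding. How it prints still depends on the console encoding.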

Should I go ahead and check it in?

- James


