xml-general mailing list archives

From Harald Hett <h.h...@gis-systemhaus.de>
Subject Re: HTML Parser Update Available
Date Wed, 10 Apr 2002 16:14:42 GMT
> 2) A property, for example:
>   "http://cyberneko.org/html/names/modify"  { "upper", "lower",
> "default" }
> [These are just examples. I might want to modify the names.]
A property would be great, but with "no" instead of "default".
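
To illustrate the three values suggested above, here is a minimal Java
sketch. NameModifier and modify are hypothetical names invented for this
example and are not part of NekoHTML. The sketch assumes that "no" means
names are passed through exactly as written in the source document.

```java
import java.util.Locale;

// Hypothetical helper, not NekoHTML code: shows what the proposed
// property values "upper", "lower" and "no" could mean for a name.
class NameModifier {

    static String modify(String name, String mode) {
        switch (mode) {
            case "upper":
                // Locale.ROOT avoids locale-specific case mappings.
                return name.toUpperCase(Locale.ROOT);
            case "lower":
                return name.toLowerCase(Locale.ROOT);
            case "no":
                // Assumption: "no" leaves the name untouched.
                return name;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"upper", "lower", "no"}) {
            System.out.println(mode + " -> " + modify("BoDy", mode));
        }
    }
}
```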

> > Is it planned to include NekoHTML into the Xerces release?
> There hasn't been an overwhelming demand for it. Although, a
> few people responded and said it would generally be a "good"
> thing to include with Xerces. Certainly if there is a need
> for it, I wouldn't mind rolling it into the codebase -- it's
> actually quite small so it wouldn't add a lot to the source
> or Jar file(s).
> What do people think?

Recently I searched the web for an HTML parser that is capable of
parsing dirty HTML of any kind, but nothing I found really convinced me.
Only JTidy seemed to fit. But none of those solutions produces a DOM
tree in the end that can easily be modified using the DOM API or an
XSLT stylesheet. That is a nice feature of CyberNeko, and it makes the
parser interesting for a lot of programmers.
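
As a rough sketch of why a DOM result is convenient, the following Java
example post-processes a tree first with the DOM API and then with an
XSLT stylesheet, using only the standard JAXP classes. It does not call
NekoHTML: the parse method is a stand-in that reads well-formed markup
with the JDK's XML parser, on the assumption that an HTML parser would
hand over an org.w3c.dom.Document in its place. The class name, method
names, and the sample markup are all made up for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPostProcessing {

    // Identity transform that unwraps <font> elements but keeps their children.
    static final String STRIP_FONT_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='font'>"
        + "<xsl:apply-templates select='node()'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Stand-in for the HTML parser step (assumption): builds a DOM tree
    // from well-formed markup with the JDK's own XML parser.
    static Document parse(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM API step: add target="_blank" to every <a> element.
    static void markLinks(Document doc) {
        NodeList links = doc.getElementsByTagName("a");
        for (int i = 0; i < links.getLength(); i++) {
            ((Element) links.item(i)).setAttribute("target", "_blank");
        }
    }

    // XSLT step: apply the stylesheet to the DOM tree and serialize the result.
    static String stripFontTags(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_FONT_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><font color='red'>"
            + "<a href='x.html'>link</a></font></body></html>");
        markLinks(doc);
        System.out.println(stripFontTags(doc));
    }
}
```

Both steps work on the same Document object, so they can be combined in
any order once a parser has produced the tree.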

Unfortunately, CyberNeko is not well known to the public. I only
learned of it by reading your recent postings to
general@xml.apache.org. I think it should either be included in the
Xerces distribution or be made accessible from the Xerces homepage.

Harald Hett <h.hett@gis-systemhaus.de>
Gesellschaft für integrierte Systemplanung

