xml-general mailing list archives

From "Robert Koberg" <...@koberg.com>
Subject Re: NekoHTML Parser License Change
Date Mon, 25 Feb 2002 15:08:43 GMT
I am very interested in it too, but have not had time to work on it.

I am thinking of targeting a website, crawling through its pages,
transforming them into XML (as much as possible...), and depositing the
result somewhere.

Thanks for doing this, Andy!

----- Original Message -----
From: <fred@ontosys.com>
To: <general@xml.apache.org>; <xerces-j-dev@xml.apache.org>;
<xerces-j-user@xml.apache.org>
Cc: "Andy Clark" <andyc@apache.org>
Sent: Monday, February 25, 2002 6:47 AM
Subject: Re: NekoHTML Parser License Change


> Andy's NekoHTML parser has worked well for me in a small project where
> I needed to scrape some data from a set of HTML pages.  With NekoHTML
> as the front end I was able to use an XSLT stylesheet to extract that
> data directly from the HTML pages.
>
> NekoHTML also allowed me to write a simple HTML transformation that I
> find useful when analyzing HTML page layouts:  adding a small colored
> border to each TABLE so that the table boundaries are visible.  This
> transformation requires only a few lines of XSLT added to a standard
> "identity" transformation.
>
> I expect that NekoHTML would make it easy to translate HTML code into
> XHTML format.  I have encountered a few tag-balancing glitches, where
> NekoHTML struggles to accommodate ill-formed HTML code much as the
> popular browsers do, but overall it has been very solid.
>
> NekoHTML is very easy to use.  For the most part it is a transparent
> addition to a standard Xerces/Xalan configuration, and all the usual
> APIs -- including JAXP -- seem to work as expected.
>
> Nice work Andy.  Thank you for making NekoHTML available.
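
The table-border idea quoted above (an "identity" transformation plus a
few extra lines of XSLT) might look roughly like the sketch below. It is
not Fred's actual stylesheet. It uses only the standard JAXP transform
API and assumes the input is already well-formed markup, which is the
part NekoHTML would supply in a real setup. The class name TableBorder,
the method addBorders, the lowercase "table" match pattern, and the red
2px style are all illustrative assumptions; a parser may report element
names in a different case depending on its configuration, in which case
the match pattern would need adjusting.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: identity transform that also outlines every table.
class TableBorder {

    // Stylesheet kept inline so the sketch is self-contained.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      // Standard identity template: copy every node and attribute as-is.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Extra template: copy each table, then add a visible border.
      // Note: this overwrites any style attribute the table already had.
      + "<xsl:template match='table'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:attribute name='style'>border: 2px solid red</xsl:attribute>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a well-formed markup string.
    static String addBorders(String markup) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        // Force XML output so the serializer does not switch to HTML mode
        // when the root element happens to be <html>.
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(markup)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page =
            "<html><body><table><tr><td>x</td></tr></table></body></html>";
        System.out.println(addBorders(page));
    }
}
```

With NekoHTML in front, the StreamSource for the page would presumably
be replaced by a source fed from the HTML parser; that wiring is left
out here because it depends on the parser setup.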


