xml-general mailing list archives

From "KUMAR,PANKAJ (HP-Cupertino,ex1)" <pankaj_ku...@hp.com>
Subject RE: NekoHTML Parser License Change
Date Mon, 25 Feb 2002 17:40:46 GMT

Robert Koberg:
> 
> I am thinking of trying to target a website and crawl through the pages,
> transform it into XML (as much as possible...) and deposit it somewhere.
> 
If you just want a collection of HTML pages to start with, you can get a
bunch from Google. They have announced a programming contest and are
providing a repository of HTML pages as a dataset. See
http://www.google.com/programming-contest/ for details.

/Pankaj.


