Most of the HTML parsers I can find on the web handle only the <tag>
syntax and forget about the { code } syntax that occurs in a lot of
web pages.
Is there a good library that returns the plain text of an HTML document
string and eliminates more than just the <tag> occurrences?
--
___________________________________________________
Chris Fraschetti
e fraschetti@gmail.com