lucene-java-user mailing list archives

From Chris Fraschetti <frasche...@gmail.com>
Subject a good html & script parser
Date Sat, 25 Sep 2004 06:23:07 GMT
Most of the HTML parsers I can find on the web handle only the <tag>
syntax and ignore the { code } syntax (such as inline script) that
occurs in a lot of web pages.

Is there a good library that returns the plain text of an HTML document
string and strips more than just the <tag> occurrences?
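
As an illustration of the behaviour being asked for, here is a minimal sketch that uses only java.util.regex. The class name PlainTextSketch and the method toPlainText are hypothetical, and a regex pass is an assumption made for brevity, not a real HTML parser: it drops whole <script> and <style> blocks (including their { code } bodies) and comments before removing the remaining tags.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or any library.
public class PlainTextSketch {
    // (?is): case-insensitive, and '.' also matches line breaks.
    // Matches an entire <script>...</script> or <style>...</style> block.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPlainText(String html) {
        // Remove script/style blocks first so their bodies never reach the output.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        // Replace each remaining tag with a space so adjacent words stay separated.
        text = TAG.matcher(text).replaceAll(" ");
        // Collapse runs of whitespace into single spaces.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
            + "<script type=\"text/javascript\">function f() { return 1; }</script></head>"
            + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(toPlainText(html)); // prints "Hello world"
    }
}
```

This sketch does not decode entities such as &amp; and can be fooled by malformed markup or by a literal "</script>" inside a script string, so a proper HTML parser is still the safer choice for arbitrary pages.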

-- 
___________________________________________________
Chris Fraschetti
e fraschetti@gmail.com
