xml-general mailing list archives

From "Jaquiss, Robert" <RJaqu...@nfb.org>
Subject Looking for tools/ideas for filtering HTML
Date Fri, 16 Nov 2001 20:43:37 GMT
     I have just joined this list, and I am also a beginning Java
programmer. I apologize if this is not a suitable question for this
list. I need to write a filter for HTML pages. My goal is to read an
HTML page, throw away all the HTML markup, and keep just a block of
text that occurs near the bottom of the page. The HTML tags are liable
to be unbalanced: for example, there will be a <P> but no </P>. I found
a sample program that used SAXParser, but SAXParser doesn't seem to
handle unbalanced tags. Ideas and comments would be appreciated. Thank you.
   Robert Jaquiss
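
One possible approach, offered only as a rough sketch: the JDK ships a lenient HTML parser (javax.swing.text.html.parser.ParserDelegator) that does not require balanced tags, unlike an XML parser such as SAXParser. The class name HtmlTextFilter, the method name extractText, and the sample markup are hypothetical names chosen for illustration, and the assumption that the wanted block is the last text chunk on the page is mine, not something stated in the message.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text content of an HTML page and
// discards the markup. The parser tolerates a <P> with no matching </P>.
final class HtmlTextFilter {

    // Returns the non-empty text chunks in document order.
    static List<String> extractText(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();

        // The parser calls handleText for each run of character data it
        // finds between tags; tag callbacks are left at their no-op defaults.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument (ignoreCharSet = true) tells the parser not
            // to fail on a charset declaration inside the page.
            new ParserDelegator().parse(reader, callback, true);
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Sample input with unbalanced <P> tags (made up for this sketch).
        String html = "<html><body><P>First paragraph<P>Block near the bottom</body></html>";
        List<String> chunks = extractText(html);

        // Assumption: the block of interest is the last text chunk.
        System.out.println(chunks.get(chunks.size() - 1));
    }
}
```

To read a real page, the StringReader could be replaced with a FileReader or an InputStreamReader over a URL stream. Picking out the right block would then depend on how the actual pages are laid out, for instance by keeping only the chunks that follow a known marker string.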
