xerces-j-users mailing list archives

From Arménio Augusto Arvins Pereira Pinto <ArmenioPi...@netcabo.pt>
Subject HTML parsing with Xerces
Date Wed, 26 Jun 2002 22:20:34 GMT
Hi there,

     This may be a kind of dumb question (!), but I need to parse an HTML
(not XHTML) document with Xerces. I used the SAXParser, but it gave me some
fatal errors on the following header:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN""http://www.w3.org/TR/REC-html40/loose.dtd">

I know the problem is the "" right after //EN: the whitespace required
between the public and system identifiers is missing. But this HTML document
is generated by JavaDoc, and the same header is present in all the Java 2 API
documentation from Sun!
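
For reference, the failure can probably be reproduced with a small sketch like the one below. It assumes the JDK's built-in JAXP SAX parser (itself derived from Xerces) rather than org.apache.xerces.parsers.SAXParser, and the class and method names (DoctypeCheck, document, isWellFormed) are made up for illustration. It parses the same DOCTYPE with and without a separator between the two quoted identifiers, and supplies an empty external DTD so nothing is fetched from the network.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DoctypeCheck {
    // The two quoted identifiers from the JavaDoc-generated header.
    static final String PUBLIC_ID = "\"-//W3C//DTD HTML 4.0 Transitional//EN\"";
    static final String SYSTEM_ID = "\"http://www.w3.org/TR/REC-html40/loose.dtd\"";

    // Builds a tiny document whose DOCTYPE uses the given separator
    // between the public and system identifiers.
    static String document(String separator) {
        return "<!DOCTYPE HTML PUBLIC " + PUBLIC_ID + separator + SYSTEM_ID + ">"
             + "<HTML></HTML>";
    }

    // Returns true if the SAX parser accepts the document, false if it
    // reports a fatal error (DefaultHandler rethrows fatal errors).
    static boolean isWellFormed(String doc) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Hand back an empty DTD instead of downloading loose.dtd.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc)), handler);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("no separator: " + isWellFormed(document("")));
        System.out.println("one space:    " + isWellFormed(document(" ")));
    }
}
```

This only checks the DOCTYPE issue in isolation. Real JavaDoc output is HTML, not XML, so a page would likely trip other well-formedness errors (unclosed tags, unquoted attributes) even with the header corrected.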

Am I missing something? Is there a "right" way to parse HTML documents
(event-driven, like SAX) in Xerces? Thanks in advance,

Arménio Pinto



