xml-general mailing list archives

From ne...@ca.ibm.com
Subject Re: [Vote] Add NekoHTML to Xerces
Date Fri, 12 Apr 2002 21:34:01 GMT
Hi Andy and all,

I'm cc'ing xerces-j-dev on this one; I didn't see this note go there
originally and if it's a vote of Xerces-J folks that's wanted, that list
should probably know.  :-)

There's no doubt that a robust HTML parser is needed.  But, as the
questions we get asking why an XML parser can't handle a given HTML
document show, there is some danger of confusing people about how XML and
HTML differ.
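To make the XML/HTML distinction concrete: a conforming XML parser must reject markup that browsers accept without complaint, such as unclosed <p> and <br> tags, or the &nbsp; entity used without a DTD that declares it. The sketch below is a minimal illustration only, not part of Xerces or NekoHTML. It uses the JDK's built-in JAXP DocumentBuilder, and the class and method names (XmlVsHtml, isWellFormedXml) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether a string is well-formed XML.
public class XmlVsHtml {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but, unlike the default
            // error handler, does not print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation surfaces here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical "real world" HTML: <p> and <br> are never closed.
        String html = "<html><body><p>First<p>Second<br></body></html>";
        // The same content written as well-formed XHTML.
        String xhtml = "<html><body><p>First</p><p>Second<br/></p></body></html>";
        System.out.println("HTML parses as XML:  " + isWellFormedXml(html));  // false
        System.out.println("XHTML parses as XML: " + isWellFormedXml(xhtml)); // true
    }
}
```

Both inputs are perfectly acceptable to a browser; only the second survives an XML parser, which is exactly the gap an HTML-specific parser has to fill.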

Still, NekoHTML is pretty clearly an application of XNI, and it makes
intimate use of other Xerces components.  So I think making it a
subproject of Xerces serves the need best, but I'd like to see it kept as
separate as possible; something like

     xml-xerces/java/neko

complete with its own build file, docs, src directory, packaging scheme,
etc.  Obviously it would reference Xerces code, but I don't want to put a
task for this subproject in the Xerces build file, because there are
already a ton of tasks there.  I'd even be in favour of this subproject
having its own releases, which need not match those of the parent
project.

If folks are uncomfortable that NekoHTML gets this isolation while the
WML/HTML DOM implementations, which are just as much add-on modules to
Xerces as NekoHTML is, are currently treated as integral parts, perhaps we
could turn this subproject into an "accessories" subproject and have all
three components live there.  Or we could treat all three as separate
subprojects; I don't have a strong opinion either way.

Does that make any sense?

BTW, someone (Stefano, I think) floated the idea of an Ant task to build a
minimalist Xerces.  We've already gone some way toward that:  if you build
the "tinyjars" target (sorry for the name; I'm not good at naming.  :-))
you'll get a xercesImpl that has neither schema support nor the HTML/WML
DOM, and is only about 430K in size.  Taking out DOM support as well would
be an interesting exercise, and I think quite a feasible one.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com




Andy Clark <andyc@apache.org>
04/11/2002 11:46 AM
Please respond to general

To:      general@xml.apache.org
cc:
Subject: [Vote] Add NekoHTML to Xerces



There is clearly a need for an HTML parser that can produce
standard XML APIs such as DOM trees and SAX events. My little
NekoHTML parser uses the Xerces Native Interface (XNI) to
implement this functionality and does a fair (but limited)
job at it. Since there's been interest in having this kind
of functionality in the parser package itself, I'm putting
it to a vote of the Xerces developers.
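The value of producing "standard XML APIs" is that application code written against SAX or DOM does not care which parser generated the events. As a minimal sketch of that idea, assuming nothing beyond the JDK, the hypothetical handler below records every start tag; here it is driven by the JDK's built-in SAX parser on well-formed XHTML. The premise of the proposal is that an HTML parser exposing the same SAX interface (NekoHTML is built on XNI for this purpose) could drive an identical handler on malformed HTML; that wiring is not shown here, and the class names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: application code that depends only on the SAX API.
public class ElementLister {

    /** A SAX handler that records the qualified name of every start tag. */
    static class StartTagCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    /** Parses the input with the JDK's SAX parser and returns element names in document order. */
    static List<String> listElements(String xml) {
        StartTagCollector handler = new StartTagCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(listElements("<html><body><p>Hi</p></body></html>"));
        // prints [html, body, p]
    }
}
```

Only the parser construction would change if a different SAX-producing parser were swapped in; StartTagCollector itself stays the same.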

[Q] Should we add NekoHTML to the Xerces-J codebase?

--
Andy Clark * andyc@apache.org

---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org
