xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [Vote] Add NekoHTML to Xerces
Date Mon, 15 Apr 2002 10:18:11 GMT

neilg@ca.ibm.com wrote:
> Hi Andy and all,
> I'm cc'ing xerces-j-dev on this one; I didn't see this note go there
> originally and if it's a vote of Xerces-J folks that's wanted, that list
> should probably know.  :-)
> There's no doubt of the need for a robust HTML parser.  But as the
> questions we get asking why an XML parser can't handle some given HTML doc
> show, there does seem to be some danger of confusing people with respect to
> how XML and HTML differ.

Good point.

> But NekoHTML is pretty clearly an application of XNI, and intimately uses
> other components of Xerces.  So I think the idea of making it a subproject
> of Xerces serves the need best--but I'd like to see it as separate as
> possible; something like
>      xml-xerces/java/neko

Makes sense to me.

> complete with its own build file, its own docs, src directory, packaging
> scheme etc.  Obviously it would reference Xerces code, but I don't want to
> put a task in the Xerces build file for this subproject because there are
> already a ton of tasks here; I'd even be in favour of this subproject
> having its own releases that might not necessarily match those of the
> parent project.

That's what I was proposing earlier. As I told Andy, you might want to
take a look at Avalon and Turbine under jakarta to see how they handle
their subprojects.

They also have separate CVS repositories, although I would suggest you
keep everything in one repository for Xerces (as you proposed).

> If folks are made uncomfortable by the fact that NekoHTML gets this
> isolation while the WML/HTML DOM implementations, as much modules to Xerces
> as NekoHTML, are currently treated as integral parts, perhaps we could turn
> this subproject into an "accessories" subproject and make all three
> components live there.    Or treat all three as their own subprojects; I
> don't have an opinion in this direction.

I think it would be a great feature. I would love to see the beauty of
the XNI design reflected in the packaging, also because a parser is
*always* embedded in some other application and the embedder knows
which 'functionalities' he/she is going to need from Xerces. A
'toolbox' of parsing components that can be assembled as needed would
definitely kill Crimson, because the only point of using it was its
small size.

> Does that make any sense?
> BTW, someone--Stefano I think--floated the idea of an ant task to build a
> minimalist Xerces.  We've already gone some way towards that:  if you build
> the "tinyjars" target (sorry for the name; I'm not good at that.  :-))
> you'll get a xercesImpl that doesn't have schema support or the HTML/WML
> DOM, and is only about 430K in size.  Taking out DOM support would be an
> interesting exercise, one I think would be quite feasible.

As I said, I would really love to be able to embed in my application
only those components that I need for XML parsing. In Cocoon's case
there would be a lot (XML SAX parsing, DTD validation, Schema validation,
XML DOM support and HTML parsing), but that would leave out some components
(mostly WML/HTML DOM support) which nobody really uses in our
So, you could have something like

 xerces-core.jar -> the XNI interfaces and internal machineries
 xerces-sax.jar -> the XML sax parser
 xerces-dom.jar -> the XML DOM provider
 xerces-dtd.jar -> the DTD validator
 xerces-schema.jar -> the XMLSchema validator
 xerces-neko.jar -> the HTML parser
 xerces-html-dom.jar -> the HTML DOM provider
 xerces-wml-dom.jar -> the WML DOM provider

and (in the same dist or in another one)

 xerces.jar -> which includes all of them. (for those who really don't
care about size)
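
To make the embedder's side of this concrete, here is a minimal sketch of
picking jars by needed feature. The jar names are the ones proposed above;
the class, the enum and the assumption that every feature only needs
xerces-core.jar in addition to its own jar are hypothetical, since the real
dependencies between the components have not been worked out.

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class XercesJarPicker {

    // Hypothetical feature list, one entry per jar in the proposed split.
    public enum Feature {
        SAX("xerces-sax.jar"),
        DOM("xerces-dom.jar"),
        DTD("xerces-dtd.jar"),
        SCHEMA("xerces-schema.jar"),
        HTML_PARSER("xerces-neko.jar"),
        HTML_DOM("xerces-html-dom.jar"),
        WML_DOM("xerces-wml-dom.jar");

        final String jar;

        Feature(String jar) {
            this.jar = jar;
        }
    }

    // Returns the jars an embedder would ship for the given features.
    // Assumption: xerces-core.jar is always required, and no feature
    // pulls in another feature's jar.
    public static Set<String> jarsFor(Set<Feature> needed) {
        Set<String> jars = new LinkedHashSet<>();
        jars.add("xerces-core.jar");
        for (Feature feature : needed) {
            jars.add(feature.jar);
        }
        return jars;
    }

    public static void main(String[] args) {
        // The Cocoon case from this message: everything except the
        // WML/HTML DOM providers.
        Set<Feature> cocoon = EnumSet.of(
                Feature.SAX, Feature.DTD, Feature.SCHEMA,
                Feature.DOM, Feature.HTML_PARSER);
        System.out.println(String.join(" ", jarsFor(cocoon)));
    }
}
```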

and no compression added to the jars, to speed up classloading
(compression can be performed over the wire if required)
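
As a rough sketch of what "no compression" means at the jar level, the
following writes entries with the STORED method using the standard
java.util.jar API. The class name and the sample entry are made up for
illustration; a real build would more likely set this in the build tool
than in hand-written code, and whether it measurably speeds up
classloading would need to be checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;

public class StoredJarWriter {

    // Writes the given files (entry name -> content) into an in-memory
    // jar without compressing them.
    public static byte[] writeStored(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                byte[] data = file.getValue();

                // STORED entries must declare size and CRC up front.
                CRC32 crc = new CRC32();
                crc.update(data);

                JarEntry entry = new JarEntry(file.getKey());
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());

                jar.putNextEntry(entry);
                jar.write(data);
                jar.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content standing in for a compiled class file.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("example/Placeholder.txt",
                "placeholder".getBytes(StandardCharsets.UTF_8));
        byte[] jarBytes = writeStored(files);
        System.out.println("wrote " + jarBytes.length + " bytes");
    }
}
```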

What do you think?

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
