xerces-c-dev mailing list archives

From Alberto Massari <amass...@progress.com>
Subject RE: exposing setXMLEntityResolver to the SAX parser
Date Thu, 08 Apr 2004 12:43:46 GMT
At 13.13 08/04/2004 +0100, Mark Weaver wrote:

> > Hi Mark,
> > I guess the reason these methods have not been added to SAX2XMLReader is
> > that this interface is derived from the SAX2 specs, where they
> > are missing.
> > If you can provide a patch using the EntityResolver2 extension, that would
> > be a better fix for this problem.
> >
>I'm not really sure how to do this.  Last time, I got as far as:
>
>- Adding an EntityResolver2 class defined as per SAX2 spec
>- Adding a getEntityResolverVersion function to EntityResolver so as to be
>able to distinguish between the two at runtime

Maybe we could avoid this by adding an overloaded 
setEntityResolver(EntityResolver2*) that would automatically detect the new 
interface (and set the
http://xml.org/sax/features/use-entity-resolver2 feature).
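
A minimal sketch of that overload idea, assuming stand-in types: EntityResolver and EntityResolver2 below are empty placeholders, and ReaderSketch, setUseEntityResolver2 and usesEntityResolver2 are hypothetical names, not the real Xerces-C declarations. It only shows how the compiler could pick the right setter and flip the feature.

```cpp
// Hypothetical stand-ins; not the real Xerces-C declarations.
struct EntityResolver {
    virtual ~EntityResolver() = default;
};
struct EntityResolver2 : EntityResolver {};

class ReaderSketch {
public:
    // Chosen when the argument's static type is EntityResolver*.
    void setEntityResolver(EntityResolver* resolver) {
        fResolver = resolver;
        fResolver2 = nullptr;
        fUseResolver2 = false;
    }

    // Chosen when the argument's static type is EntityResolver2*;
    // switches the use-entity-resolver2 feature on as a side effect.
    void setEntityResolver(EntityResolver2* resolver) {
        fResolver = resolver;
        fResolver2 = resolver;
        fUseResolver2 = (resolver != nullptr);
    }

    // The user may still turn the feature off (or back on) afterwards.
    void setUseEntityResolver2(bool enabled) {
        fUseResolver2 = enabled && fResolver2 != nullptr;
    }

    EntityResolver* getEntityResolver() const { return fResolver; }
    bool usesEntityResolver2() const { return fUseResolver2; }

private:
    EntityResolver*  fResolver = nullptr;
    EntityResolver2* fResolver2 = nullptr;
    bool             fUseResolver2 = false;
};
```

Two caveats with this approach: the detection is based on the static type of the pointer, so an EntityResolver2 object passed through an EntityResolver* lands in the first overload, and a call with a literal null pointer becomes ambiguous between the two overloads.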

>- Changing resolveEntity everywhere to take a name and baseURI parameter
>- Calling the appropriate resolver (via getEntityResolverVersion and
>static_cast)

The appropriate resolver should be invoked by looking at the 
use-entity-resolver2 feature (that the user could decide to turn off even 
if she provides an EntityResolver2 interface)
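
As a rough illustration of dispatching on the feature rather than on the resolver type, here is a sketch under simplifying assumptions: the interfaces are stand-ins that return a std::string instead of an InputSource*, and TaggingResolver and dispatchResolve are hypothetical names used only for this example.

```cpp
#include <string>

// Hypothetical stand-ins: the real interfaces return an InputSource*;
// a std::string is used here only to keep the sketch self-contained.
struct EntityResolver {
    virtual ~EntityResolver() = default;
    virtual std::string resolveEntity(const std::string& publicId,
                                      const std::string& systemId) = 0;
};

struct EntityResolver2 : EntityResolver {
    using EntityResolver::resolveEntity;  // keep the two-argument form visible
    virtual std::string resolveEntity(const std::string& name,
                                      const std::string& publicId,
                                      const std::string& baseURI,
                                      const std::string& systemId) = 0;
};

// Example resolver that only reports which entry point was used.
struct TaggingResolver : EntityResolver2 {
    std::string resolveEntity(const std::string& /*publicId*/,
                              const std::string& systemId) override {
        return "ER:" + systemId;
    }
    std::string resolveEntity(const std::string& name,
                              const std::string& /*publicId*/,
                              const std::string& baseURI,
                              const std::string& systemId) override {
        return "ER2:" + name + ":" + baseURI + ":" + systemId;
    }
};

// The feature flag, not the dynamic type of the resolver, selects the call.
std::string dispatchResolve(EntityResolver2& resolver,
                            bool useEntityResolver2,
                            const std::string& name,
                            const std::string& publicId,
                            const std::string& baseURI,
                            const std::string& systemId) {
    if (useEntityResolver2) {
        return resolver.resolveEntity(name, publicId, baseURI, systemId);
    }
    return resolver.resolveEntity(publicId, systemId);
}
```

With the flag off, an EntityResolver2 object is treated exactly like a plain EntityResolver, which is the behaviour the feature is meant to allow.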


>Where I came unstuck was the name parameter:
>
>name - Identifies the external entity being resolved. Either "[dtd]" for the
>external subset, or a name starting with "%" to indicate a parameter entity,
>or else the name of a general entity. This is never null when invoked by a
>SAX2 parser.
>
>I haven't much of a clue how to do that.

It looks like the specs don't take XML Schema into consideration; what 
should "name" contain in that case?
If this resolver should be invoked only when a DTD entity is being 
resolved, the information is almost there; the LastExtEntityInfo structure 
needs to be extended with a "name" field, which getLastExtEntityInfo would 
fill by using the XMLEntityDecl currently in scope.
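
To make the naming rule quoted above concrete, here is a small sketch. ExternalEntityKind, resolverName and LastExtEntityInfoSketch are hypothetical names, and the struct is a simplified stand-in rather than the real LastExtEntityInfo layout; the only point is how the "name" field could be derived from the kind of declaration in scope.

```cpp
#include <string>

// Hypothetical classification of what is being resolved.
enum class ExternalEntityKind { ExternalSubset, ParameterEntity, GeneralEntity };

// Builds the "name" argument as described for EntityResolver2:
// "[dtd]" for the external subset, "%" + name for a parameter entity,
// and the bare name for a general entity.
std::string resolverName(ExternalEntityKind kind, const std::string& declName) {
    switch (kind) {
        case ExternalEntityKind::ExternalSubset:
            return "[dtd]";
        case ExternalEntityKind::ParameterEntity:
            return "%" + declName;
        case ExternalEntityKind::GeneralEntity:
            return declName;
    }
    return declName;  // not reached for valid enum values
}

// Simplified stand-in for the structure, with the proposed extra field.
struct LastExtEntityInfoSketch {
    std::string systemId;
    std::string publicId;
    std::string name;  // hypothetical new field
};

// What a getLastExtEntityInfo-style helper might fill in.
LastExtEntityInfoSketch describeEntity(ExternalEntityKind kind,
                                       const std::string& declName,
                                       const std::string& publicId,
                                       const std::string& systemId) {
    return LastExtEntityInfoSketch{systemId, publicId, resolverName(kind, declName)};
}
```

For the XHTML example further down the thread, the parameter entity HTMLlat1 would be reported as "%HTMLlat1". What to report when the resource is an XML Schema remains open, as noted above.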


>Now I see that we have the old style resolveEntity method, and a new one
>that takes an XMLResourceIdentifier.  XMLResourceIdentifier does not include
>a name, but it seems that these days the resolveEntity method is ignored
>(comments indicate that it is not called, but that the other one is instead,
>which seems true on a quick reading).  I guess the sensible approach would
>be:
>
>- Add a name member to XMLResourceIdentifier
>- Find out what to put in the name every time one of these is created
>- Have the new resolveEntity check for ER2 being installed and call the
>appropriate method passing on the name parameter
>
>Then I'm also left with the getExternalSubset() function, which again, I
>don't know where or how to implement.

It should probably be handled the way setExternalSchemaLocation is 
handled: when isRoot is true and the setting has been set, the resolver is 
invoked and the DTD is parsed as if it had been specified in the prolog.
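
A sketch of that decision, under stated assumptions: the interface is a stand-in that returns a system id string (empty meaning "no subset") instead of an InputSource*, FixedSubsetResolver and externalSubsetForRoot are hypothetical names, and the extra check that the document did not declare its own external subset is my reading of the EntityResolver2 description rather than something settled in this thread.

```cpp
#include <string>

// Hypothetical stand-in for the extended resolver interface.
struct EntityResolver2 {
    virtual ~EntityResolver2() = default;
    // Returns the system id of a DTD to use, or an empty string for "none".
    virtual std::string getExternalSubset(const std::string& rootName,
                                          const std::string& baseURI) = 0;
};

// Example resolver: supplies a DTD only for documents rooted at <html>.
struct FixedSubsetResolver : EntityResolver2 {
    std::string getExternalSubset(const std::string& rootName,
                                  const std::string& /*baseURI*/) override {
        if (rootName == "html") {
            return std::string("xhtml1-strict.dtd");
        }
        return std::string();
    }
};

// Asks the resolver only at the root element, only when the feature is on,
// and only when the document did not already declare an external subset.
std::string externalSubsetForRoot(bool isRoot,
                                  bool hasDeclaredSubset,
                                  bool useEntityResolver2,
                                  EntityResolver2* resolver,
                                  const std::string& rootName,
                                  const std::string& baseURI) {
    if (!isRoot || hasDeclaredSubset || !useEntityResolver2 || resolver == nullptr) {
        return std::string();
    }
    return resolver->getExternalSubset(rootName, baseURI);
}
```

A non-empty result would then be handed to the DTD scanner in the same way as a subset named in a DOCTYPE declaration.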

Alberto


>There's also a problem with the ER2 vs ER method in that you are supposed to
>be able to turn off using ER2 via the SAX2 feature
>http://xml.org/sax/features/use-entity-resolver2.  Obviously the ER2 method
>then needs some way of knowing which kind of entity resolver it is meant to
>be, and the simple test above is not enough for that.  OTOMH you'd need to
>do something like storing pointers to
>
>a) an EntityResolver
>and
>b) an EntityResolver2
>
>everywhere.  setEntityResolver on the SAX2 interface can then check the
>feature flag and store the appropriate one (which would then pass off the
>appropriate ER/ER2 to the other scanners which are invoked, e.g. for the
>DTD/XSD, where each of these would need to have a slightly different
>interface featuring a setEntityResolver2).
>
>It's beginning to sound very messy (unless there is a better solution, which
>there may well be), as we now have three ways of installing three different
>kinds of incompatible entity resolvers.
>
>Simply exposing setXMLEntityResolver isn't actually enough for me: I'm using
>Xerces through Xalan (which I assume is fairly common), so then a chunk of
>Xalan (including interfaces) needs to change to expose this as well, so
>that's not ideal either.  The old method (which admittedly was terminally
>broken wrt the spec) did the job for me: there were no external interface
>changes, and therefore no changes needed in Xalan, which is a plus.
>
>Any help as to which way to go would be appreciated.
>
>Thanks,
>
>Mark
>
> > Alberto
> >
> > At 13.19 07/04/2004 +0100, Mark Weaver wrote:
> > >Would this be permissible?  This is very useful, as the current
> > >EntityResolver interface does not provide a base URI, leading to the problem
> > >of it being impossible to correctly resolve a root document including a DTD
> > >which includes another resource via a relative reference (and that's really
> > >common -- most DTDs include other DTDs).  A trivial example of such is
> > >parsing and validating something specifying:
> > >
> > ><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> > >         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> > >
> > >where xhtml1-strict.dtd has as its first line:
> > >
> > ><!ENTITY % HTMLlat1 PUBLIC
> > >    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
> > >    "xhtml-lat1.ent">
> > >%HTMLlat1;
> > >
> > >I tried to do this before by implementing EntityResolver2 (against 2.3.0)
> > >but I came unstuck on the `name' parameter, where basically in a large
> > >number of places I wasn't sure what to put for that.  However, the new
> > >method would work just fine for me, provided that I could get at it!
> > >
> > >The patch below implements the change.  The code is already present, so it
> > >just exposes it...
> > >
> > >Thanks,
> > >
> > >Mark
> > >
> > >diff -ur xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp
> > >--- xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp       2004-02-16 20:52:16.000000000 +0000
> > >+++ xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp       2004-04-07 01:41:56.583227000 +0100
> > >@@ -173,6 +173,7 @@
> > >
> > >  class ContentHandler ;
> > >  class DTDHandler;
> > >+class XMLEntityResolver;
> > >  class EntityResolver;
> > >  class ErrorHandler;
> > >  class InputSource;
> > >@@ -249,6 +250,13 @@
> > >      virtual EntityResolver* getEntityResolver() const = 0 ;
> > >
> > >      /**
> > >+      * This method returns the installed entity resolver.
> > >+      *
> > >+      * @return A pointer to the installed entity resolver object.
> > >+      */
> > >+    virtual XMLEntityResolver* getXMLEntityResolver() const = 0 ;
> > >+
> > >+       /**
> > >        * This method returns the installed error handler.
> > >        *
> > >        * @return A pointer to the installed error handler object.
> > >@@ -338,6 +346,24 @@
> > >      */
> > >      virtual void setEntityResolver(EntityResolver* const resolver) = 0;
> > >
> > >+  /** Set the entity resolver
> > >+    *
> > >+    * This method allows applications to install their own entity
> > >+    * resolver. By installing an entity resolver, the applications
> > >+    * can trap and potentially redirect references to external
> > >+    * entities.
> > >+    *
> > >+    * <i>Any previously set entity resolver is merely dropped, since the parser
> > >+    * does not own them.  If both setEntityResolver and setXMLEntityResolver
> > >+    * are called, then the last one is used.</i>
> > >+    *
> > >+    * @param resolver  A const pointer to the user supplied entity
> > >+    *                  resolver.
> > >+    *
> > >+    * @see #getXMLEntityResolver
> > >+    */
> > >+    virtual void setXMLEntityResolver(XMLEntityResolver* const resolver) = 0;
> > >+
> > >    /**
> > >      * Allow an application to register an error event handler.
> > >      *
> > >
> > >
> > >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
> > >For additional commands, e-mail: xerces-c-dev-help@xml.apache.org