xml-general mailing list archives

From Michael Glavassevich <mrgla...@ca.ibm.com>
Subject Re: how do I detect internal subset when part of external subset?
Date Fri, 07 Apr 2006 03:32:06 GMT
Hi Jacob,

<!ENTITY head SYSTEM "header.xml">
<!ENTITY foot SYSTEM "footer.xml">
<!ENTITY torso SYSTEM "body.xml">

are external entity declarations [1][2], so they are never reported
through internalEntityDecl(). They are reported by
XMLDTDHandler.externalEntityDecl() in XNI and by
DeclHandler.externalEntityDecl() in SAX.
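A minimal SAX sketch of this, combining DeclHandler.externalEntityDecl() with the "[dtd]" trick from Elliotte's message quoted below, so that only the declarations written in the internal subset are collected and printed back as <!ENTITY> lines. It assumes the JDK's built-in Xerces-based parser and org.xml.sax.ext.DefaultHandler2. The names InternalSubsetEntities, SubsetTracker and collect(), the stand-in external subset served from resolveEntity(), and its "legal" entity are all hypothetical, added only so the example runs without any files on disk. The XNI XMLDTDHandler route is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class InternalSubsetEntities {

    /** Hypothetical handler: records entity declarations from the internal subset only. */
    static class SubsetTracker extends DefaultHandler2 {
        private boolean inExternalSubset = false;
        final List<String> internalDecls = new ArrayList<>();

        @Override
        public void startEntity(String name) {
            // Xerces reports the external DTD subset as the pseudo-entity "[dtd]".
            if ("[dtd]".equals(name)) inExternalSubset = true;
        }

        @Override
        public void endEntity(String name) {
            if ("[dtd]".equals(name)) inExternalSubset = false;
        }

        @Override
        public void internalEntityDecl(String name, String value) {
            // e.g. <!ENTITY erh "Elliotte Rusty Harold">
            if (!inExternalSubset) internalDecls.add("<!ENTITY " + name + " \"" + value + "\">");
        }

        @Override
        public void externalEntityDecl(String name, String publicId, String systemId) {
            // e.g. <!ENTITY head SYSTEM "header.xml">; these never reach internalEntityDecl().
            if (inExternalSubset) return;
            String ids = (publicId != null)
                    ? "PUBLIC \"" + publicId + "\" \"" + systemId + "\""
                    : "SYSTEM \"" + systemId + "\"";
            internalDecls.add("<!ENTITY " + name + " " + ids + ">");
        }

        @Override
        public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
            // Assumption for this demo: serve a tiny stand-in for document.dtd
            // (with a hypothetical "legal" entity) instead of reading a real file.
            String text = (systemId != null && systemId.endsWith("document.dtd"))
                    ? "<!ELEMENT document ANY><!ENTITY legal SYSTEM \"legal.xml\">"
                    : "";
            return new InputSource(new StringReader(text));
        }
    }

    /** Parses xml and returns the internal-subset entity declarations, in document order. */
    public static List<String> collect(String xml) throws Exception {
        SubsetTracker tracker = new SubsetTracker();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(tracker);
        reader.setEntityResolver(tracker);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", tracker);
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", tracker);
        // Report system IDs as written ("header.xml") rather than as absolute URIs.
        reader.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false);
        // Do not load header.xml etc. during the demo; the parser skips those references.
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracker.internalDecls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" standalone=\"no\"?>\n"
                + "<!DOCTYPE document SYSTEM \"document.dtd\" [\n"
                + "  <!ENTITY head SYSTEM \"header.xml\">\n"
                + "  <!ENTITY foot SYSTEM \"footer.xml\">\n"
                + "  <!ENTITY torso SYSTEM \"body.xml\">\n"
                + "  <!ENTITY erh \"Elliotte Rusty Harold\">\n"
                + "]>\n"
                + "<document>&head; &torso; &foot;</document>";
        for (String decl : collect(xml)) System.out.println(decl);
    }
}
```

With the JDK's parser, main should print the four internal-subset declarations (head, foot, torso, erh) and leave out the "legal" entity from the external subset. As Elliotte notes, parsers other than Xerces may not tag the external subset with "[dtd]", in which case this filter would not work.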

Thanks.

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Jacob Kjome <hoju@visi.com> wrote on 04/06/2006 11:07:57 PM:

> 
> Thanks for the tip, Elliotte.  I'll remember it 
> when I use SAX.  I'm using XNI in this case.  I 
> suppose I could use SAX, but I'm really just 
> trying to migrate from Xerces1 to Xerces2 for 
> XMLC.  XMLC already depends directly on Xerces 
> because of the custom DOMs XMLC implements.  I
> also wanted to change as little as possible.  I 
> may make more radical changes once I've proven 
> that I can make things work properly with minimal changes.
> 
> In any case, I think I've got the internal subset 
> stuff working, except for one thing.  Take the following document...
> 
> <?xml version="1.0" standalone="no"?>
> <!DOCTYPE document SYSTEM "document.dtd" [
>    <!ENTITY head SYSTEM "header.xml">
>    <!ENTITY foot SYSTEM "footer.xml">
>    <!ENTITY torso SYSTEM "body.xml">
>    <!ENTITY erh "Elliotte Rusty Harold">
> ]>
> <document>
>    &head; &torso; &foot;
> </document>
> 
> The only part of this that ends up in the 
> internal subset is the "erh" entity.  That is, 
> the internalEntityDecl() method gets called only 
> for the "erh" entity and is not notified at all 
> for the other entities.  Then, as I build up the 
> DOM, I create EntityReferences for "&head;
> &torso; &foot;" in the <document>.  Upon
> serialization, they appear in the
> document, but since I was never notified to
> create the corresponding <!ENTITY> declarations in
> the internal subset, re-parsing of the serialized 
> document fails.  So, how do I get notified about 
> these so I can get them into the DOM unparsed?  I 
> want the serialized DOM to match the above as
> closely as possible.  I must be missing something.
> 
> 
> Jake
> 
> 
> At 06:41 AM 4/4/2006, you wrote:
>  >The trick is to look for the entity name "[dtd]". XOM accomplishes this
>  >as follows, using pure SAX:
>  >
>  >
>  >     protected boolean inExternalSubset = false;
>  >
>  >     // We have a problem here. Xerces gets this right,
>  >     // but Crimson and possibly other parsers don't properly
>  >     // report these entities, or perhaps just don't tag them
>  >     // with [dtd] like they're supposed to.
>  >     public void startEntity(String name) {
>  >       if (name.equals("[dtd]")) inExternalSubset = true;
>  >     }
>  >
>  >
>  >     public void endEntity(String name) {
>  >       if (name.equals("[dtd]")) inExternalSubset = false;
>  >     }
>  >
>  >You can just reverse the logic if you prefer inInternalSubset.
>  >
>  >--
>  >Elliotte Rusty Harold  elharo@metalab.unc.edu
>  >XML in a Nutshell 3rd Edition Just Published!
>  >http://www.cafeconleche.org/books/xian3/
>  >http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
>  >
>  >---------------------------------------------------------------------
>  >To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
>  >For additional commands, e-mail: general-help@xml.apache.org