commons-dev mailing list archives

From rdon...@apache.org
Subject cvs commit: jakarta-commons/digester/src/java/org/apache/commons/digester Digester.java package.html
Date Sat, 24 Jan 2004 11:22:31 GMT
rdonkin     2004/01/24 03:22:31

  Modified:    digester/src/java/org/apache/commons/digester Digester.java
                        package.html
  Log:
  Added some documentation on register() and on external entities. The content is probably
  a little bit controversial, but people using just system identifiers is a bit of a peeve of mine.
  Feel free to amend or add different viewpoints :)
  
  Revision  Changes    Path
  1.91      +20 -6     jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java
  
  Index: Digester.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
  retrieving revision 1.90
  retrieving revision 1.91
  diff -u -r1.90 -r1.91
  --- Digester.java	10 Jan 2004 17:34:17 -0000	1.90
  +++ Digester.java	24 Jan 2004 11:22:31 -0000	1.91
  @@ -1660,9 +1660,23 @@
   
   
       /**
  -     * Register the specified DTD URL for the specified public identifier.
  +     * <p>Register the specified DTD URL for the specified public identifier.
        * This must be called before the first call to <code>parse()</code>.
  -     *
  +     * </p><p>
  +     * <code>Digester</code> contains an internal <code>EntityResolver</code>
  +     * implementation. This maps <code>PUBLICID</code>s to URLs 
  +     * (from which the resource will be loaded). A common use case for this
  +     * method is to register local URLs (possibly computed at runtime by a 
  +     * classloader) for DTDs. This allows the performance advantage of using
  +     * a local version without having to ensure every <code>SYSTEM</code>
  +     * URI on every processed XML document is local. This implementation provides
  +     * only basic functionality. If more sophisticated features are required,
  +     * using {@link #setEntityResolver} to set a custom resolver is recommended.
  +     * </p><p>
  +     * <strong>Note:</strong> This method will have no effect when a custom
  +     * <code>EntityResolver</code> has been set. (Setting a custom 
  +     * <code>EntityResolver</code> overrides the internal implementation.)
  +     * </p>
        * @param publicId Public identifier of the DTD to be resolved
        * @param entityURL The URL to use for reading this DTD
        */
  
  
  
  1.26      +75 -0     jakarta-commons/digester/src/java/org/apache/commons/digester/package.html
  
  Index: package.html
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/digester/src/java/org/apache/commons/digester/package.html,v
  retrieving revision 1.25
  retrieving revision 1.26
  diff -u -r1.25 -r1.26
  --- package.html	13 Jan 2004 20:23:25 -0000	1.25
  +++ package.html	24 Jan 2004 11:22:31 -0000	1.26
  @@ -19,6 +19,7 @@
   <a href="#doc.Namespace">[Namespace Aware Parsing]</a>
   <a href="#doc.Pluggable">[Pluggable Rules Processing]</a>
   <a href="#doc.RuleSets">[Encapsulated Rule Sets]</a>
  +<a href="#doc.RegisteringDTDs">[Registering DTDs]</a>
   <a href="#doc.troubleshooting">[Troubleshooting]</a>
   <a href="#doc.FAQ">[FAQ]</a>
   <a href="#doc.Limits">[Known Limitations]</a>
  @@ -993,6 +994,80 @@
       the same set of nested elements at different nesting levels within an
       XML document.</li>
   </ul>
  +<a name="doc.RegisteringDTDs"></a>
  +<h3>Registering DTDs</h3>
  +
  +<h4>Brief (But Still Too Long) Introduction To System and Public Identifiers</h4>
  +<p>A definition for an external entity comes in one of two forms:
  +</p>
  +<ol>
  +    <li><code>SYSTEM <em>system-identifier</em></code></li>
  +    <li><code>PUBLIC <em>public-identifier</em> <em>system-identifier</em></code></li>
  +</ol>
  +<p>
  +The <code><em>system-identifier</em></code> is a URI from which the resource can be obtained
  +(either directly or indirectly). Many valid URIs may identify the same resource.
  +The <code><em>public-identifier</em></code> is an additional free identifier which may be used
  +(by the parser) to locate the resource. 
  +</p>
  +<p>
  +In practice, the weakness with a <code><em>system-identifier</em></code> is that most parsers
  +will attempt to interpret this URI as a URL, try to download the resource directly
  +from the URL and stop the parsing if this download fails. So, this means that 
  +almost always the URI will have to be a URL from which the declaration
  +can be downloaded.
  +</p>
  +<p>
  +URLs may be local or remote but if the URL is chosen to be local, it is likely only
  +to function correctly on a small number of machines (which are configured precisely
  +to allow the XML to be parsed). This is usually unsatisfactory and so a universally
  +accessible URL is preferred. This usually means an internet URL.
  +</p>
  +<p>
  +To recap, in practice the <code><em>system-identifier</em></code> will (most likely) be an 
  +internet URL. Unfortunately downloading from an internet URL is not only slow
  +but unreliable (since successfully downloading a document from the internet 
  +relies on the client being connected to the internet and the server being
  +able to satisfy the request).
  +</p>
  +<p>
  +The <code><em>public-identifier</em></code> is a freely defined name but (in practice) it is 
  +strongly recommended that a unique, readable and open format is used (for reasons
  +that should become clear later). A Formal Public Identifier (FPI) is a very
  +common choice. This public identifier is often used to provide a unique and location
  +independent key which can be used to substitute local resources for remote ones 
  +(hint: this is why ;).
  +</p>
  +<p>
  +By using the second (<code>PUBLIC</code>) form combined with some form of local
  +catalog (which matches <code><em>public-identifier</em></code>s to local resources) and where
  +the <code><em>public-identifier</em></code> is a unique name and the <code><em>system-identifier</em></code> 
  +is an internet URL, the practical disadvantages of specifying just a 
  +<code><em>system-identifier</em></code> can be avoided. Those external entities which have been 
  +stored locally (on the machine parsing the document) can be identified and used.
  +Only when no local copy exists is it necessary to download the document
  +from the internet URL. This naming scheme is recommended when using <code>Digester</code>.
  +</p>
  +
  +<h4>External Entity Resolution Using Digester</h4>
  +<p>
  +SAX factors out the resolution of external entities into an <code>EntityResolver</code>.
  +<code>Digester</code> supports the use of custom <code>EntityResolver</code>s
  +but ships with a simple internal implementation. This implementation allows local URLs
  +to be easily associated with <code><em>public-identifier</em></code>s.
  +</p>
  +<p>For example:</p>
  +<code><pre>
  +    digester.register("-//Example Dot Com //DTD Sample Example//EN", "assets/sample.dtd");
  +</pre></code>
  +<p>
  +will make digester return the relative file path <code>assets/sample.dtd</code>
  +whenever an external entity with public id 
  +<code>-//Example Dot Com //DTD Sample Example//EN</code> is needed.
  +</p>
  +<p><strong>Note:</strong> This is a simple (but useful) implementation.
  +Greater sophistication requires a custom <code>EntityResolver</code>.</p>
  +    
   <a name="doc.troubleshooting"></a>
   <h3>Troubleshooting</h3>
   <h4>Debugging Exceptions</h4>
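
For anyone who wants to try the public-identifier idea from the new package.html section without pulling in Digester, here is a minimal sketch using only the JDK's SAX classes. It is not Digester's internal resolver; the class name PublicIdCatalog, its register() and localHits() methods, and the http://example.invalid system identifier are all made up for illustration. The catalog maps a public identifier to a local URL and returns null for anything it does not know, which tells the parser to fall back to the system identifier.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical catalog: maps public identifiers to local URLs (illustration only). */
class PublicIdCatalog implements EntityResolver {
    private final Map<String, String> urlsByPublicId = new HashMap<>();
    private int localHits = 0;

    /** Associate a public identifier with the URL of a local copy. */
    void register(String publicId, String entityUrl) {
        urlsByPublicId.put(publicId, entityUrl);
    }

    /** Number of lookups answered from the catalog. */
    int localHits() {
        return localHits;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUrl = (publicId == null) ? null : urlsByPublicId.get(publicId);
        if (localUrl == null) {
            // Unknown public id: returning null lets the parser use the system identifier.
            return null;
        }
        localHits++;
        InputSource source = new InputSource(localUrl);
        source.setPublicId(publicId);
        return source;
    }

    /** Parses a document whose system identifier is unreachable; returns the catalog hit count. */
    static int parseWithLocalDtd() throws Exception {
        // Stand-in for a DTD shipped with the application (assumption: a temp file).
        Path dtd = Files.createTempFile("sample", ".dtd");
        Files.write(dtd, "<!ELEMENT sample (#PCDATA)>".getBytes(StandardCharsets.UTF_8));

        PublicIdCatalog catalog = new PublicIdCatalog();
        catalog.register("-//Example Dot Com //DTD Sample Example//EN", dtd.toUri().toString());

        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE sample PUBLIC \"-//Example Dot Com //DTD Sample Example//EN\" "
            + "\"http://example.invalid/sample.dtd\">\n"
            + "<sample>hello</sample>";

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(catalog);
        reader.setContentHandler(new DefaultHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } finally {
            Files.deleteIfExists(dtd);
        }
        return catalog.localHits();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("local DTD lookups: " + parseWithLocalDtd());
    }
}
```

The document above names a system identifier that cannot be downloaded, yet the parse should still succeed because the public identifier is matched first. With Digester itself, the equivalent step would be the register() call shown in the package.html example, or setEntityResolver() with a resolver along these lines.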
  
  
  


