Date: 7 Sep 2001 10:58:44 -0000
Message-ID: <20010907105844.66233.qmail@icarus.apache.org>
From: dims@apache.org
To: xml-cocoon2-cvs@apache.org
Subject: cvs commit: xml-cocoon2/xdocs storejanitor.xml catalog.xml docs-book.xml jars.xml mrustore.xml site-book.xml

dims        01/09/07 03:58:44

  Modified:    src/org/apache/cocoon/components/store StoreJanitorImpl.java
               xdocs    catalog.xml docs-book.xml jars.xml mrustore.xml site-book.xml
  Added:       xdocs    storejanitor.xml
  Log:
  - Patch for "xdoc update for the stores" from "Gerhard Froehlich"
  - Various patches from David Crossley for Catalog support

  Revision  Changes    Path
  1.2       +0 -2      xml-cocoon2/src/org/apache/cocoon/components/store/StoreJanitorImpl.java

  Index: StoreJanitorImpl.java
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/src/org/apache/cocoon/components/store/StoreJanitorImpl.java,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- StoreJanitorImpl.java 2001/09/05 11:56:26 1.1
  +++ StoreJanitorImpl.java 2001/09/07 10:58:44 1.2
  @@ -13,9 +13,7 @@
   import org.apache.avalon.framework.logger.AbstractLoggable;
   import org.apache.avalon.framework.parameters.Parameters;
   import org.apache.avalon.framework.thread.ThreadSafe;
  -import org.apache.cocoon.Constants;
  -import java.util.Collections;
   import java.util.ArrayList;
   /**

  1.4       +107 -97   xml-cocoon2/xdocs/catalog.xml

  Index: catalog.xml
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/xdocs/catalog.xml,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- catalog.xml 2001/09/03 11:41:07 1.3
  +++ catalog.xml 2001/09/07 10:58:44 1.4
  @@ -1,12 +1,13 @@
  -
  +
Entity resolution with catalogs Resolve external entities to local or other resources - 1.3 + 1.4 Technical document @@ -16,20 +17,34 @@

- @docname@ has the capability to utilise an entity resolution mechanism. This - assists with entity management and also reduces the necessity for expensive - and failure-prone network retrieval of the required resources (e.g. DTDs, - character entity sets, XML sub-documents). + @docname@ has the capability to utilise an entity resolution mechanism. + External entities (e.g. Document Type Definitions (DTDs), character entity + sets, XML sub-documents) are resources that are declared by an XML instance + document - they exist as separate objects. An entity catalog assists with + entity management and the resolution of entities to accessible resources. + It also reduces the necessity for expensive and failure-prone network + retrieval of the required resources.

- "Entities" represent the physical structure of an XML instance document, whereas "elements" represent the logical structure. The complete entity structure of the document defines which pieces need to be incorporated, so as to build the final document. Those entities are objects from some accessible place, e.g. local file system, local network, remote network, generated from a database. Example entities are: DTDs, XML sub-documents, sets of character entities to represent symbols and other glyphs, image files. + "Entities" represent the physical structure of an XML instance document, + whereas "elements" represent the logical structure. The complete entity + structure of the document defines which pieces need to be incorporated, so + as to build the final document. Those entities are objects from some + accessible place, e.g. local file system, local network, remote network, + generated from a database. Example entities are: DTDs, XML sub-documents, + sets of character entities to represent symbols and other glyphs, image + files.

- So how are you going to define the accessible location of all those pieces? How will you ensure that those resources are reliably available? Entity resolution catalogs to the rescue. These are simple standards-based plain-text files to map public identifiers and system identifiers to local or other resources. + So how are you going to define the accessible location of all those pieces? + How will you ensure that those resources are reliably available? Entity + resolution catalogs to the rescue. These are simple standards-based + plain-text files to map public identifiers and system identifiers to local + or other resources.
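
The mapping idea in the paragraph above can be illustrated with a small sketch. It assumes a deliberately simplified subset of the plain-text catalog syntax: one entry per line, and only PUBLIC "identifier" "resource" and SYSTEM "identifier" "resource" entries. The class name MiniCatalog and the sample identifiers in the usage note are hypothetical. This is not the parser from the Sun resolver package, which understands many more directives.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only: reads a simplified subset of a plain-text catalog. */
class MiniCatalog {
    // Matches lines such as: PUBLIC "some public identifier" "local/or/other/resource"
    private static final Pattern ENTRY =
        Pattern.compile("^\\s*(PUBLIC|SYSTEM)\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*$");

    private final Map<String, String> publicIds = new LinkedHashMap<>();
    private final Map<String, String> systemIds = new LinkedHashMap<>();

    /** Builds a catalog from text, silently skipping lines this sketch does not understand. */
    static MiniCatalog parse(String text) {
        MiniCatalog catalog = new MiniCatalog();
        for (String line : text.split("\\R")) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue;
            }
            if (m.group(1).equals("PUBLIC")) {
                catalog.publicIds.put(m.group(2), m.group(3));
            } else {
                catalog.systemIds.put(m.group(2), m.group(3));
            }
        }
        return catalog;
    }

    /**
     * Looks up the public identifier first, then the system identifier.
     * Returns null when neither is mapped (an assumption of this sketch;
     * real resolvers apply more nuanced preference rules).
     */
    String resolve(String publicId, String systemId) {
        if (publicId != null && publicIds.containsKey(publicId)) {
            return publicIds.get(publicId);
        }
        if (systemId != null && systemIds.containsKey(systemId)) {
            return systemIds.get(systemId);
        }
        return null;
    }
}
```

For example, after parsing the single line PUBLIC "-//EXAMPLE//DTD Test V1.0//EN" "dtd/test.dtd", a call to resolve with that public identifier returns dtd/test.dtd, whatever remote system identifier the document declared.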

@@ -61,25 +76,25 @@

  • Demonstration #2 - explains more detailed need and use of catalogs + and shows catalogs in action
  • Implementation and default configuration - describes how support for catalogs is added to @docname@ and - explain the default configuration (which should work out-of-the-box) + explains the default automated configuration
  • Local configuration - explains how to extend the default configuration for your local - system reqirements and provides an example + system requirements and provides an example
  • Development notes - - default catalog support is now in the 2.1-dev branch - - needs to confirm operation on all major platforms + - some minor issues need to be addressed
  • Other notes - - assorted notes + - assorted dot-points
  • Summary @@ -94,22 +109,29 @@

    - The following article eloquently describes the need for all -parsers and XML frameworks to be capable of utilising entity -resolvers. + The following article eloquently describes the need for all parsers and + XML frameworks to be capable of utilising entity resolvers. "If You Can Name It, You Can Claim It!" - by Norman Walsh. Please read that document, then return here to apply entity catalogs to @docname@. + by Norman Walsh. Please read that document, then return here to apply + entity catalogs to @docname@.

    - (Note: That article (and Java classes) evolved to become the Sun resolver.zip Java package that has been added to @docname@ - a more recent version of the article is available with the Sun download (see below). The API javadocs from your build have further information. However, you do not need to know the gory details to understand catalogs and configure them.) + (Note: That article (and Java classes) evolved to become the Sun + resolver.zip Java package that has been added to @docname@ + - a more recent version of the article is available with the Sun download + (see below). The API javadocs from your build have further information. + However, you do not need to know the gory details to understand catalogs + and configure them.)

    - This snippet from an XML instance shows the Document Type Declaration. Notice that it declares its ruleset, the Document Type Definition (DTD), as an external entity. Notice also that the resource is network-based. + This snippet from an XML instance shows the Document Type Declaration. + Notice that it declares its ruleset, the Document Type Definition (DTD), + as an external entity. Notice also that the resource is network-based.

    - Now consider what will happen when @docname@ tries to process this XML instance. Whether you have set validation=yes or not, the parser will still want to resolve all of the entities that are required by the XML instance (i.e. the DTD and any other entities that the DTD might declare). So it will happily trundle across the network to get them. It will do this every time that the document is processed. This is obviously a needless overhead. Worse still, what happens if that host is down or the network is congested. Additionally, if your @docname@ is an off-line server then it is always broken because it cannot retrieve the network-based resources. + Now consider what will happen when @docname@ tries to process this XML + instance. Whether you have set validation=yes or not, the parser will + still want to resolve all of the entities that are required by the XML + instance (i.e. the DTD and any other entities that the DTD might declare). + So it will happily trundle across the network to get them. It will do this + every time that the document is processed. This is obviously a needless + overhead. Worse still, what happens if that host is down or the network is + congested. Additionally, if your @docname@ is an off-line server then it is + always broken because it cannot retrieve the network-based resources.

    - As the Walsh document explained, the secrets to entity resolution are the public identifiers, system identifiers, and the catalog to map between them. Here we provide an overview and show an example catalog which we will then use with the - Demonstration #2 below. + As the Walsh document explained, the secrets to entity resolution are the + public identifiers, system identifiers, and the catalog to map between them. + Here we provide an overview and show an example catalog which we will then + use with the Demonstration #2 below.

    @@ -156,8 +188,6 @@ "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"> ]]> -TODO: briefly explain each of those declarations -

    (In your XML instance document, or DTD, you would include those entities like this ... %ISOnum;) @@ -243,11 +273,10 @@ role, and each included external entity reports how it came into being. This example builds upon the example provided by the Walsh article. (Tip: To see the error message that would result from not using a catalog, - simply rename the default properties file or default catalog file before - starting @docname@.) + simply rename the default catalog file before starting @docname@.)

    -TODO: ensure that the link to samples works OK +TODO: ensure that the link to samples works OK in the various documentation situations (i.e. static site, local docs build)

    Here is the source for the top-level XML instance document test.xml ... @@ -349,26 +378,31 @@ The SAX Parser interface provides an entityResolver hook to allow an application to resolve the external entities. The Sun Microsystems Java code for "resolver.jar" provides a - CatalogManager. This is incorporated into @doctitle@ as - org.apache.cocoon.components.resolver and configuration is - achieved via the CatalogManager.properties file. + CatalogManager. This is incorporated into @docname@ as + org.apache.cocoon.components.resolver and local configuration + is achieved via the CatalogManager.properties file.

    • A default catalog and some base entities (e.g. ISO*.pen character - entity sets) are included in the @doctitle@ distribution at + entity sets) are included in the @docname@ distribution at webapps/cocoon/resources/entities/
    • -
    • A default annotated CatalogManager.properties file is - included with the distribution (see the Build Notes below). +
    • The default catalog is automatically loaded at startup.
    • -
    • The automatic default configuration should work out-of-the-box
    • +
    • An annotated CatalogManager.properties file is included + with the distribution to facilitate the configuration of local catalogs. +
    • +
    • The automatic default configuration should work out-of-the-box.
    - TODO: We need to explain the properties file here in doco (the internal - annotation helps for now) ... full documentation is available with the - Sun download. - +

    + When the parser needs to load a declared entity, then it first consults + the Catalog Manager to get a possible mapping to an alternate system + identifier. If there is no mapping for an identifier in the catalogs + (or in any sub-ordinate catalogs), then @docname@ will carry on to + retrieve the resource using the original declared system identifier. +
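
The lookup-then-fall-back behaviour described above can be sketched with the standard SAX EntityResolver interface. This is a minimal illustration under stated assumptions, not the org.apache.cocoon.components.resolver implementation: the class name MapBackedResolver and its addMapping method are hypothetical, and an in-memory map stands in for the Catalog Manager.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical sketch: maps public identifiers to alternate system identifiers. */
class MapBackedResolver implements EntityResolver {
    private final Map<String, String> publicIdToUri = new HashMap<>();

    /** Registers an alternate (for example local) location for a public identifier. */
    void addMapping(String publicId, String alternateUri) {
        publicIdToUri.put(publicId, alternateUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String alternateUri = (publicId == null) ? null : publicIdToUri.get(publicId);
        if (alternateUri == null) {
            // No mapping: returning null asks the SAX parser to retrieve the
            // entity itself, using the system identifier that was declared.
            return null;
        }
        InputSource source = new InputSource(alternateUri);
        source.setPublicId(publicId);
        return source;
    }
}
```

An application would register such a resolver on its parser with XMLReader.setEntityResolver before parsing.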

    If you suspect problems, then you can raise the level of the @@ -376,43 +410,27 @@ to stdout when @docname@ starts and operates. You would also do this to detect any misconfiguration of your own catalogs.

    - - -

    - Use the following options to your build command ... -
    -Dinclude.webapp.libs=yes -
    -Dinstall.war=$TOMCAT_HOME/webapps install -

    - -

    - This allows the build process to copy the properties file from -$COCOON_HOME/webapp/resources/entities/CatalogManager.properties - to -$TOMCAT_HOME/webapps/cocoon/WEB-INF/classes/CatalogManager.properties - thereby making it available to the Java classpath. The build process will - also automatically adjust the full pathname for the default catalog to suit - your local directory structure. -

    - -

    - If you see an error message going to STDOUT when @docname@ starts - (Cannot find CatalogManager.properties) then this means that - the properties file is not available to the Java classpath. Please ensure - that you build as described above, or edit and move the properties file - into place manually. -

    -

    - You can add your own catalog by appending another full pathname to - the catalogs property in the default properties file - (see notes inside the properties file). + You can add your own local catalogs using the catalogs property + in the default properties file (see the notes inside the properties file).

    + The build process will automatically copy the properties file from +$COCOON_HOME/webapp/resources/entities/CatalogManager.properties + to +$TOMCAT_HOME/webapps/cocoon/WEB-INF/classes/CatalogManager.properties + thereby making it available to the Java classpath. + If you see an error message going to STDOUT when @docname@ starts + (Cannot find CatalogManager.properties) then this means that + the properties file is not available to the Java classpath. +
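
As a rough sketch of how an application could read the list of catalogs from such a properties file: the code below assumes that the catalogs property holds a semicolon-separated list of pathnames. Check the notes inside the properties file for the authoritative syntax. The class name CatalogList and the pathnames in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: extracts catalog pathnames from properties text. */
class CatalogList {
    /** Returns the entries of the "catalogs" property, assumed to be semicolon-separated. */
    static List<String> fromProperties(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        List<String> catalogs = new ArrayList<>();
        for (String entry : props.getProperty("catalogs", "").split(";")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                catalogs.add(trimmed);
            }
        }
        return catalogs;
    }
}
```

Given the text catalogs=/opt/entities/default.cat;/home/me/local.cat this returns the two pathnames in order. A missing catalogs property yields an empty list.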

    + +

    The actual "catalog" files have a powerful set of directives. For example, the CATALOG directive facilitates the inclusion of a sub-ordinate catalog. The list of resources below will @@ -481,30 +499,12 @@

  • 5) ? What other default entities need to be shipped with the @docname@ distribution? We already have some character entity sets (ISO*.pen).
  • -
  • 6) Future: It would be nice to have the - org.apache.cocoon.components.resolver classes - automatically load the default catalog, thereby leaving the - properties config file totally free for local use. +
  • 7)
  • -

    - Platform testing so far ... -

    - -
      -
    1. Linux Red Hat 7.1, java.vm.version=Blackdown-1.3.1-FCS, - Tomcat 3.2.2 ... OK
    2. -
    3. Win2K, Tomcat 3.3 ... OK
    4. -
    5. Windows 2000 Professional, Tomcat 3.2.3 and Tomcat 3.2.1 ... OK
    6. -
    7. Macintosh ... looking for success story
    8. -
    9. Other Windows ... looking for success story
    10. -
    11. Other UNIX ... looking for success story
    12. -
    13. Other JDK versions ... looking for success story
    14. -
    -

    - Some core @docname@ FIXME notes can be addressed by catalog ... + Some core @docname@ FIXME notes can now be addressed by catalog ...

      @@ -514,7 +514,7 @@
    • there are various other hard-coded pathnames to XML resources
    • this needs further investigation after basic catalog support is - implemented + fully settled
    @@ -534,12 +534,9 @@
  • There has been a recent flood of XML tools - unfortunately, many do not implement entity resolution (other than by brute-force retrieval), so those tools are crippled and cannot be used for serious XML processing. - Please ensure that you choose proper XML tools for the preparation and - vaildation of your XML instance documents. -
  • -
  • If there is no mapping for an identifier in the catalog (or in any - sub-ordinate catalogs), then @docname@ will carry on to retrieve the - resource using the declared system identifier. + Please ensure that you choose + proper XML tools + for the preparation and validation of your XML instance documents.
  • The default catalog that is shipped with the @docname@ distribution is deliberately basic. You will need to supplement it with your own catalog @@ -551,11 +548,23 @@

    - Most XML documents that we would want to serve with @docname@ are already in existence in another information system. The XML document instances have a declaration of their DTD Document Type Definition as an external file. This external DTD also includes entity sets such as ISOnum, ISOlat1, etc. Also the DTD declaration has a Formal Public Identifier and a System Identifier which points to a remote URL. These XML instance documents cannot be altered to make workaround solutions like ../dtd/document-1.0.dtd + Most XML documents that we would want to serve with @docname@ are already + in existence in another information system. The XML document instances have + a declaration of their DTD Document Type Definition as an external file. + This external DTD also includes entity sets such as ISOnum, ISOlat1, etc. + Also the DTD declaration has a Formal Public Identifier and a System + Identifier which points to a remote URL. These XML instance documents cannot + be altered to make workaround solutions like + ../dtd/document-1.0.dtd

    - Entity management is effected by providing a standards-based mechanism to resolve public identifiers and system identifiers to local filenames or other identifiers or even to other remote network resources. So references to external DTDs, sets of character entities such as mathematical symbols, fragments of XML documents, complete sub-documents, non-xml data chunks (like images), etc. can all be centrally managed and resolved locally. + Entity management is effected by providing a standards-based mechanism to + resolve public identifiers and system identifiers to local filenames or + other identifiers or even to other remote network resources. So references + to external DTDs, sets of character entities such as mathematical symbols, + fragments of XML documents, complete sub-documents, non-xml data chunks + (like images), etc. can all be centrally managed and resolved locally.

    @@ -568,7 +577,8 @@
    • OASIS Entity Resolution Technical Committee - see especially the - specification for OASIS Catalogs (TR 9401:1995 Entity Management) + specification for + OASIS Catalogs (TR 9401:1995 Entity Management) and the specification for XML Catalogs
    • 1.28 +2 -1 xml-cocoon2/xdocs/docs-book.xml Index: docs-book.xml =================================================================== RCS file: /home/cvs/xml-cocoon2/xdocs/docs-book.xml,v retrieving revision 1.27 retrieving revision 1.28 diff -u -r1.27 -r1.28 --- docs-book.xml 2001/09/06 20:56:21 1.27 +++ docs-book.xml 2001/09/07 10:58:44 1.28 @@ -67,7 +67,8 @@ - + + 1.12 +8 -0 xml-cocoon2/xdocs/jars.xml Index: jars.xml =================================================================== RCS file: /home/cvs/xml-cocoon2/xdocs/jars.xml,v retrieving revision 1.11 retrieving revision 1.12 diff -u -r1.11 -r1.12 --- jars.xml 2001/08/14 14:44:33 1.11 +++ jars.xml 2001/09/07 10:58:44 1.12 @@ -191,6 +191,14 @@ File upload capability - very useful in servlet environment. + resolver + Entity resolution catalogs - XML Entity and URI Resolvers + Yes + Resolver + Entity Catalogs + + + rhino Rhino is an implementation of JavaScript in Java. No 1.4 +5 -21 xml-cocoon2/xdocs/mrustore.xml Index: mrustore.xml =================================================================== RCS file: /home/cvs/xml-cocoon2/xdocs/mrustore.xml,v retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- mrustore.xml 2001/07/22 20:07:49 1.3 +++ mrustore.xml 2001/09/07 10:58:44 1.4 @@ -32,18 +32,18 @@ contain the key to the data in the HashMap. When calling the store() method a new entry to the front of the list is inserted. If the list is already full, the free() method is called and the oldest, - the last one in the LinkedList, data entry is also removed from the HashMap and the + the last one in the LinkedList, data entry is removed from the HashMap and from the LinkedList. When calling the get() method, the store returns the object by key and inserts the requested key on the top of the LinkedList. 
- This implementation keeps only the most recent objects in the cache and provides the best + This implementation keeps the most recently used objects in the store and provides the best use of the machine's memory.
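
The HashMap plus LinkedList arrangement described above can be sketched as follows. This is a simplified illustration, not the MRUMemoryStore source: the class name MruStoreSketch is hypothetical, and the real store also handles concerns such as persistence that are left out here.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

/** Hypothetical sketch of a most-recently-used store backed by a HashMap and a LinkedList. */
class MruStoreSketch {
    private final int maxObjects;
    private final Map<Object, Object> data = new HashMap<>();
    private final LinkedList<Object> keys = new LinkedList<>(); // front = most recently used

    MruStoreSketch(int maxObjects) {
        if (maxObjects < 1) {
            throw new IllegalArgumentException("maxObjects must be at least 1");
        }
        this.maxObjects = maxObjects;
    }

    /** Inserts the entry at the front of the list, freeing the oldest entry if the list is full. */
    synchronized void store(Object key, Object value) {
        if (data.containsKey(key)) {
            keys.remove(key); // re-storing an existing key just refreshes its position
        } else if (keys.size() >= maxObjects) {
            free();
        }
        data.put(key, value);
        keys.addFirst(key);
    }

    /** Returns the object for the key and moves that key to the front of the list. */
    synchronized Object get(Object key) {
        Object value = data.get(key);
        if (value != null) {
            keys.remove(key);
            keys.addFirst(key);
        }
        return value;
    }

    /** Removes the oldest entry (the last key in the list) from both structures. */
    synchronized void free() {
        if (!keys.isEmpty()) {
            data.remove(keys.removeLast());
        }
    }

    synchronized int size() {
        return keys.size();
    }
}
```

With maxObjects set to 2, storing a and b, reading a, and then storing c evicts b, because the read moved a ahead of b in the list.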

      - Caching in Memory is fast, but when the JVM restarts your processed Objects are gone and + Storing in Memory is fast, but when the JVM restarts your processed Objects are gone and must be processed again, although they have not changed. What a waste of CPU time.

      @@ -64,24 +64,16 @@

      - The CleanUp Thread checks that memory is not running too low in the JVM because of the Store. - It will try to keep overall memory usage below the requested levels. + The WriterThread is notified when an object is pushed on the stack and serializes the objects + in the stack to the filesystem.
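
A background writer of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the store's actual thread: the class name WriterThreadSketch is hypothetical, a blocking FIFO queue is swapped in for the stack mentioned above, and each object is written to a file named by the caller.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: a background thread that serializes pushed objects to files. */
class WriterThreadSketch {
    private static final Object[] STOP = new Object[0]; // sentinel that ends the loop
    private final BlockingQueue<Object[]> pending = new LinkedBlockingQueue<>();
    private final File directory;
    private final Thread worker;

    WriterThreadSketch(File directory) {
        this.directory = directory;
        this.worker = new Thread(this::drain, "writer-sketch");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Hands an object to the background thread and returns without waiting for the disk. */
    void push(String fileName, Serializable value) {
        pending.add(new Object[] { fileName, value });
    }

    /** Asks the worker to finish the queued jobs, then waits for it to stop. */
    void shutdown() {
        pending.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain() {
        try {
            while (true) {
                Object[] job = pending.take(); // blocks until push() supplies work
                if (job == STOP) {
                    return;
                }
                File target = new File(directory, (String) job[0]);
                try (ObjectOutputStream out =
                         new ObjectOutputStream(new FileOutputStream(target))) {
                    out.writeObject(job[1]);
                } catch (IOException e) {
                    System.err.println("could not write " + target + ": " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller's thread only enqueues work, so slow disk writes do not delay request processing. The trade-off is that an object is not yet on disk when push returns.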

      -

      - The Writer Thread is notified when an object is pushed on the stack to be written on the filesystem. - Then Writer Thread kicks in and serialize the objects on the filesystem. -

      - - - - @@ -91,17 +83,9 @@
      1. <event-cache class="org.apache.cocoon.components.store.MRUMemoryStore">: Assigns the MRUMemoryStore as the actual EventCache.
      2. -
      3. <parameter name="freememory" value="1000000"/>: - Indicates how much memory should be left free in the JVM for normal operation..
      4. -
      5. <parameter name="heapsize" value="60000000"/>: - Indicates how big the heap size can grow to before the cleanup thread kicks in.
      6. -
      7. <parameter name="cleanupthreadinterval" value="10"/>: - Indicates the interval of the cleanup thread in seconds.
      8. <parameter name="maxobjects" value="100"/>: Indicates how many objects will be held in the cache. When the number of maxobjects has been reached, the last object in the cache will be thrown out.
      9. -
      10. <parameter name="usecleanupthread" value="true"/>: - Indicates whether we use a cleanup thread or not.
      11. <parameter name="threadpriority" value="5"/>: Indicates the priority of the background threads. 1 is the lowest priority and 10 is the highest.
      12. <parameter name="filesystem" value="true"/>: 1.30 +2 -1 xml-cocoon2/xdocs/site-book.xml Index: site-book.xml =================================================================== RCS file: /home/cvs/xml-cocoon2/xdocs/site-book.xml,v retrieving revision 1.29 retrieving revision 1.30 diff -u -r1.29 -r1.30 --- site-book.xml 2001/09/06 20:56:21 1.29 +++ site-book.xml 2001/09/07 10:58:44 1.30 @@ -69,7 +69,8 @@ - + + 1.2 +72 -0 xml-cocoon2/xdocs/storejanitor.xml