cocoon-cvs mailing list archives

Subject cvs commit: cocoon-2.1/src/documentation/xdocs/userdocs/offline ant.xml bean.xml book.xml cli.xml index.xml
Date Thu, 07 Aug 2003 10:39:42 GMT
upayavira    2003/08/07 03:39:42

  Modified:    src/documentation/xdocs/userdocs book.xml
  Added:       src/documentation/xdocs/userdocs/offline ant.xml bean.xml
                        book.xml cli.xml index.xml
  Documentation for the CLI and Bean
  Revision  Changes    Path
  1.5       +3 -0      cocoon-2.1/src/documentation/xdocs/userdocs/book.xml
  Index: book.xml
  RCS file: /home/cvs/cocoon-2.1/src/documentation/xdocs/userdocs/book.xml,v
  retrieving revision 1.4
  retrieving revision 1.5
  diff -u -r1.4 -r1.5
  --- book.xml	1 Aug 2003 09:41:23 -0000	1.4
  +++ book.xml	7 Aug 2003 10:39:42 -0000	1.5
  @@ -31,6 +31,9 @@
       <menu-item label="XSP" href="xsp/index.html"/>
  +  <menu label="Offline Generation">
  +    <menu-item label="Offline Generation" href="offline/index.html"/>
  +  </menu>
  1.1                  cocoon-2.1/src/documentation/xdocs/userdocs/offline/ant.xml
  Index: ant.xml
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../../dtd/document-v10.dtd">
  	 <title>Offline Page Generation with Apache Ant</title>
  	 <type>Technical document</type>
  	 <authors><person name="Upayavira" email=""/>
  	 <abstract>This document explains how to use Cocoon to generate offline pages and
sites with Apache Ant.</abstract>
  	 <s1 title="Overview">
  		<p>Apache Ant can be used to start Cocoon in its offline mode. Whilst a specific
             Cocoon Ant task is planned, at present Cocoon can be invoked by starting the
             command line interface using a standard Java task.
  	 <s1 title="Sample Ant Task">
         <p>A sample Ant task would be as follows:</p>
  <java classname="org.apache.cocoon.Main" fork="true"
            failonerror="true" maxmemory="128m">
        <arg value="-xcli.xconf"/>
        <arg value="index.html"/>
        <classpath>
          <path refid="classpath"/>
          <fileset dir="${build.dir}">
            <include name="*.jar"/>
          </fileset>
          <pathelement location="${tools.jar}"/>
          <pathelement location="${build.context}/WEB-INF/classes"/>
        </classpath>
  </java>
  ]]>    </source>
         <p>This makes use of the Cocoon Command Line Interface's xconf configuration
file. See the
            <link href="cli.html">command line</link> page for details about how
to use this file.</p>
  1.1                  cocoon-2.1/src/documentation/xdocs/userdocs/offline/bean.xml
  Index: bean.xml
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../../dtd/document-v10.dtd">
  	 <title>The Cocoon Bean</title>
  	 <type>Technical document</type>
  	 <authors><person name="Upayavira" email=""/>
  	 <abstract>This document details the basics of using the Cocoon bean.</abstract>
  	 <s1 title="Overview">
  		<p>The Cocoon Bean provides a Java programmatic interface for offline page and site
  		   generation with Apache Cocoon.
  	 <s1 title="Details">
         <p>The Cocoon Bean forms the core of, and is used by the Cocoon Command Line
         <p>To find more about using the bean, look at the code for the CLI, which can
be found 
            in the Cocoon codebase in <code>src/java</code>, in the class 
         <note>Whilst the Cocoon Bean works, it is still under development, and therefore
its API
               must be considered unstable. Return to this page in future versions to see
what has 
  1.1                  cocoon-2.1/src/documentation/xdocs/userdocs/offline/book.xml
  Index: book.xml
  <?xml version="1.0"?>
  <!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../../dtd/book-cocoon-v10.dtd">
  <book software="Apache Cocoon" 
        title="Apache Cocoon User Documentation - Concepts" 
        copyright="@year@ The Apache Software Foundation">
    <menu label="Navigation">
      <menu-item label="Main" href="../../index.html"/>
      <menu-item label="User Documentation" href="../index.html"/>
    <menu label="Offline">
      <menu-item label="Overview" href="index.html"/>
      <menu-item label="Command Line" href="cli.html"/>
      <menu-item label="Ant" href="ant.html"/>
      <menu-item label="Cocoon Bean" href="bean.html"/>
  1.1                  cocoon-2.1/src/documentation/xdocs/userdocs/offline/cli.xml
  Index: cli.xml
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../../dtd/document-v10.dtd">
      <title>Offline Page Generation with the Command Line Interface</title>
      <type>Technical document</type>
      <authors><person name="Upayavira" email=""/>
      <abstract>This document explains how to use the Cocoon Command Line Interface
for offline page and site generation.</abstract>
      <s1 title="Overview">
        <p>The Command Line Interface provides access to Cocoon's offline generation
        <p>This page gives details of how to configure and use the CLI. Details of the
concepts behind
           offline page generation are given on the offline generation 
           <link href="index.html">overview</link> page.</p>
      <s1 title="Invoking the CLI">
        <p>The CLI can be invoked from the command line. Change to the root directory
of your 
           Cocoon distribution, and then, on Unix use <code>./cocoon.sh cli &lt;parameters&gt;</code>
           and on Windows use <code>cocoon.bat cli &lt;parameters&gt;</code>.</p>
        <p>The relevant parameters are detailed in the following sections.</p>
      <s1 title="Configuring the CLI">
        <p>The CLI can be configured in two ways: with an <code>xconf</code>
file, or with 
           command line parameters.</p>
        <p>The <code>xconf</code> method is the newer of the two and gives access
to a wider range of 
           features, so it is explained first.</p>
        <note>Whilst the xconf method provides access to more features, the command
              line parameter method is more stable, as there are currently plans to improve
              the xconf format to allow greater flexibility. If you require a stable and
              consistent method for accessing the CLI, it is recommended that you use the
              command line parameter method.</note>
        <s2 title="Using an Xconf file">
          <p>To start the CLI using an xconf file, on Unix use
             <code>./cocoon.sh cli -x &lt;xconf file&gt;</code>
             or on Windows use <code>cocoon cli -x &lt;xconf file&gt;</code>.</p>
          <p>A sample xconf file is included below.</p>
  <?xml version="1.0"?>
      |  This is the Apache Cocoon command line configuration file. 
      |  Here you give the command line interface details of where
      |  to find various aspects of your Cocoon installation.
      |  If you wish, you can also use this file to specify the URIs
      |  that you wish to generate.
      |  The current configuration information in this file is for
      |  building the Cocoon documentation. Therefore, all links here 
      |  are relative to the build context dir, which, in the build.xml 
      |  file, is set to ${build.context} 
      |  Options:
      |    verbose:            increase amount of information presented
      |                        to standard output (default: false)
      |    follow-links:       whether linked pages should also be 
      |                        generated (default: true)
      |    precompile-only:    precompile sitemaps and XSP pages, but 
      |                        do not generate any pages (default: false)
      |    confirm-extensions: check the mime type for the generated page
      |                        and adjust filename and links extensions
      |                        to match the mime type 
      |                        (e.g. text/html->.html)
  <cocoon verbose="true"  
         | Broken link reporting options:
         |   Report into a text file, one link per line:
         |     <broken-links type="text" report="filename"/>
         |   Report into an XML file:
         |     <broken-links type="xml" report="filename"/>
         |   Ignore broken links (default):
         |     <broken-links type="none"/>
         |   When a page includes an error, should a page be generated?
         |   Two attributes to this node specify whether a page should
         |   be generated when an error occurred. 'generate' specifies 
         |   whether a page should be generated (default: true) and
         |   extension specifies an extension that should be appended
         |   to the generated page's filename (default: none)
         |     <broken-links generate="true" extension=".error.txt"/>
     <broken-links type="xml" 
         |  Load classes at startup. This is necessary for generating
         |  from sites that use SQL databases and JDBC.
         |  The <load-class> element can be repeated if multiple classes
         |  are needed.
     <logging log-kit="WEB-INF/logkit.xconf" logger="cli" level="ERROR" />
         |  The context directory is usually the webapp directory
         |  containing the sitemap.xmap file.
         |  The config file is the cocoon.xconf file.
         |  The work directory is used by Cocoon to store temporary
         |  files and cache files.
         |  The destination directory is where generated pages will
         |  be written (assuming the 'simple' mapper is used)
         | Specifies the filename to be appended to URIs that
         | refer to a directory (i.e. end with a forward slash).
         |  Specifies a user agent string to the sitemap when
         |  generating the site.
         |  Specifies an accept string to the sitemap when generating
         |  the site.
         |  Specifies the URIs that should be generated (using <uri>
         |  elements) and, if necessary, what should be done with the
         |  generated pages.
         |  The old behaviour - appends uri to the specified destination
         |  directory (as specified in <dest-dir>):
         |   <uri>documents/index.html</uri>
         |  Append: append the generated page's URI to the end of the 
         |  source URI:
         |   <uri type="append" src-prefix="documents/" src="index.html"
         |   dest="build/dest/"/>
         |  Replace: Completely ignore the generated page's URI - just 
         |  use the destination URI:
         |   <uri type="replace" src-prefix="documents/" src="index.html" 
         |   dest="build/dest/docs.html"/>
         |  Insert: Insert generated page's URI into the destination 
         |  URI at the point marked with a * (example uses fictional 
         |  zip protocol)
         |   <uri type="insert" src-prefix="documents/" src="index.html" 
         |   dest="zip://*.zip/page.html"/>
     <uri type="append" src-prefix="documents/" src="index.html" dest="docs/"/>
         |  File containing URIs (plain text, one per
         |  line).
          <s3 title="Broken Link Handling">
          <p>The xconf method allows for more sophisticated broken link handling. The
               user can choose to have broken links reported to a file, in either plain
               text or XML.</p>
            <p>When this file is plain text, it will have one link URI per line.</p>
            <p>When this file is XML, each entry will give the URI of the broken link
               together with a message explaining why it is broken.</p>
            <p>It is also possible to specify whether an error page should be generated
               in place of the broken page (based upon the configured 
               <code>&lt;map:handle-errors&gt;</code> code in the sitemap). If required,
               an extension can be appended to the original file's URI to signify that
               it is an error page (e.g. <code>.error</code>).</p>
        <s2 title="Command Line Parameters">
          <p>You can get a listing of the available parameters on Unix with 
          <code>./cocoon.sh cli -h</code> or on Windows with <code>cocoon cli -h</code>.
          This should give a listing something like:</p>
  -------------------- Executing -----------------
  Main Class: org.apache.cocoon.Main
  usage: cocoon cli [options] [targets]
  cocoon 2.1
  Copyright (c) 1999-2003 Apache Software Foundation. All rights reserved.
   -a,--userAgent            use given string for user-agent header
   -e,--confirmExtensions    confirm that file extensions match mime-type of
                             pages and amend filename accordingly (default is true)
   -C,--configFile           specify alternate location of the configuration
                             file (default is ${contextDir}/cocoon.xconf)
   -D,--defaultFilename      specify a filename to be appended to a URI when
                             the URI refers to a directory
   -L,--loadClass            specify a class to be loaded at startup
                             (specifically for use with JDBC). Can be used multiple times
   -P,--precompileOnly       generate java code for xsp and xmap files
   -V,--verbose              enable verbose messages to System.out
   -b,--brokenLinkFile       send a list of broken links to a file (one URI
                             per line)
   -c,--contextDir           use given dir as context
   -d,--destDir              use given dir as destination
   -f,--uriFile              use a text file with uris to process (one URI
                             per line)
   -h,--help                 print this message and exit
   -k,--logKitconfig         use given file for LogKit Management
   -l,--Logger               use given logger category as default logger for
                             the Cocoon engine
   -p,--accept               use given string for accept header
   -r,--followLinks          process pages linked from starting page or not
                             (boolean argument is expected, default is true)
   -u,--logLevel             choose the minimum log level for logging
                             (DEBUG, INFO, WARN, ERROR, FATAL_ERROR) for startup logging
   -v,--version              print the version information and exit
   -w,--workDir              use given dir as working directory
   -x,--xconf                specify a file containing XML configuration
                             details for the command line interface
  Note: the context directory defaults to './webapp'
          <p>For details of the meaning of each specific parameter, see the <link
          <s3 title="Specifying Targets">
            <p>The command line parameter method does not have access to all of Cocoon's
URI handling features. However,
               it is possible to specify multiple URIs to be crawled, all of which will be
written to the same destination.
               That destination (specified by the <code>-d</code> or <code>--destDir</code>
option) may be a file URI
               or any other protocol for which a ModifiableSource exists (e.g. FTP).</p>
          <s3 title="URI Files">
            <p>A URI file offers a simple way to specify multiple URIs. The file is
plain text, with one URI per line.</p>
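The one-URI-per-line format can be illustrated with a short Java sketch. This is not the CLI's own code: the class name `UriFileSketch` and the method `readUris` are hypothetical, and trimming whitespace and skipping blank lines are assumptions, not documented CLI behaviour.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class UriFileSketch {

    /**
     * Hypothetical helper: reads a URI file, one URI per line.
     * Assumption: surrounding whitespace is trimmed and blank lines are skipped.
     */
    static List<String> readUris(Reader in) {
        return new BufferedReader(in).lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A StringReader stands in for the file passed with -f / --uriFile.
        List<String> uris = readUris(new StringReader("index.html\n\ndocs/faq.html\n"));
        System.out.println(uris); // prints [index.html, docs/faq.html]
    }
}
```

Each entry in the resulting list would then be treated like a URI given directly on the command line.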
          <s3 title="Broken Link Handling">
            <p>If a broken link file is specified, all broken links will be written
to this file, in text format,
               one URI per line.</p>
  1.1                  cocoon-2.1/src/documentation/xdocs/userdocs/offline/index.xml
  Index: index.xml
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../../dtd/document-v10.dtd">
      <title>Offline Page Generation</title>
      <type>Technical document</type>
      <authors><person name="Upayavira" email=""/>
      <abstract>This document explains the basic concepts of offline page generation
with Apache Cocoon.</abstract>
      <s1 title="Overview">
        <p>Cocoon can generate static, 'offline' versions of web pages or web sites,
as well
           as sites served dynamically. This document covers the concepts involved in offline
           page and site generation.
      <s1 title="Offline Page Generation">
        <p>Cocoon allows static versions of Cocoon web sites to be created.</p>
        <p>At present, this can be done in three ways:</p>
          <li><link href="cli.html">Command Line Interface</link></li>
          <li><link href="ant.html">Using Ant</link></li>
          <li><link href="bean.html">Cocoon Bean</link></li>
        <p>This document explains the general concepts that are shared by all of these
           methods. The specific details for each method are explained on a separate page.</p>
        <p>Cocoon, when generating pages offline, can follow links in a page (whether
that page
           is HTML, PDF or anything else), and can rewrite URIs to create filenames by checking
           the mime type of the generated page. All links to pages whose URIs change are changed
      <s1 title="Configuration">
        <p>To use Cocoon in its offline mode, a servlet container (e.g. Tomcat or Jetty)
is not
           needed. Cocoon can generate an offline site directly using the information available
           in the Cocoon <code>webapp</code> folder.</p>
        <p>Having said this, many choose to have a servlet container available locally
for use
           whilst debugging, as this can speed up the development process significantly.</p>
        <s2 title="Directories and Files">
          <p>All the information Cocoon needs to generate a site is stored in the
             webapp directory, so Cocoon must be told where to find that directory, and where
             to find various other files and directories. These are:</p>
            <li>Context directory (the Cocoon Webapp directory)</li>
            <li>Configuration File (usually <code>${COCOON_WEBAPP}/WEB-INF/cocoon.xconf</code>)</li>
            <li>Work Directory (used by Cocoon to store temporary files, this can be
anywhere of your choosing)</li>
        <s2 title="Logging">
          <p>There are three options that need to be specified in relation to logging.
These are:</p>
            <li>Log Kit (the logging configuration file, usually <code>${COCOON_WEBAPP}/WEB-INF/logkit.xconf</code>)</li>
            <li>Logger (a category used for logging, as configured in the configuration
            <li>Log Level (a logging level, one of DEBUG, INFO, WARN, ERROR or FATAL_ERROR.
This relates specifically to logging
                at startup, after which the log kit configuration takes over)</li>
        <s2 title="Other Configuration Options">
          <p>In online mode, a user agent string tells Cocoon what browser is being
used to access a page. The user agent
             can be configured manually for offline generation.</p>
          <p>In online mode, an accept string is provided by a browser, telling
Cocoon what types of content the browser 
             is capable of accepting. This will be a comma separated list of mime types. In
offline mode, an accept
             string can also be specified.</p>
          <p>As Cocoon based sites can change the content they generate based upon the
user agent string and the accept 
             string, it can be necessary to specify them in order to have the correct content
          <p>In order to generate sites that make use of databases and database connections,
it is necessary to load
             JDBC classes at startup. Cocoon allows for this.</p>
          <p>When, in offline mode, Cocoon generates a page whose URI ends in a <code>/</code>,
the resultant file cannot be 
             written to a filesystem, as its name would refer to a directory.
Therefore, the user can
             specify a default filename which will be appended to the page's URI before saving
to disk.</p>
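The default filename rule can be sketched in a few lines of Java. This is an illustration only, not Cocoon's implementation: the class `DefaultFilenameSketch` and the method `withDefaultFilename` are hypothetical names, and `index.html` is just an example value for the default filename.

```java
public class DefaultFilenameSketch {

    /**
     * Hypothetical helper: appends the default filename to URIs that refer
     * to a directory (i.e. end with a forward slash); other URIs are unchanged.
     */
    static String withDefaultFilename(String uri, String defaultFilename) {
        return uri.endsWith("/") ? uri + defaultFilename : uri;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultFilename("docs/", "index.html"));         // prints docs/index.html
        System.out.println(withDefaultFilename("docs/page.html", "index.html")); // prints docs/page.html
    }
}
```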
      <s1 title="URIs and Targets">
        <s2 title="Source URIs">
          <p>A source URI (which may also have a source prefix prepended) is the part
of the URI that is given
             to Cocoon for processing. So, for example, if you access a page with: 
             <code>http://localhost:8080/cocoon/site/page.html</code> then the
source URI would be 
        <s2 title="Destinations and Modifiable Sources">
          <p>Most of the time, when generating pages, the generated pages will be simply
written to disk.</p>
          <p>However, this is not the only option. Generated pages can be written anywhere
for which a 
             <code>ModifiableSource</code> exists. So, for example, it is possible
to generate a site and 
             have the pages written directly to a web server using FTP, by making use of the
        <s2 title="Target Types">
          <p>When generating a page, Cocoon needs to know how to decide upon the URI
of the generated page. 
             This process could be described as 'URI arithmetic'.</p>
          <p>Source and destination URIs are made up of the following elements:</p>
            <li>Source Prefix: Part of a source URI used to request a page but excluded
from the destination 
            <li>Source URI: Part of a source URI that is used when calculating the destination
            <li>Destination URI: The base URI for a destination</li>
            <li>Type: The method used for merging the above elements (can be append,
replace or 
          <note>When combining elements to make a URI, it is the user's responsibility
to include directory
                separators. For example, <code>foo</code> with <code>bar</code>
appended will be 
                <code>foobar</code>, whereas <code>foo/</code> with
<code>bar</code> appended will be
          <s3 title="Appending">
            <p>Here, when calculating the destination URI, the source prefix is ignored,
and the destination
               URI is calculated by appending the source URI to the end of the destination
URI. For example,
               with the following values:</p>
            <p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>,
destination URI:
            <p>A request will be made to Cocoon for a page at: <code>site/page.html</code>.
This will be
               saved as <code>pages/page.html</code>.</p>
          <s3 title="Replacing">
            <p>Here, when calculating the destination URI, the source prefix and the
source URI are 
               ignored, and the destination URI is used as is. This is useful when you wish
to save the
               generated page with a filename that bears no relationship to the source URI.
For example,
               with the following values:</p>
            <p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>,
destination URI:
            <p>A request will be made to Cocoon for a page at: <code>site/page.html</code>.
This will be
               saved as <code>pages/simple.html</code>.</p>
            <note>Given the nature of this target type, it inherently cannot be used
when following links 
               (otherwise all pages will be written on top of each other).</note>
          <s3 title="Inserting">
            <p>Here, when calculating the destination URI, the source prefix is ignored,
and the source URI
               is inserted into the destination URI at the point marked by an asterisk (*).
This is intended
               for use with complex protocols where the source URI does not appear at the
end of the 
               destination URI.</p>
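The three target types can be summarised with a small Java sketch of the 'URI arithmetic'. It is an illustration under stated assumptions, not Cocoon's code: `TargetSketch`, `requestFor` and `destinationFor` are hypothetical names, and the destination values `pages/` and `pages/simple.html` are assumed from the saved paths quoted in the examples above. The insert example reuses the fictional zip protocol from the sample xconf file.

```java
public class TargetSketch {

    /** The URI requested from Cocoon: the source prefix followed by the source URI. */
    static String requestFor(String sourcePrefix, String sourceUri) {
        return sourcePrefix + sourceUri;
    }

    /**
     * Hypothetical helper: computes where a generated page is written.
     * The source prefix plays no part in the destination for any type.
     */
    static String destinationFor(String type, String sourceUri, String destUri) {
        if (type.equals("append")) {
            // Plain concatenation: directory separators are the user's responsibility.
            return destUri + sourceUri;
        } else if (type.equals("replace")) {
            // The source URI is ignored entirely.
            return destUri;
        } else if (type.equals("insert")) {
            int star = destUri.indexOf('*');
            if (star < 0) {
                throw new IllegalArgumentException("destination has no * marker: " + destUri);
            }
            return destUri.substring(0, star) + sourceUri + destUri.substring(star + 1);
        }
        throw new IllegalArgumentException("unknown target type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(requestFor("site/", "page.html"));                             // prints site/page.html
        System.out.println(destinationFor("append", "page.html", "pages/"));              // prints pages/page.html
        System.out.println(destinationFor("replace", "page.html", "pages/simple.html"));  // prints pages/simple.html
        System.out.println(destinationFor("insert", "index.html", "zip://*.zip/page.html")); // prints zip://index.html.zip/page.html
    }
}
```

Because append is plain concatenation, `destinationFor("append", "bar", "foo")` gives `foobar`, which is why the trailing slash on the destination matters.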
        <s2 title="Mime Type Checking">
          <p>Cocoon can optionally test the mime type for a page, and, if the mime type
doesn't match the page's
             extension, amend the destination URI to include the correct extension. This will
ensure that pages
             will load correctly when served by a static web server.</p>
          <p>When Cocoon amends a destination URI, it also amends URIs for links in
those pages, so that links 
             will still work when a site has been crawled.</p>
          <note>This feature substantially slows down page generation, as each page
must be generated three times
                (once to find links, once to find its mime type and once to collect the actual
content). This 
                can be avoided by ensuring that all URIs in the site are correct and do not
need amending, in which
                case it is only necessary to generate a page once.</note>
      <s1 title="Following Links and Site Crawling">
        <p>Cocoon can be configured to either follow, or ignore, links in pages that
it generates. It has two methods
        of gathering links, 'link view' and 'link gathering'.</p>
        <s2 title="Link View Crawling">
          <p>With link view crawling, Cocoon gets the links by generating the 'link
view' for a page. Using link view
             gives a significant degree of configurability in terms of which links are gathered,
as it is possible to
             insert a transformer into the view to select out links that should not be followed.</p>
          <p>The disadvantage with link view crawling is that each page must be generated
twice, which doubles page 
             generation time.</p>
          <p>Link view is usually configured in the root sitemap with:</p>
    <map:view from-position="last" name="links">
     <map:serialize type="links"/>
    </map:view>
          <p>If you have this in your root sitemap, you do not need it in your sub-sitemaps.
However, you may choose
             to override it with one that carries out further processing - for example, with
an XSLT transformer that
             removes links that should not be crawled.</p>
          <p>See <link href="../concepts/views.html">views</link> for more
on views.</p>
          <p>You can see the link view yourself by appending <code>?cocoon-view=links</code>
to the page's URI.</p>
        <s2 title="Link Gathering Crawling">
          <p>With link gathering crawling, links are gathered from the SAX stream right
before the serializer. All 
             <code>src</code>, <code>href</code> and <code>xlink:href</code>
attributes are taken to be links, and are
             therefore followed.</p>
          <p>The benefit of link gathering crawling is that pages do not need to be
generated twice. However, one loses
             the ability, available with link view crawling, to configure which links should
be followed.</p>
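The idea behind link gathering can be sketched with a standalone SAX handler in Java. This is not Cocoon's implementation, which gathers links inside the pipeline just before the serializer. Here the hypothetical class `LinkGatheringSketch` parses a string with the JDK's SAX parser and, as a simplification, matches the literal attribute names `src`, `href` and `xlink:href` without resolving namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LinkGatheringSketch {

    /** Hypothetical helper: returns every src, href and xlink:href value, in document order. */
    static List<String> gatherLinks(String xml) {
        final List<String> links = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i);
                    // Simplification: compare qualified names literally.
                    if (name.equals("src") || name.equals("href") || name.equals("xlink:href")) {
                        links.add(atts.getValue(i));
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse page", e);
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<page xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
                + "<a href=\"a.html\"/><img src=\"b.png\"/>"
                + "<ref xlink:href=\"c.html\"/><p id=\"x\"/></page>";
        System.out.println(gatherLinks(page)); // prints [a.html, b.png, c.html]
    }
}
```

Every matching attribute is collected unconditionally, which mirrors the trade-off described above: there is no hook for excluding particular links.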
      <s1 title="Broken Links">
        <p>When a page cannot be found at a URI that has either been specified, or has
been found as a link in another
           page, it is considered 'broken'.</p>
        <p>Exactly what is done when a broken link is found depends upon the method
used to invoke 
           Cocoon. See related pages for specific details.</p>
      <s1 title="Precompiling XSPs">
        <p>When used offline, Cocoon can precompile XSP pages. If no URIs are specified,
it will scan all directories
           within the context directory looking for XSP files, each of which will be compiled.
If URIs are specified,
           all links will be followed looking for pages that make use of XSP, compiling those
XSP pages as they are
