lucene-java-commits mailing list archives

From gsing...@apache.org
Subject svn commit: r479465 [1/4] - in /lucene/java/trunk: docs/ docs/images/ docs/lucene-sandbox/ docs/styles/ src/site/ src/site/src/ src/site/src/documentation/ src/site/src/documentation/classes/ src/site/src/documentation/conf/ src/site/src/documentation/...
Date Mon, 27 Nov 2006 00:00:49 GMT
Author: gsingers
Date: Sun Nov 26 16:00:46 2006
New Revision: 479465

URL: http://svn.apache.org/viewvc?view=rev&rev=479465
Log:
Updated the website to new Forrest based site, see Issue 707, part one of commits

Added:
    lucene/java/trunk/src/site/   (with props)
    lucene/java/trunk/src/site/forrest.properties   (with props)
    lucene/java/trunk/src/site/src/
    lucene/java/trunk/src/site/src/documentation/
    lucene/java/trunk/src/site/src/documentation/classes/
    lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties   (with props)
    lucene/java/trunk/src/site/src/documentation/conf/
    lucene/java/trunk/src/site/src/documentation/conf/cli.xconf
    lucene/java/trunk/src/site/src/documentation/content/
    lucene/java/trunk/src/site/src/documentation/content/.htaccess
    lucene/java/trunk/src/site/src/documentation/content/xdocs/
    lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/asf-logo.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_architecture.jpg   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_crawling-process.jpg   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lia_3d.jpg   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_100.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_150.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_200.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_250.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_300.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_100.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_150.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_200.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_250.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_300.gif   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/index.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/lucene-sandbox/
    lucene/java/trunk/src/site/src/documentation/content/xdocs/lucene-sandbox/index.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/mailinglists.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/queryparsersyntax.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/releases.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/resources.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/scoring.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/site.xml   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/systemproperties.xml
    lucene/java/trunk/src/site/src/documentation/content/xdocs/tabs.xml   (with props)
    lucene/java/trunk/src/site/src/documentation/content/xdocs/whoweare.xml
    lucene/java/trunk/src/site/src/documentation/sitemap.xmap   (with props)
    lucene/java/trunk/src/site/src/documentation/skinconf.xml   (with props)
Removed:
    lucene/java/trunk/docs/benchmarks.html
    lucene/java/trunk/docs/benchmarktemplate.xml
    lucene/java/trunk/docs/contributions.html
    lucene/java/trunk/docs/demo.html
    lucene/java/trunk/docs/demo2.html
    lucene/java/trunk/docs/demo3.html
    lucene/java/trunk/docs/demo4.html
    lucene/java/trunk/docs/features.html
    lucene/java/trunk/docs/fileformats.html
    lucene/java/trunk/docs/gettingstarted.html
    lucene/java/trunk/docs/images/
    lucene/java/trunk/docs/index.html
    lucene/java/trunk/docs/lucene-sandbox/
    lucene/java/trunk/docs/mailinglists.html
    lucene/java/trunk/docs/queryparsersyntax.html
    lucene/java/trunk/docs/resources.html
    lucene/java/trunk/docs/scoring.html
    lucene/java/trunk/docs/styles/
    lucene/java/trunk/docs/systemproperties.html
    lucene/java/trunk/docs/whoweare.html
    lucene/java/trunk/xdocs/

Propchange: lucene/java/trunk/src/site/
------------------------------------------------------------------------------
--- svn:ignore (added)
+++ svn:ignore Sun Nov 26 16:00:46 2006
@@ -0,0 +1 @@
+build

Added: lucene/java/trunk/src/site/forrest.properties
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/forrest.properties?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/forrest.properties (added)
+++ lucene/java/trunk/src/site/forrest.properties Sun Nov 26 16:00:46 2006
@@ -0,0 +1,130 @@
+# Copyright 2002-2005 The Apache Software Foundation or its licensors,
+# as applicable.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+##############
+# Properties used by forrest.build.xml for building the website
+# These are the defaults, un-comment them only if you need to change them.
+##############
+
+# Prints out a summary of Forrest settings for this project
+#forrest.echo=true
+
+# Project name (used to name .war file)
+#project.name=my-project
+
+# Specifies name of Forrest skin to use
+# See list at http://forrest.apache.org/docs/skins.html
+#project.skin=pelt
+
+# Descriptors for plugins and skins
+# comma separated list, file:// is supported
+#forrest.skins.descriptors=http://forrest.apache.org/skins/skins.xml,file:///c:/myskins/skins.xml
+#forrest.plugins.descriptors=http://forrest.apache.org/plugins/plugins.xml,http://forrest.apache.org/plugins/whiteboard-plugins.xml
+
+##############
+# behavioural properties
+#project.menu-scheme=tab_attributes
+#project.menu-scheme=directories
+
+##############
+# layout properties
+
+# Properties that can be set to override the default locations
+#
+# Parent properties must be set. This usually means uncommenting
+# project.content-dir if any other property using it is uncommented
+
+#project.status=status.xml
+#project.content-dir=src/documentation
+#project.raw-content-dir=${project.content-dir}/content
+#project.conf-dir=${project.content-dir}/conf
+#project.sitemap-dir=${project.content-dir}
+#project.xdocs-dir=${project.content-dir}/content/xdocs
+#project.resources-dir=${project.content-dir}/resources
+#project.stylesheets-dir=${project.resources-dir}/stylesheets
+#project.images-dir=${project.resources-dir}/images
+#project.schema-dir=${project.resources-dir}/schema
+#project.skins-dir=${project.content-dir}/skins
+#project.skinconf=${project.content-dir}/skinconf.xml
+#project.lib-dir=${project.content-dir}/lib
+#project.classes-dir=${project.content-dir}/classes
+#project.translations-dir=${project.content-dir}/translations
+project.configfile=${project.home}/src/documentation/conf/cli.xconf
+
+##############
+# validation properties
+
+# This set of properties determine if validation is performed
+# Values are inherited unless overridden.
+# e.g. if forrest.validate=false then all others are false unless set to true.
+#forrest.validate=true
+#forrest.validate.xdocs=${forrest.validate}
+#forrest.validate.skinconf=${forrest.validate}
+#forrest.validate.sitemap=${forrest.validate}
+#forrest.validate.stylesheets=${forrest.validate}
+#forrest.validate.skins=${forrest.validate}
+#forrest.validate.skins.stylesheets=${forrest.validate.skins}
+
+# *.failonerror=(true|false) - stop when an XML file is invalid
+#forrest.validate.failonerror=true
+
+# *.excludes=(pattern) - comma-separated list of path patterns to not validate
+# e.g.
+#forrest.validate.xdocs.excludes=samples/subdir/**, samples/faq.xml
+#forrest.validate.xdocs.excludes=
+
+
+##############
+# General Forrest properties
+
+# The URL to start crawling from
+#project.start-uri=linkmap.html
+
+# Set logging level for messages printed to the console
+# (DEBUG, INFO, WARN, ERROR, FATAL_ERROR)
+#project.debuglevel=ERROR
+
+# Max memory to allocate to Java
+#forrest.maxmemory=64m
+
+# Any other arguments to pass to the JVM. For example, to run on an X-less
+# server, set to -Djava.awt.headless=true
+#forrest.jvmargs=
+
+# The bugtracking URL - the issue number will be appended
+#project.bugtracking-url=http://issues.apache.org/bugzilla/show_bug.cgi?id=
+#project.bugtracking-url=http://issues.apache.org/jira/browse/
+
+# The issues list as rss
+#project.issues-rss-url=
+
+#I18n Property. Based on the locale request for the browser.
+#If you want to use it for static site then modify the JVM system.language
+# and run once per language
+#project.i18n=true
+
+# The names of plugins that are required to build the project
+# comma separated list (no spaces)
+# You can request a specific version by appending "-VERSION" to the end of
+# the plugin name. If you exclude a version number the latest released version
+# will be used, however, be aware that this may be a development version. In
+# a production environment it is recommended that you specify a known working
+# version.
+# Run "forrest available-plugins" for a list of plug-ins currently available
+project.required.plugins=org.apache.forrest.plugin.output.pdf
+
+# Proxy configuration
+# proxy.host=
+# proxy.port=

Propchange: lucene/java/trunk/src/site/forrest.properties
------------------------------------------------------------------------------
    svn:executable = *
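
Note on the ${...} references in forrest.properties above: values such as
${project.content-dir}/content/xdocs are expanded by the build tool, not by
java.util.Properties itself. The following is only a minimal sketch of that
kind of substitution, under the assumption that a reference is replaced by the
value of the named property and that unknown names are left untouched. The
class and method names are hypothetical and are not part of Forrest or Ant.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyExpansionSketch {
    // Matches references of the form ${some.property-name}
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces each ${name} in value with the (recursively expanded) value of
     * that property. Unknown names are left as-is. Cyclic definitions are not
     * detected in this sketch.
     */
    public static String expand(String value, Map<String, String> props) {
        Matcher m = REF.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = props.containsKey(name)
                    ? expand(props.get(name), props)
                    : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Defaults taken from the commented-out entries in forrest.properties
        props.put("project.content-dir", "src/documentation");
        props.put("project.xdocs-dir", "${project.content-dir}/content/xdocs");
        System.out.println(expand(props.get("project.xdocs-dir"), props));
    }
}
```

This illustrates why the file says parent properties must be set: if
project.content-dir is not defined, a property that refers to it cannot be
resolved.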

Added: lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties (added)
+++ lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties Sun Nov 26 16:00:46 2006
@@ -0,0 +1,57 @@
+# Copyright 2002-2005 The Apache Software Foundation or its licensors,
+# as applicable.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#=======================================================================
+# CatalogManager.properties for Catalog Entity Resolver.
+#
+# This is the default properties file for your project.
+# This facilitates local configuration of application-specific catalogs.
+# If you have defined any local catalogs, then they will be loaded
+# before Forrest's core catalogs.
+#
+# See the Apache Forrest documentation:
+# http://forrest.apache.org/docs/your-project.html
+# http://forrest.apache.org/docs/validation.html
+
+# verbosity:
+# The level of messages for status/debug (messages go to standard output).
+# The setting here is for your own local catalogs.
+# The verbosity of Forrest's core catalogs is controlled via
+#  main/webapp/WEB-INF/cocoon.xconf
+#
+# The following messages are provided ...
+#  0 = none
+#  1 = ? (... not sure yet)
+#  2 = 1+, Loading catalog, Resolved public, Resolved system
+#  3 = 2+, Catalog does not exist, resolvePublic, resolveSystem
+#  10 = 3+, List all catalog entries when loading a catalog
+#    (Cocoon also logs the "Resolved public" messages.)
+verbosity=1
+
+# catalogs ... list of additional catalogs to load
+#  (Note that Apache Forrest will automatically load its own default catalog
+#  from main/webapp/resources/schema/catalog.xcat)
+# Use either full pathnames or relative pathnames.
+# pathname separator is always semi-colon (;) regardless of operating system
+# directory separator is always slash (/) regardless of operating system
+catalogs=../resources/schema/catalog.xcat
+
+# relative-catalogs
+# If false, relative catalog URIs are made absolute with respect to the
+# base URI of the CatalogManager.properties file. This setting only 
+# applies to catalog URIs obtained from the catalogs property in the
+# CatalogManager.properties file
+# Example: relative-catalogs=[yes|no]
+relative-catalogs=no

Propchange: lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties
------------------------------------------------------------------------------
    svn:executable = *
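
Note on the catalogs property above: the file states that multiple catalog
paths are separated by a semicolon on every operating system. The following is
a rough sketch of reading such a list with java.util.Properties. It does not
use the real entity resolver classes; the class name and the second catalog
path (extra/catalog.xcat) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class CatalogListSketch {
    /** Parses properties-file text into a Properties object. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits the "catalogs" value on ';' and drops empty entries. */
    public static List<String> catalogs(Properties props) {
        List<String> result = new ArrayList<>();
        for (String entry : props.getProperty("catalogs", "").split(";")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "verbosity=1\n"
                + "catalogs=../resources/schema/catalog.xcat;extra/catalog.xcat\n"
                + "relative-catalogs=no\n");
        System.out.println(catalogs(props));
    }
}
```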

Added: lucene/java/trunk/src/site/src/documentation/conf/cli.xconf
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/conf/cli.xconf?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/conf/cli.xconf (added)
+++ lucene/java/trunk/src/site/src/documentation/conf/cli.xconf Sun Nov 26 16:00:46 2006
@@ -0,0 +1,321 @@
+<?xml version="1.0"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation or its licensors,
+  as applicable.
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!--+
+    |  This is the Apache Cocoon command line configuration file.
+    |  Here you give the command line interface details of where
+    |  to find various aspects of your Cocoon installation.
+    |
+    |  If you wish, you can also use this file to specify the URIs
+    |  that you wish to generate.
+    |
+    |  The current configuration information in this file is for
+    |  building the Cocoon documentation. Therefore, all links here
+    |  are relative to the build context dir, which, in the build.xml
+    |  file, is set to ${build.context}
+    |
+    |  Options:
+    |    verbose:            increase amount of information presented
+    |                        to standard output (default: false)
+    |    follow-links:       whether linked pages should also be
+    |                        generated (default: true)
+    |    precompile-only:    precompile sitemaps and XSP pages, but
+    |                        do not generate any pages (default: false)
+    |    confirm-extensions: check the mime type for the generated page
+    |                        and adjust filename and links extensions
+    |                        to match the mime type
+    |                        (e.g. text/html->.html)
+    |
+    |  Note: Whilst using an xconf file to configure the Cocoon
+    |        Command Line gives access to more features, the use of
+    |        command line parameters is more stable, as there are
+    |        currently plans to improve the xconf format to allow
+    |        greater flexibility. If you require a stable and
+    |        consistent method for accessing the CLI, it is recommended
+    |        that you use the command line parameters to configure
+    |        the CLI. See documentation at:
+    |        http://cocoon.apache.org/2.1/userdocs/offline/
+    |        http://wiki.apache.org/cocoon/CommandLine
+    |
+    +-->
+
+<cocoon verbose="true"
+        follow-links="true"
+        precompile-only="false"
+        confirm-extensions="false">
+
+   <!--+
+       |  The context directory is usually the webapp directory
+       |  containing the sitemap.xmap file.
+       |
+       |  The config file is the cocoon.xconf file.
+       |
+       |  The work directory is used by Cocoon to store temporary
+       |  files and cache files.
+       |
+       |  The destination directory is where generated pages will
+       |  be written (assuming the 'simple' mapper is used, see
+       |  below)
+       +-->
+   <context-dir>.</context-dir>
+   <config-file>WEB-INF/cocoon.xconf</config-file>
+   <work-dir>../tmp/cocoon-work</work-dir>
+   <dest-dir>../site</dest-dir>
+
+   <!--+
+       |  A checksum file can be used to store checksums for pages
+       |  as they are generated. When the site is next generated,
+       |  files will not be written if their checksum has not changed.
+       |  This means that it will be easier to detect which files
+       |  need to be uploaded to a server, using the timestamp.
+       +-->
+   <!--   <checksums-uri>build/work/checksums</checksums-uri>-->
+
+   <!--+
+       | Broken link reporting options:
+       |   Report into a text file, one link per line:
+       |     <broken-links type="text" report="filename"/>
+       |   Report into an XML file:
+       |     <broken-links type="xml" report="filename"/>
+       |   Ignore broken links (default):
+       |     <broken-links type="none"/>
+       |
+       |   Two attributes to this node specify whether a page should
+       |   be generated when an error has occurred. 'generate' specifies
+       |   whether a page should be generated (default: true) and
+       |   extension specifies an extension that should be appended
+       |   to the generated page's filename (default: none)
+       |
+       |   Using this, a quick scan through the destination directory
+       |   will show broken links, by their filename extension.
+       +-->
+   <broken-links type="xml"
+                 file="../brokenlinks.xml"
+                 generate="false"
+                 extension=".error"
+                 show-referrers="true"/>
+
+   <!--+
+       |  Load classes at startup. This is necessary for generating
+       |  from sites that use SQL databases and JDBC.
+       |  The <load-class> element can be repeated if multiple classes
+       |  are needed.
+       +-->
+   <!--
+   <load-class>org.firebirdsql.jdbc.Driver</load-class>
+   -->
+
+   <!--+
+       |  Configures logging.
+       |  The 'log-kit' parameter specifies the location of the log kit
+       |  configuration file (usually called logkit.xconf).
+       |
+       |  Logger specifies the logging category (for all logging prior
+       |  to other Cocoon logging categories taking over)
+       |
+       |  Available log levels are:
+       |    DEBUG:        prints all level of log messages.
+       |    INFO:         prints all level of log messages except DEBUG
+       |                  ones.
+       |    WARN:         prints all level of log messages except DEBUG
+       |                  and INFO ones.
+       |    ERROR:        prints all level of log messages except DEBUG,
+       |                  INFO and WARN ones.
+       |    FATAL_ERROR:  prints only log messages of this level
+       +-->
+   <!-- <logging log-kit="WEB-INF/logkit.xconf" logger="cli" level="ERROR" /> -->
+
+   <!--+
+       |  Specifies the filename to be appended to URIs that
+       |  refer to a directory (i.e. end with a forward slash).
+       +-->
+   <default-filename>index.html</default-filename>
+
+   <!--+
+       |  Specifies a user agent string to the sitemap when
+       |  generating the site.
+       |
+       |  A generic term for a web browser is "user agent". Any
+       |  user agent, when connecting to a web server, will provide
+       |  a string to identify itself (e.g. as Internet Explorer or
+       |  Mozilla). It is possible to have Cocoon serve different
+       |  content depending upon the user agent string provided by
+       |  the browser. If your site does this, then you may want to
+       |  use this <user-agent> entry to provide a 'fake' user agent
+       |  to Cocoon, so that it generates the correct version of your
+       |  site.
+       |
+       |  For most sites, this can be ignored.
+       +-->
+   <!--
+   <user-agent>Cocoon Command Line Environment 2.1</user-agent>
+   -->
+
+   <!--+
+       |  Specifies an accept string to the sitemap when generating
+       |  the site.
+       |  User agents can specify to an HTTP server what types of content
+       |  (by mime-type) they are able to receive. E.g. a browser may be
+       |  able to handle jpegs, but not pngs. The HTTP accept header
+       |  allows the server to take the browser's capabilities into account,
+       |  and only send back content that it can handle.
+       |
+       |  For most sites, this can be ignored.
+       +-->
+
+   <accept>*/*</accept>
+
+   <!--+
+       | Specifies which URIs should be included or excluded, according
+       | to wildcard patterns.
+       |
+       | These includes/excludes are only relevant when you are following
+       | links. A link URI must match an include pattern (if one is given)
+       | and not match an exclude pattern, if it is to be followed by
+       | Cocoon. It can be useful, for example, where there are links in
+       | your site to pages that are not generated by Cocoon, such as
+       | references to api-documentation.
+       |
+       | By default, all URIs are included. If both include and exclude
+       | patterns are specified, a URI is first checked against the
+       | include patterns, and then against the exclude patterns.
+       |
+       | Multiple patterns can be given, using multiple include or exclude
+       | nodes.
+       |
+       | The order of the elements is not significant, as only the first
+       | successful match of each category is used.
+       |
+       | Currently, only the complete source URI can be matched (including
+       | any URI prefix). Future plans include destination URI matching
+       | and regexp matching. If you have requirements for these, contact
+       | dev@cocoon.apache.org.
+       +-->
+
+   <exclude pattern="**/"/>
+   <exclude pattern="**apidocs**"/>
+   <exclude pattern="api/**"/>
+   <exclude pattern="**benchmarktemplate.xml"/>
+
+<!--
+  This is a workaround for FOR-284 "link rewriting broken when
+  linking to xml source views which contain site: links".
+  See the explanation there and in declare-broken-site-links.xsl
+-->
+   <exclude pattern="site:**"/>
+   <exclude pattern="ext:**"/>
+   <exclude pattern="**/site:**"/>
+   <exclude pattern="**/ext:**"/>
+
+   <!-- Exclude tokens used in URLs to ASF mirrors (interpreted by a CGI) -->
+   <exclude pattern="[preferred]/**"/>
+   <exclude pattern="[location]"/>
+
+   <!--   <include-links extension=".html"/>-->
+
+   <!--+
+       |  <uri> nodes specify the URIs that should be generated, and
+       |  where required, what should be done with the generated pages.
+       |  They describe the way the URI of the generated file is created
+       |  from the source page's URI. There are three ways that a generated
+       |  file URI can be created: append, replace and insert.
+       |
+       |  The "type" attribute specifies one of (append|replace|insert):
+       |
+       |  append:
+       |  Append the generated page's URI to the end of the source URI:
+       |
+       |   <uri type="append" src-prefix="documents/" src="index.html"
+       |   dest="build/dest/"/>
+       |
+       |  This means that
+       |   (1) the "documents/index.html" page is generated
+       |   (2) the file will be written to "build/dest/documents/index.html"
+       |
+       |  replace:
+       |  Completely ignore the generated page's URI - just
+       |  use the destination URI:
+       |
+       |   <uri type="replace" src-prefix="documents/" src="index.html"
+       |   dest="build/dest/docs.html"/>
+       |
+       |  This means that
+       |   (1) the "documents/index.html" page is generated
+       |   (2) the result is written to "build/dest/docs.html"
+       |   (3) this works only for "single" pages - and not when links
+       |       are followed
+       |
+       |  insert:
+       |  Insert generated page's URI into the destination
+       |  URI at the point marked with a * (example uses fictional
+       |  zip protocol)
+       |
+       |   <uri type="insert" src-prefix="documents/" src="index.html"
+       |   dest="zip://*.zip/page.html"/>
+       |
+       |  This means that
+       |   (1)
+       |
+       |  In any of these scenarios, if the dest attribute is omitted,
+       |  the value provided globally using the <dest-dir> node will
+       |  be used instead.
+       +-->
+   <!--
+   <uri type="replace"
+        src-prefix="samples/"
+        src="hello-world/hello.html"
+        dest="build/dest/hello-world.html"/>
+   -->
+
+   <!--+
+       | <uri> nodes can be grouped together in a <uris> node. This
+       | enables a group of URIs to share properties. The following
+       | properties can be set for a group of URIs:
+       |   * follow-links:       should pages be crawled for links
+       |   * confirm-extensions: should file extensions be checked
+       |                         for the correct mime type
+       |   * src-prefix:         all source URIs should be
+       |                         pre-pended with this prefix before
+       |                         generation. The prefix is not
+       |                         included when calculating the
+       |                         destination URI
+       |   * dest:               the base destination URI to be
+       |                         shared by all pages in this group
+       |   * type:               the method to be used to calculate
+       |                         the destination URI. See above
+       |                         section on <uri> node for details.
+       |
+       | Each <uris> node can have a name attribute. When a name
+       | attribute has been specified, the -n switch on the command
+       | line can be used to tell Cocoon to only process the URIs
+       | within this URI group. When no -n switch is given, all
+       | <uris> nodes are processed. Thus, one xconf file can be
+       | used to manage multiple sites.
+       +-->
+   <!--
+   <uris name="mirrors" follow-links="false">
+     <uri type="append" src="mirrors.html"/>
+   </uris>
+   -->
+
+   <!--+
+       |  File containing URIs (plain text, one per line).
+       +-->
+   <!--
+   <uri-file>uris.txt</uri-file>
+   -->
+</cocoon>
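
Note on the include/exclude rules in cli.xconf above: the comment says a link
is followed only if it matches an include pattern (when any are given) and
matches no exclude pattern. The following is a minimal sketch of that decision,
not Cocoon's own matcher. It assumes that "**" matches any run of characters
including "/" and that a single "*" matches any run without "/". The class and
method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class LinkFilterSketch {
    /** Translates a wildcard pattern to a regex: "**" -> ".*", "*" -> "[^/]*", the rest literal. */
    public static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c != '*') {
                literal.append(c);
                continue;
            }
            // Flush pending literal text, quoted so that "[", "." etc. are not special
            if (literal.length() > 0) {
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
            }
            if (i + 1 < wildcard.length() && wildcard.charAt(i + 1) == '*') {
                regex.append(".*");
                i++; // consume the second '*'
            } else {
                regex.append("[^/]*");
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    private static boolean matchesAny(String uri, List<String> patterns) {
        for (String p : patterns) {
            if (toRegex(p).matcher(uri).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Followed if it matches an include (when any are given) and no exclude. */
    public static boolean shouldFollow(String uri, List<String> includes, List<String> excludes) {
        boolean included = includes.isEmpty() || matchesAny(uri, includes);
        return included && !matchesAny(uri, excludes);
    }

    public static void main(String[] args) {
        // A few of the exclude patterns from the configuration above
        List<String> excludes = Arrays.asList("**/", "**apidocs**", "api/**", "site:**");
        List<String> includes = Collections.emptyList();
        System.out.println(shouldFollow("demo.html", includes, excludes));
        System.out.println(shouldFollow("api/index.html", includes, excludes));
    }
}
```

With no include patterns, every URI counts as included, so only the excludes
decide the outcome, which mirrors the "by default, all URIs are included"
statement in the comment.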

Added: lucene/java/trunk/src/site/src/documentation/content/.htaccess
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/.htaccess?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/.htaccess (added)
+++ lucene/java/trunk/src/site/src/documentation/content/.htaccess Sun Nov 26 16:00:46 2006
@@ -0,0 +1,3 @@
+#Forrest generates UTF-8 by default, but these httpd servers are
+#ignoring the meta http-equiv charset tags
+AddDefaultCharset off

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,525 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>Apache Lucene - Resources - Performance Benchmarks</title>
+	</header>
+    <properties>
+        <author email="kelvint@apache.org">Kelvin Tan</author>
+
+    </properties>
+    <body>
+
+        <section id="Performance Benchmarks"><title>Performance Benchmarks</title>
+            <p>
+                The purpose of these user-submitted performance figures is to
+                give current and potential users of Lucene a sense
+                of how well Lucene scales. If the requirements for an upcoming
+                project are similar to an existing benchmark, you
+                will also have something to work with when designing the system
+                architecture for the application.
+            </p>
+            <p>
+                If you've conducted performance tests with Lucene, we'd
+                appreciate it if you could submit these figures for display
+                on this page. Post these figures to the lucene-user mailing list
+                using this
+                <a href="benchmarktemplate.xml">template</a>.
+            </p>
+        </section>
+
+        <section id="Benchmark Variables"><title>Benchmark Variables</title>
+            <p>
+                <ul>
+                    <p>
+                        <b>Hardware Environment</b><br/>
+                        <li><i>Dedicated machine for indexing</i>: Self-explanatory
+                            (yes/no)</li>
+                        <li><i>CPU</i>: Self-explanatory (Type, Speed and Quantity)</li>
+                        <li><i>RAM</i>: Self-explanatory</li>
+                        <li><i>Drive configuration</i>: Self-explanatory (IDE, SCSI,
+                            RAID-1, RAID-5)</li>
+                    </p>
+                    <p>
+                        <b>Software environment</b><br/>
+                        <li><i>Lucene Version</i>: Self-explanatory</li>
+                        <li><i>Java Version</i>: Version of Java SDK/JRE that is run
+                        </li>
+                        <li><i>Java VM</i>: Server/client VM, Sun VM/JRockIt</li>
+                        <li><i>OS Version</i>: Self-explanatory</li>
+                        <li><i>Location of index</i>: Is the index stored in filesystem
+                            or database? Is it on the same server(local) or
+                            over the network?</li>
+                    </p>
+                    <p>
+                        <b>Lucene indexing variables</b><br/>
+                        <li><i>Number of source documents</i>: Number of documents being
+                            indexed</li>
+                        <li><i>Total filesize of source documents</i>:
+                            Self-explanatory</li>
+                        <li><i>Average filesize of source documents</i>:
+                            Self-explanatory</li>
+                        <li><i>Source documents storage location</i>: Where are the
+                            documents being indexed located?
+                            Filesystem, DB, http, etc.</li>
+                        <li><i>File type of source documents</i>: Types of files being
+                            indexed, e.g. HTML files, XML files, PDF files, etc.</li>
+                        <li><i>Parser(s) used, if any</i>: Parsers used for parsing the
+                            various files for indexing,
+                            e.g. XML parser, HTML parser, etc.</li>
+                        <li><i>Analyzer(s) used</i>: Type of Lucene analyzer used</li>
+                        <li><i>Number of fields per document</i>: Number of Fields each
+                            Document contains</li>
+                        <li><i>Type of fields</i>: Type of each field</li>
+                        <li><i>Index persistence</i>: Where the index is stored, e.g.
+                            FSDirectory, SqlDirectory, etc.</li>
+                    </p>
+                    <p>
+                        <b>Figures</b><br/>
+                        <li><i>Time taken (in ms/s as an average of at least 3 indexing
+                                runs)</i>: Time taken to index all files</li>
+                        <li><i>Time taken / 1000 docs indexed</i>: Time taken to index
+                            1000 files</li>
+                        <li><i>Memory consumption</i>: Self-explanatory</li>
+                        <li><i>Query speed</i>: average time a query takes, type
+                            of queries (e.g. simple one-term query, phrase query),
+                            not measuring any overhead outside Lucene</li>
+                    </p>
+                    <p>
+                        <b>Notes</b><br/>
+                        <li><i>Notes</i>: Any comments which don't belong in the above,
+                            special tuning/strategies, etc.</li>
+                    </p>
+                </ul>
+            </p>
+        </section>
+
+        <section id="User-submitted Benchmarks"><title>User-submitted Benchmarks</title>
+            <p>
+                These benchmarks have been kindly submitted by Lucene users for
+                reference purposes.
+            </p>
+            <p><b>We make NO guarantees regarding their accuracy or
+                    validity.</b>
+            </p>
+            <p>We strongly recommend you conduct your own
+                performance benchmarks before deciding on a particular
+                hardware/software setup (and hopefully submit
+                these figures to us).
+            </p>
+
+            <section id="Hamish Carpenter's benchmarks"><title>Hamish Carpenter's benchmarks</title>
+                <ul>
+                    <p>
+                        <b>Hardware Environment</b><br/>
+                        <li><i>Dedicated machine for indexing</i>: yes</li>
+                        <li><i>CPU</i>: Intel x86 P4 1.5 GHz</li>
+                        <li><i>RAM</i>: 512 DDR</li>
+                        <li><i>Drive configuration</i>: IDE 7200 rpm RAID-1</li>
+                    </p>
+                    <p>
+                        <b>Software environment</b><br/>
+                        <li><i>Lucene Version</i>: 1.3</li>
+                        <li><i>Java Version</i>: 1.3.1 IBM JITC Enabled</li>
+                        <li><i>Java VM</i>: </li>
+                        <li><i>OS Version</i>: Debian Linux 2.4.18-686</li>
+                        <li><i>Location of index</i>: local</li>
+                    </p>
+                    <p>
+                        <b>Lucene indexing variables</b><br/>
+                        <li><i>Number of source documents</i>: Random generator. Set
+                            to make 1M documents
+                            in 2x500,000 batches.</li>
+                        <li><i>Total filesize of source documents</i>: > 1GB if
+                            stored</li>
+                        <li><i>Average filesize of source documents</i>: 1KB</li>
+                        <li><i>Source documents storage location</i>: Filesystem</li>
+                        <li><i>File type of source documents</i>: Generated</li>
+                        <li><i>Parser(s) used, if any</i>: </li>
+                        <li><i>Analyzer(s) used</i>: Default</li>
+                        <li><i>Number of fields per document</i>: 11</li>
+                        <li><i>Type of fields</i>: 1 date, 1 id, 9 text</li>
+                        <li><i>Index persistence</i>: FSDirectory</li>
+                    </p>
+                    <p>
+                        <b>Figures</b><br/>
+                        <li><i>Time taken (in ms/s as an average of at least 3
+                                indexing runs)</i>: </li>
+                        <li><i>Time taken / 1000 docs indexed</i>: 49 seconds</li>
+                        <li><i>Memory consumption</i>:</li>
+                    </p>
+                    <p>
+                        <b>Notes</b><br/>
+                            <p>
+                                A Windows client ran a random document generator which
+                                created
+                                documents based on some arrays of values and an excerpt
+                                (approx. 1 KB)
+                                from a text file of the Bible (King James version).<br/>
+                                These were submitted via a socket connection (open throughout
+                                the indexing process).<br/>
+                                The index writer was not closed between index calls.<br/>
+                                This created a 400 MB index in 23 files (after
+                                optimization).<br/>
+                            </p>
+                            <p>
+                                <u>Query details</u>:<br/>
+                            </p>
+                            <p>
+                                Set up a threaded class to start x number of simultaneous
+                                threads to
+                                search the above created index.
+                            </p>
+                            <p>
+                                Query:  +Domain:sos +(+((Name:goo*^2.0 Name:plan*^2.0)
+                                (Teaser:goo* Teaser:plan*)
+                                (Details:goo* Details:plan*)) -Cancel:y)
+                                +DisplayStartDate:[mkwsw2jk0-mq3dj1uq0]
+                                +EndDate:[mq3dj1uq0-ntlxuggw0]
+                            </p>
+                            <p>
+                                This query matched 34,000 documents, and I limited the returned
+                                documents
+                                to 5.
+                            </p>
+                            <p>
+                                This is using Peter Halacsy's IndexSearcherCache, slightly
+                                modified to
+                                be a singleton that returns cached searchers for a given
+                                directory. This
+                                solved an initial problem with too many files open and
+                                running out of
+                                Linux handles for them.
+                            </p>
+                            <pre>
+                                Threads|Avg Time per query (ms)
+                                1       1009ms
+                                2       2043ms
+                                3       3087ms
+                                4       4045ms
+                                ..        .
+                                ..        .
+                                10      10091ms
+                            </pre>
+                            <p>
+                                I removed the two date range terms from the query and it made
+                                a HUGE
+                                difference in performance. With 4 threads the avg time
+                                dropped to 900ms!
+                            </p>
+                            <p>Other query optimizations made little difference.</p>
+                    </p>
+                </ul>
+                <p>
+                    Hamish can be contacted at hamish at catalyst.net.nz.
+                </p>
+            </section>
+
+            <section id="Justin Greene's benchmarks"><title>Justin Greene's benchmarks</title>
+                <ul>
+                    <p>
+                        <b>Hardware Environment</b><br/>
+                        <li><i>Dedicated machine for indexing</i>: No, but nominal
+                            usage at time of indexing.</li>
+                        <li><i>CPU</i>: Compaq Proliant 1850R/600 2 X pIII 600</li>
+                        <li><i>RAM</i>: 1GB, 256MB allocated to JVM.</li>
+                        <li><i>Drive configuration</i>: RAID 5 on Fibre Channel
+                            Array</li>
+                    </p>
+                    <p>
+                        <b>Software environment</b><br/>
+                        <li><i>Java Version</i>: 1.3.1_06</li>
+                        <li><i>Java VM</i>: </li>
+                        <li><i>OS Version</i>: Windows NT 4 SP6</li>
+                        <li><i>Location of index</i>: local</li>
+                    </p>
+                    <p>
+                        <b>Lucene indexing variables</b><br/>
+                        <li><i>Number of source documents</i>: about 60K</li>
+                        <li><i>Total filesize of source documents</i>: 6.5GB</li>
+                        <li><i>Average filesize of source documents</i>: 100K
+                            (6.5GB/60K documents)</li>
+                        <li><i>Source documents storage location</i>: filesystem on
+                            NTFS</li>
+                        <li><i>File type of source documents</i>: </li>
+                        <li><i>Parser(s) used, if any</i>: Currently the only parser
+                            used is the Quiotix html
+                            parser.</li>
+                        <li><i>Analyzer(s) used</i>: SimpleAnalyzer</li>
+                        <li><i>Number of fields per document</i>: 8</li>
+                        <li><i>Type of fields</i>: All strings, and all are stored
+                            and indexed.</li>
+                        <li><i>Index persistence</i>: FSDirectory</li>
+                    </p>
+                    <p>
+                        <b>Figures</b><br/>
+                        <li><i>Time taken (in ms/s as an average of at least 3
+                                indexing runs)</i>: 1 hour 12 minutes, 1 hour 14 minutes and 1 hour 17
+                            minutes.  Note that the number
+                            and size of documents change daily.</li>
+                        <li><i>Time taken / 1000 docs indexed</i>: </li>
+                        <li><i>Memory consumption</i>: JVM is given 256MB and uses it
+                            all.</li>
+                    </p>
+                    <p>
+                        <b>Notes</b><br/>
+                            <p>
+                                We have 10 threads reading files from the filesystem,
+                                parsing and
+                                analyzing them, and then pushing them onto a queue, and a single
+                                thread popping
+                                them from the queue and indexing.  Note that we are indexing
+                                email messages
+                                and are storing the entire plaintext of the message in the
+                                index.  If the
+                                message contains an attachment and we do not have a filter for
+                                the attachment
+                                (i.e., we do not do PDFs yet), we discard the data.
+                            </p>
+                    </p>
+                </ul>
+                <p>
+                    Justin can be contacted at tvxh-lw4x at spamex.com.
+                </p>
+            </section>
+
+
+            <section id="Daniel Armbrust's benchmarks"><title>Daniel Armbrust's benchmarks</title>
+                <p>
+                    My disclaimer is that this is a very poor "Benchmark".  It was not done for raw speed,
+                    nor was the total index built in one shot.  The index was created on several different
+                    machines (all with these specs, or very similar), with each machine indexing batches of 500,000 to
+                    1 million documents per batch.  Each of these small indexes was then moved to a
+                    much larger drive, where they were all merged together into a big index.
+                    This process was done manually, over the course of several months, as the sources became available.
+                </p>
+                <ul>
+                    <p>
+                        <b>Hardware Environment</b><br/>
+                        <li><i>Dedicated machine for indexing</i>: no - The machine had moderate to low load.  However, the indexing process was built single
+                            threaded, so it only took advantage of 1 of the processors.  It usually got 100% of this processor.</li>
+                        <li><i>CPU</i>: Sun Ultra 80 4 x 64 bit processors</li>
+                        <li><i>RAM</i>: 4 GB Memory</li>
+                        <li><i>Drive configuration</i>: Ultra-SCSI Wide 10000 RPM 36GB Drive</li>
+                    </p>
+                    <p>
+                        <b>Software environment</b><br/>
+                        <li><i>Lucene Version</i>: 1.2</li>
+                        <li><i>Java Version</i>: 1.3.1</li>
+                        <li><i>Java VM</i>: </li>
+                        <li><i>OS Version</i>: Sun 5.8 (64 bit)</li>
+                        <li><i>Location of index</i>: local</li>
+                    </p>
+                    <p>
+                        <b>Lucene indexing variables</b><br/>
+                        <li><i>Number of source documents</i>: 13,820,517</li>
+                        <li><i>Total filesize of source documents</i>: 87.3 GB</li>
+                        <li><i>Average filesize of source documents</i>: 6.3 KB</li>
+                        <li><i>Source documents storage location</i>: Filesystem</li>
+                        <li><i>File type of source documents</i>: XML</li>
+                        <li><i>Parser(s) used, if any</i>: </li>
+                        <li><i>Analyzer(s) used</i>: A home grown analyzer that simply removes stopwords.</li>
+                        <li><i>Number of fields per document</i>: 1 - 31</li>
+                        <li><i>Type of fields</i>: All text, though 2 of them are dates (20001205) that we filter on</li>
+                        <li><i>Index persistence</i>: FSDirectory</li>
+                        <li><i>Index size</i>: 12.5 GB</li>
+                    </p>
+                    <p>
+                        <b>Figures</b><br/>
+                        <li><i>Time taken (in ms/s as an average of at least 3
+                                indexing runs)</i>: For 617271 documents, 209698 seconds (or ~2.5 days)</li>
+                        <li><i>Time taken / 1000 docs indexed</i>: 340 Seconds</li>
+                        <li><i>Memory consumption</i>: (java executed with) java -Xmx1000m -Xss8192k so
+                            1 GB of memory was allotted to the indexer</li>
+                    </p>
+                    <p>
+                        <b>Notes</b><br/>
+                            <p>
+                                The source documents were XML.  The "indexer" opened each document one at a time, ran an
+                                XSL transformation on them, and then proceeded to index the stream.  The indexer optimized
+                                the index every 50,000 documents (on this run) though previously, we optimized every
+                                300,000 documents.  The performance didn't change much either way.  We did no other
+                                tuning (RAM Directories, separate process to pretransform the source material, etc.)
+                                to make it index faster.  When all of these individual indexes were built, they were
+                                merged together into the main index.  That process usually took about a day.
+                            </p>
+                    </p>
+                </ul>
+                <p>
+                    Daniel can be contacted at Armbrust.Daniel at mayo.edu.
+                </p>
+            </section>
+            <section id="Geoffrey Peddle's benchmarks"><title>Geoffrey Peddle's benchmarks</title>
+                <p>
+                  I'm doing a technical evaluation of search engines 
+                  for Ariba, an enterprise application software company.
+                  I compared Lucene to a commercial C-language-based
+                  search engine, which I'll refer to as vendor A.  
+                  Overall Lucene's performance was similar to vendor A
+                  and met our application's requirements.  I've
+                  summarized our results below.
+                </p>
+                <p>
+                  Search scalability:<br/>
+                  We ran a set of 16 queries in a single thread for 20
+                  iterations.  We report below the times for the last 15
+                  iterations (i.e., after the system was warmed up).   The
+                  4 sets of results below are for indexes ranging from
+                  50,000 to 600,000 documents.  Although the
+                  times for Lucene grew faster with document count than
+                  vendor A's, they were comparable.
+                </p>
+<pre>
+50K  documents
+Lucene   5.2   seconds
+A        7.2
+200K
+Lucene   15.3
+A        15.2
+400K
+Lucene    28.2
+A         25.5
+600K
+Lucene    41
+A         33
+</pre>
+                <p>
+                  Individual Query times:<br/>
+                  Total query times are very similar between the 2
+                  systems but there were larger differences when you
+                  looked at individual queries.
+                </p>
+                <p>
+                  For simple queries with small result sets Vendor A was
+                  consistently faster than Lucene.   For example a
+                  single query might take vendor A 32 thousandths of a
+                  second and Lucene 64 thousandths of a second.    Both
+                  times are however well within acceptable response
+                  times for our application.
+                </p>
+                <p>
+                  For simple queries with large result sets Vendor A was
+                  consistently slower than Lucene.   For example a
+                  single query might take vendor A 300 thousandths of a
+                  second and Lucene 200 thousandths of a second.
+                </p>
+                <p>
+                  For more complex queries of the form (term1 or term2
+                  or term3) AND (term4 or term5 or term6) AND (term7 or
+                  term8), the results were more divergent.  For
+                  queries with small result sets Vendor A generally had
+                  very short response times and sometimes Lucene had
+                  significantly larger response times.  For example
+                  Vendor A might take 16 thousandths of a second and
+                  Lucene might take 156.   I do not consider it to be
+                  the case that Lucene's response time grew unexpectedly
+                  but rather that Vendor A appeared to be taking
+                  advantage of an optimization which Lucene didn't have.
+                  (I believe there have been discussions on the dev
+                  mailing list on complex queries of this sort.)
+                </p>
+                <p>
+                  Index Size:<br/>
+                  For our test data the size of both indexes grew
+                  linearly with the number of documents.   Note that
+                  these sizes are compact sizes, not maximum size during
+                  index loading.   The numbers below are from running du
+                  -k in the directory containing the index data.   The
+                  larger numbers below for Vendor A may be because it
+                  supports additional functionality not available in
+                  Lucene.   I think it's the constant rate of growth
+                  rather than the absolute amount which is more
+                  important.
+                </p>
+<pre>
+50K  documents
+Lucene      45516 K
+A           63921
+200K
+Lucene      171565
+A           228370
+400K
+Lucene      345717
+A           457843
+600K
+Lucene      511338
+A           684913
+</pre>
+                <p>
+                  Indexing Times:<br/>
+                  These times are for reading the documents from our
+                  database, processing them, inserting them into the
+                  document search product and index compacting.   Our
+                  data has a large number of fields/attributes.   For
+                  this test I restricted Lucene to 24 attributes to
+                  reduce the number of files created.  Doing this I was
+                  able to specify a merge width for Lucene of 60.   I
+                  found in general that Lucene indexing performance is
+                  very sensitive to changes in the merge width.  
+                  Note also that our application does a full compaction
+                  after inserting every 20,000 documents.   These times
+                  are just within our acceptable limits but we are
+                  interested in alternatives to increase Lucene's
+                  performance in this area.
+                </p>
+<p>
+<pre>
+600K documents
+Lucene       81 minutes
+A            34 minutes
+</pre>
+</p>
+                <p>
+                  (I don't have accurate results for all sizes on this
+                  measure but believe that the indexing time for both
+                  solutions grew essentially linearly with size.   The
+                  time to compact the index generally grew with index
+                  size but it's a small percent of overall time at these
+                  sizes.)
+                </p>
+                <ul>
+                    <p>
+                        <b>Hardware Environment</b><br/>
+                        <li><i>Dedicated machine for indexing</i>: yes</li>
+                        <li><i>CPU</i>: Dell Pentium 4 CPU 2.00 GHz, 1 CPU</li>
+                        <li><i>RAM</i>: 1 GB Memory</li>
+                        <li><i>Drive configuration</i>: Fujitsu MAM3367MP SCSI </li>
+                    </p>
+                    <p>
+                        <b>Software environment</b><br/>
+                        <li><i>Java Version</i>: 1.4.2_02</li>
+                        <li><i>Java VM</i>: JDK</li>
+                        <li><i>OS Version</i>: Windows XP </li>
+                        <li><i>Location of index</i>: local</li>
+                    </p>
+                    <p>
+                        <b>Lucene indexing variables</b><br/>
+                        <li><i>Number of source documents</i>: 600,000</li>
+                        <li><i>Total filesize of source documents</i>: from database</li>
+                        <li><i>Average filesize of source documents</i>: from database</li>
+                        <li><i>Source documents storage location</i>: from database</li>
+                        <li><i>File type of source documents</i>: XML</li>
+                        <li><i>Parser(s) used, if any</i>: </li>
+                        <li><i>Analyzer(s) used</i>: small variation on WhitespaceAnalyzer</li>
+                        <li><i>Number of fields per document</i>: 24</li>
+                        <li><i>Type of fields</i>: 1 keyword, 1 big unindexed, rest are unstored and a mix of tokenized/untokenized</li>
+                        <li><i>Index persistence</i>: FSDirectory</li>
+                        <li><i>Index size</i>: 511338 K (from du -k, compacted)</li>
+                    </p>
+                    <p>
+                        <b>Figures</b><br/>
+                        <li><i>Time taken (in ms/s as an average of at least 3
+                                indexing runs)</i>: 600,000 documents in 81 minutes   (du -k = 511338)</li>
+                        <li><i>Time taken / 1000 docs indexed</i>: about 8.1 seconds (123 documents/second)</li>
+                        <li><i>Memory consumption</i>: -ms256m -mx512m -Xss4m -XX:MaxPermSize=512M</li>
+                    </p>
+                    <p>
+                        <b>Notes</b><br/>
+                          <p>
+                            <li>merge width of 60</li>
+                            <li>did a compact every 20,000 documents</li>
+                          </p>
+                    </p>
+                </ul>
+            </section>
+        </section>
+
+    </body>
+</document>
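
The "Time taken / 1000 docs indexed" figures in benchmarks.xml can be cross-checked against the reported totals. Below is a minimal sketch in Java of that arithmetic. The class and method names (BenchmarkRates, secondsPer1000Docs, docsPerSecond) are made up for illustration and are not part of Lucene; the inputs are the totals quoted in Daniel Armbrust's and Geoffrey Peddle's submissions.

```java
public final class BenchmarkRates {
    private BenchmarkRates() {}

    /** Seconds needed to index 1000 documents, given a total run time and a document count. */
    public static double secondsPer1000Docs(double totalSeconds, long docCount) {
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive");
        }
        return totalSeconds / docCount * 1000.0;
    }

    /** Indexing throughput in documents per second. */
    public static double docsPerSecond(double totalSeconds, long docCount) {
        if (totalSeconds <= 0) {
            throw new IllegalArgumentException("totalSeconds must be positive");
        }
        return docCount / totalSeconds;
    }

    public static void main(String[] args) {
        // Daniel Armbrust's run: 617271 documents in 209698 seconds.
        System.out.printf("Armbrust: %.1f s per 1000 docs%n",
                secondsPer1000Docs(209698, 617271));

        // Geoffrey Peddle's run: 600,000 documents in 81 minutes.
        double peddleSeconds = 81 * 60;
        System.out.printf("Peddle: %.1f s per 1000 docs, %.1f docs/s%n",
                secondsPer1000Docs(peddleSeconds, 600000),
                docsPerSecond(peddleSeconds, 600000));
    }
}
```

The results should land close to the 340 seconds and 123 documents/second quoted in the submissions. Reporting both directions of the ratio makes it easier to compare entries that filled in the template field with different units.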

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,327 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>
+	Apache Lucene - Contributions
+		</title>
+	</header>
+    <properties>
+        <author email="carlson@apache.org">
+            Peter Carlson
+        </author>
+    </properties>
+    <body>
+        <section id="Overview">
+            <title>Overview</title>
+            <p>This page lists external Lucene resources. If you have
+            written something that should be included, please post all
+            relevant information to one of the mailing lists.  Nothing
+            listed here is directly supported by the Lucene
+            developers, so if you encounter any problems with any of
+            this software, please use the author's contact information
+            to get help.</p>
+            <p>If you are looking for information on contributing patches or other improvements to Lucene, see
+            <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">How To Contribute</a> on the Lucene Wiki.</p>
+        </section>
+
+        <section id="Lucene Tools">
+            <title>Lucene Tools</title>
+            <p>
+                Software that works with Lucene indices.
+            </p>
+            <section id="Luke"><title>Luke</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.getopt.org/luke/">
+                                http://www.getopt.org/luke/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Andrzej Bialecki
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="LIMO (Lucene Index Monitor)">
+                <title>LIMO (Lucene Index Monitor)</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://limo.sf.net/">
+                                http://limo.sf.net/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Julien Nioche
+                        </td>
+                    </tr>
+                </table>
+            </section>
+        </section>
+
+        <section id="Lucene Document Converters">
+            <title>Lucene Document Converters</title>
+            <p>
+                Lucene requires the information you want to index to be
+                converted into a Document object.  Here are
+                contributions for various solutions that convert different
+                content types to Lucene Documents.
+            </p>
+            <section id="XML Document #1">
+                <title>XML Document #1</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://marc.theaimsgroup.com/?l=lucene-dev&amp;m=100723333506246&amp;w=2">
+                                http://marc.theaimsgroup.com/?l=lucene-dev&amp;m=100723333506246&amp;w=2
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Philip Ogren - ogren@mayo.edu
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="XML Document #2">
+                <title>XML Document #2</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html">
+                                http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Peter Carlson - carlson@bookandhammer.com
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="PDF Box">
+                <title>PDF Box</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.pdfbox.org/">
+                                http://www.pdfbox.org/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Ben Litchfield - ben@csh.rit.edu
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="XPDF - PDF Document Conversion">
+                <title>XPDF - PDF Document Conversion</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.foolabs.com/xpdf">
+                                http://www.foolabs.com/xpdf
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            N/A
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="PDFTextStream -- PDF text and metadata extraction">
+                <title>PDFTextStream -- PDF text and metadata extraction</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://snowtide.com">
+                                http://snowtide.com
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            N/A
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="PJ Classic &amp; PJ Professional - PDF Document Conversion">
+                <title>PJ Classic &amp; PJ Professional - PDF Document Conversion</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.etymon.com/">
+                                http://www.etymon.com/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            N/A
+                        </td>
+                    </tr>
+                </table>
+            </section>
+        </section>
+
+        <section id="Miscellaneous">
+            <title>Miscellaneous</title>
+            <p>
+            </p>
+            <section id="Arabic Analyzer for Java">
+                <title>Arabic Analyzer for Java</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://savannah.nongnu.org/projects/aramorph">
+                                http://savannah.nongnu.org/projects/aramorph
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Pierrick Brihaye
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="Phonetix">
+                <title>Phonetix</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html">
+                                http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            tangentum technologies
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="ejIndex - JBoss MBean for Lucene">
+                <title>ejIndex - JBoss MBean for Lucene</title>
+                <p>
+                </p>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="http://ejindex.sourceforge.net/">
+                                http://ejindex.sourceforge.net/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Andy Scholz
+                        </td>
+                    </tr>
+                </table>
+            </section>
+            <section id="JavaCC">
+                <title>JavaCC</title>
+                <table>
+                    <tr>
+                        <th width="%1">
+                            URL
+                        </th>
+                        <td>
+                            <a href="https://javacc.dev.java.net/">
+                                https://javacc.dev.java.net/
+                            </a>
+                        </td>
+                    </tr>
+                    <tr>
+                        <th width="%1">
+                            author
+                        </th>
+                        <td>
+                            Sun Microsystems (java.net)
+                        </td>
+                    </tr>
+                </table>
+            </section>
+        </section>
+    </body>
+</document>

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,78 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>
+	Apache Lucene - Building and Installing the Basic Demo
+		</title>
+	</header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About this Document"><title>About this Document</title>
+<p>
+This document is intended as a "getting started" guide to using and running the Lucene demos.
+It walks you through some basic installation and configuration.
+</p>
+</section>
+
+
+<section id="About the Demos"><title>About the Demos</title>
+<p>
+The Lucene command-line demo code consists of two applications that demonstrate various Lucene
+features and show how to add Lucene to your own applications.
+</p>
+</section>
+
+<section id="Setting your CLASSPATH"><title>Setting your CLASSPATH</title>
+<p>
+First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
+latest Lucene distribution and then extract it to a working directory.  Alternatively, you can <a
+href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
+Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
+</p>
+<p>
+You should see the Lucene JAR file in the directory you created when you extracted the archive.  It
+should be named something like <code>lucene-core-{version}.jar</code>.  You should also see a file
+called <code>lucene-demos-{version}.jar</code>.  If you checked out the sources from Subversion then
+the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
+successfully).  Put both of these files in your Java CLASSPATH.
+</p>
+</section>
+
+<section id="Indexing Files"><title>Indexing Files</title>
+<p>
+Once you've gotten this far you're probably itching to go.  Let's <b>build an index!</b> Assuming
+you've set your CLASSPATH correctly, just type:
+
+<pre>
+    java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
+</pre>
+
+This will produce a subdirectory called <code>index</code> which will contain an index of all of the
+Lucene source code.
+</p>
+<p>
+To <b>search the index</b> type:
+
+<pre>
+    java org.apache.lucene.demo.SearchFiles
+</pre>
+
+You'll be prompted for a query.  Type in a swear word and press the enter key.  The Lucene
+developers are very well mannered, so you should get no results.  Now try entering the word
+"vector".  That should return a whole bunch of documents.  The results are paged ten at a time,
+and you are asked whether you want more.
+</p>
+</section>
+
+<section id="About the code..."><title>About the code...</title>
+<p>
+<a href="demo2.html">read on&gt;&gt;&gt;</a>
+</p>
+</section>
+
+</body>
+</document>
+
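Note on demo.xml above: the IndexFiles and SearchFiles commands only work when both JARs are on the
CLASSPATH.  A minimal sketch of a pre-flight check, using only the JDK.  The class ClasspathCheck and
its method isOnClasspath are hypothetical names introduced here for illustration and are not part of
the Lucene distribution; the two demo class names are the ones used in the document.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the demo instructions.
        String[] demoClasses = {
            "org.apache.lucene.demo.IndexFiles",
            "org.apache.lucene.demo.SearchFiles"
        };
        for (String name : demoClasses) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": NOT found, check CLASSPATH"));
        }
    }
}
```

Running it with the same CLASSPATH you intend to use for the demos should report both classes as
found; if not, the lucene-demos JAR is probably missing from the CLASSPATH.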

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,139 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>
+	Apache Lucene - Basic Demo Sources Walk-through
+		</title>
+	</header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About the Code"><title>About the Code</title>
+<p>
+In this section we walk through the sources behind the command-line Lucene demo: where to find them,
+their parts and their function.  This section is intended for Java developers wishing to understand
+how to use Lucene in their applications.
+</p>
+</section>
+
+
+<section id="Location of the source"><title>Location of the source</title>
+
+<p>
+Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
+should see a directory called <code>src</code> which in turn contains a directory called
+<code>demo</code>.  This is the root for all of the Lucene demos.  Under this directory is
+<code>org/apache/lucene/demo</code>.  This is where all the Java sources for the demos live.
+</p>
+
+<p>
+Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
+Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
+</p>
+
+</section>
+
+<section id="IndexFiles"><title>IndexFiles</title>
+
+<p>
+As we discussed in the previous walk-through, the <code><a
+href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
+Index. Let's take a look at how it does this.
+</p>
+
+<p>
+The first substantial thing the <code>main</code> function does is instantiate <code><a
+href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>.  It passes the string
+"<code>index</code>" and a new instance of a class called <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
+The "<code>index</code>" string is the name of the filesystem directory where all index information
+should be stored.  Because we're not passing a full path, this will be created as a subdirectory of
+the current working directory (if it does not already exist). On some platforms, it may be created
+in other directories (such as the user's home directory).
+</p>
+
+<p>
+The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
+class responsible for creating indices.  To use it you must instantiate it with a path that it can
+write the index into.  If this path does not exist it will first create it.  Otherwise it will
+refresh the index at that path.  You can also create an index using one of the subclasses of <code><a
+href="api/org/apache/lucene/store/Directory.html">Directory</a></code>.  In any case, you must also pass an
+instance of <code><a
+href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
+</p>
+
+<p>
+The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
+are using, <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
+little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
+useless words and characters from the index.  By useless words and characters I mean common language
+words such as articles (a, an, the, etc.) and other strings that would be useless for searching
+(e.g. <b>'s</b>).  It should be noted that there are different rules for every language, and you
+should use the proper analyzer for each.  Lucene currently provides Analyzers for a number of
+different languages (see the <code>*Analyzer.java</code> sources under <a
+href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
+</p>
+
+<p>
+Looking further down in the file, you should see the <code>indexDocs()</code> code.  This recursive
+function simply crawls the directories and uses <code><a
+href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code> objects.  The <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
+represent the content in the file as well as its creation time and location.  These instances are
+added to the <code>indexWriter</code>.  Take a look inside <code><a
+href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>.  It's not particularly
+complicated.  It just adds fields to the <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code>.
+</p>
+
+<p>
+As you can see there isn't much to creating an index.  The devil is in the details.  You may also
+wish to examine the other samples in this directory, particularly the <code><a
+href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class.  It is a bit more
+complex but builds upon this example.
+</p>
+
+</section>
+
+<section id="Searching Files"><title>Searching Files</title>
+
+<p>
+The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
+quite simple.  It primarily collaborates with an <code><a
+href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
+(which is used in the <code><a
+href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
+<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>.  The
+query parser is constructed with an analyzer used to interpret your query text in the same way the
+documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
+'the'.  The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
+the results from the <code><a
+href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
+the searcher.  Note that it's also possible to programmatically construct a rich <code><a
+href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
+parser.  The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
+syntax</a> into the corresponding <code><a
+href="api/org/apache/lucene/search/Query.html">Query</a></code> object.  The searcher results are
+returned in a collection of Documents called <code><a
+href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
+displayed to the user.
+</p>
+
+</section>
+
+<section id="The Web example..."><title>The Web example...</title>
+
+<p>
+<a href="demo3.html">read on&gt;&gt;&gt;</a>
+</p>
+
+</section>
+
+</body>
+</document>
+
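Note on demo2.xml above: the walk-through describes StandardAnalyzer as lowercasing text and
filtering out articles and strings such as 's.  The following toy sketch shows that idea using only
the JDK.  It is not how StandardAnalyzer is implemented: ToyAnalyzer, analyze and the three-word stop
list are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzer {
    // Hypothetical stop-word list for illustration; not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    /** Splits text into lowercase terms, strips a trailing 's, and drops stop words. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are not letters, digits or apostrophes.
        for (String raw : text.split("[^\\p{L}\\p{N}']+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (term.endsWith("'s")) {
                term = term.substring(0, term.length() - 2);
            }
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [vector, size, is, int]
        System.out.println(analyze("The Vector's size is an int"));
    }
}
```

The same analysis has to be applied to the query text as to the indexed text, which is why the
walk-through passes an analyzer to both IndexWriter and QueryParser.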

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,90 @@
+<?xml version="1.0"?>
+
+<document>
+	<header>
+        <title>
+	Apache Lucene - Building and Installing the Web Application Demo
+		</title>
+	</header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About this Document"><title>About this Document</title>
+<p>
+This document is intended as a "getting started" guide to installing and running the Lucene
+web application demo.  This guide assumes that you have read the information in the previous two
+examples.  We'll use Tomcat as our reference web container.  These demos should work with nearly any
+container, but you may have to adapt them appropriately.
+</p>
+</section>
+
+
+<section id="About the Demos"><title>About the Demos</title>
+<p>
+The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
+similar web container.  It's NOT designed as a "best practices" implementation by ANY means.  It's
+more of a "hello world" type Lucene Web App.  The purpose of this application is to demonstrate
+Lucene.  With that being said, it should be relatively simple to create a small searchable website
+in Tomcat or a similar application server.
+</p>
+</section>
+
+<section id="Indexing Files"><title>Indexing Files</title>
+<p> Once you've gotten this far you're probably itching to go.  Let's start by creating the index
+you'll need for the web examples.  Since you've already set your CLASSPATH in the previous examples,
+all you need to do is type:
+
+<pre>
+    java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
+</pre>
+
+You'll need to do this from any subdirectory of your <code>{tomcat}/webapps</code> directory
+(make sure you don't leave off the <code>..</code> or you'll get a null pointer exception).
+<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
+outside of a web accessible context.  By default the webapp is configured to look in
+<code>/opt/lucene/index</code> for this index.
+</p>
+</section>
+
+<section id="Deploying the Demos"><title>Deploying the Demos</title>
+<p>Located in your distribution directory you should see a war file called
+<code>luceneweb.war</code>.  If you're working with a Subversion checkout, this will be under the
+<code>build</code> subdirectory.  Copy this to your <code>{tomcat-home}/webapps</code> directory.
+You may need to restart Tomcat.  </p> </section>
+
+<section id="Configuration"><title>Configuration</title>
+<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory.  If it's not
+present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
+the webapp), then look again.  Edit a file called <code>configuration.jsp</code>.  Ensure that the
+<code>indexLocation</code> is equal to the location you used for your index.  You may also customize
+the <code>appTitle</code> and <code>appFooter</code> strings as you see fit.  Once you have finished
+altering the configuration you may need to restart Tomcat.  You may also wish to update the war file
+by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
+subdirectory.  (The -u option is not available in all versions of jar.  If yours lacks it, recreate
+the war file.)
+</p>
+</section>
+
+<section id="Running the Demos"><title>Running the Demos</title>
+<p>Now you're ready to roll.  In your browser, go to
+<code>http://localhost:8080/luceneweb</code>, enter <code>test</code> and the number of items per
+page, and press search.</p>
+<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
+examples) or nothing.  If you get an error regarding opening the index, then you probably set the
+path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permission to access the
+index (or you skipped the step of creating it).  Try other search terms.  Depending on the number of
+items per page you set and results returned, there may be a link at the bottom that says <b>More
+Results&gt;&gt;</b>; clicking it takes you to subsequent pages.  </p> </section>
+
+<section id="About the code..."><title>About the code...</title>
+<p>
+If you want to know more about how this web app works or how to customize it then <a
+href="demo4.html">read on&gt;&gt;&gt;</a>.
+</p>
+</section>
+
+</body>
+</document>
+
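Note on demo3.xml above: the most common failure it describes is an index directory that is missing
or that Tomcat cannot read and write.  A minimal sketch of a check for that, using only the JDK.
IndexDirCheck and problemWith are hypothetical names introduced for illustration;
/opt/lucene/index is the default location named in the document.  The check applies to the user
running it, so it would need to be run as the same user as Tomcat to be meaningful.

```java
import java.io.File;

public class IndexDirCheck {
    /** Returns null if the directory looks usable, otherwise a short description of the problem. */
    public static String problemWith(File indexDir) {
        if (!indexDir.isDirectory()) {
            return "not an existing directory: " + indexDir;
        }
        if (!indexDir.canRead() || !indexDir.canWrite()) {
            return "current user cannot read and write: " + indexDir;
        }
        return null;
    }

    public static void main(String[] args) {
        // Falls back to the default location mentioned in demo3.xml.
        File indexDir = new File(args.length > 0 ? args[0] : "/opt/lucene/index");
        String problem = problemWith(indexDir);
        System.out.println(problem == null ? "index directory looks usable: " + indexDir : problem);
    }
}
```

Pass the same path you set as indexLocation in configuration.jsp as the first argument.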


