manifoldcf-commits mailing list archives

From kwri...@apache.org
Subject svn commit: r1553120 [2/2] - /manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
Date Mon, 23 Dec 2013 14:40:36 GMT

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1553120&r1=1553119&r2=1553120&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Mon Dec 23 14:40:36 2013
@@ -565,137 +565,7 @@
         
         <section id="outputconnectiontypes">
             <title>Output Connection Types</title>
-            
-            <section id="solroutputconnector">
-                <title>Solr Output Connection</title>
-                <p>The Solr output connection type is designed to allow ManifoldCF to submit documents to either an appropriate Apache Solr instance,
-                       via the Solr HTTP API, or alternatively to a Solr Cloud cluster.  The configuration parameters are initially set to appropriate default
-                       values for a stand-alone Solr instance.</p>
-                <p>When you create a Solr output connection, multiple configuration tabs appear.  The first tab is the "Solr type" tab.  Here you select
-                       whether you want your connection to communicate to a standalone Solr instance, or to a Solr Cloud cluster:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-configure-solr-type.PNG" alt="Solr Configuration, Solr type tab" width="80%"/>
-                <br/><br/>
-                <p>Select which kind of Solr installation you want to communicate with.  Based on your selection, you can proceed to either the "Server"
-                       tab (if a standalone instance) or to the "ZooKeeper" tab (if a SolrCloud cluster).</p>
-                <p>The "Server" tab allows you to configure the HTTP parameters appropriate for communicating with a standalone Solr instance:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-configure-server.PNG" alt="Solr Configuration, Server tab" width="80%"/>
-                <br/><br/>
-                <p>If your Solr setup is a standalone instance, fill in the fields according to your Solr configuration.  The Solr connection type supports
-                       only basic authentication at this time; if you have this enabled, supply the credentials as requested on the bottom part of the form.</p>
-                <p>The "Zookeeper" tab allows you to configure the connection type to communicate with a Solr Cloud cluster:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-configure-zookeeper.PNG" alt="Solr Configuration, Zookeeper tab" width="80%"/>
-                <br/><br/>
-                <p>Here, add each ZooKeeper instance in the SolrCloud cluster to the list of ZooKeeper instances.  The connection comes preconfigured with
-                       "localhost" as being a ZooKeeper instance.  You may delete this if it is not the case.</p>
-                <p>The next tab is the "Schema" tab, which allows you to specify the names of various Solr fields into which the Solr connection type will
-                       place built-in document metadata:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-configure-schema.PNG" alt="Solr Configuration, Schema tab" width="80%"/>
-                <br/><br/>
-                <p>The most important of these is the document identifier field, which MUST be present for the connection type to function.  This field will
-                       be used to uniquely identify the document within Solr, and will contain the document's URL.  The Solr connection type will treat this field as being
-                       a unique key for locating the indexed document for further modification or deletion.  The other Solr fields are optional, and largely self-
-                       explanatory.</p>
-                <p>The next tab is the "Arguments" tab, which allows you to specify arbitrary arguments to be sent to Solr:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-configure-arguments.PNG" alt="Solr Configuration, Arguments tab" width="80%"/>
-                <br/><br/>
-                <p>Fill in the argument name and value, and click the "Add" button.  Bear in mind that if you add an argument with the same name as an existing one, it will replace the
-                       existing one with the new specified value.  You can delete existing arguments by clicking the "Delete" button next to the argument you want to delete.</p>
-                <p>Use this tab to specify any and all desired Solr update request parameters.  You can, for instance, add
-                       <a href="http://wiki.apache.org/solr/UpdateRequestProcessor">update.chain=myChain</a> to select a specific document processing pipeline/chain to
-                       use for processing documents. See the Solr documentation for more valid arguments.</p>
-                <p>The next tab is the "Documents" tab, which allows you to do document filtering based on size and mime types. By specifying a maximum document
-                       length in bytes, you can filter out documents which exceed that size (e.g. 10485760 which is equivalent to 10 MB). If you only want to add
-                       documents with specific mime types, you can enter them into the "included mime types" field (e.g. "text/html" for filtering out all documents but HTML).
-                       The "excluded mime types" field is for excluding documents with specific mime types (e.g. "image/jpeg" for filtering out JPEG images). The tab looks like:</p>
-                <figure src="images/en_US/solr-configure-documents.PNG" alt="Solr Configuration, Documents tab" width="80%"/>
-                <br/><br/>
-                <p>The next tab is the "Commits" tab, which allows you to control the commit strategies. As well as committing documents at the end of every job, an
-                       option which is enabled by default, you may also commit each document within a certain time in milliseconds (e.g. "10000" for committing within
-                       10 seconds). The <a href="http://wiki.apache.org/solr/CommitWithin">commit within</a> strategy leaves responsibility for committing to Solr instead
-                       of ManifoldCF. The tab looks like:</p>
-                <figure src="images/en_US/solr-configure-commits.PNG" alt="Solr Configuration, Commits tab" width="80%"/>
-                <br/><br/>
-                <p>When you are done, don't forget to click the "Save" button to save your changes!  When you do, a connection summary and status screen will be
-                       presented, which may look something like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-status.PNG" alt="Solr Status" width="80%"/>
-                <br/><br/>
-                <p>Note that in this example, the Solr connection is not responding, which is leading to an error status message instead of "Connection working".</p>
-                <p>When you configure a job to use a Solr-type output connection, the Solr connection type provides a tab called "Field Mapping".  The purpose of this tab
-                       is to allow you to map metadata fields as fetched by the job's connection type to fields that Solr is set up to receive.  This is necessary because
-                       the names of the metadata items are often determined by the repository, with no alignment to fields defined in the Solr schema.  You may also
-                       suppress specific metadata items from being sent to the index using this tab.  The tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/solr-job-field-mapping.PNG" alt="Solr Specification, Field Mapping tab" width="80%"/>
-                <br/><br/>
-                <p>Add a new mapping by filling in the "source" with the name of the metadata item from the repository, and "target" as the name of the output field in
-                       Solr, and click the "Add" button.  Leaving the "target" field blank will result in all metadata items of that name not being sent to Solr.</p>
-            </section>
-            
-            <section id="filesystemoutputconnector">
-                <title>File System Output Connection</title>
-                <p>The File System output connection type allows ManifoldCF to store documents in a local filesystem, using the conventions established by the
-                    Unix utility called <em>wget</em>.  Documents stored by this connection type will not include any metadata or security information, but instead
-                    consist solely of a binary file.</p>
-                <p>The connection configuration information for the File System output connection type includes no additional tabs.  There is an additional job tab,
-                    however, called "Output Path".  The tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/filesystem-job-output-path.PNG" alt="File System Specification, Output Path tab" width="80%"/>
-                <br/><br/>
-                <p>Fill in the path you want the connection type to use to write the documents to.  Then, click the "Save" button.</p>
-            </section>
-
-            <section id="hdfsoutputconnector">
-                <title>HDFS Output Connection</title>
-                <p>The HDFS output connection type allows ManifoldCF to store documents in HDFS, using the conventions established by the
-                    Unix utility called <em>wget</em>.  Documents stored by this connection type will not include any metadata or security information, but instead
-                    consist solely of a binary file.</p>
-                <p>The connection configuration information for the HDFS output connection type includes one additional tab: the "Server" tab.  This tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/hdfs-configure-server.PNG" alt="HDFS Output Configuration, Server tab" width="80%"/>
-                <br/><br/>
-                <p>Fill in the name node URI and the user name.  Both are required.</p>
-                <p>For the HDFS output connection type, there is an additional job tab called "Output Path".  The tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/hdfs-job-output-path.PNG" alt="HDFS Output Specification, Output Path tab" width="80%"/>
-                <br/><br/>
-                <p>Fill in the path you want the connection type to use to write the documents to.  Then, click the "Save" button.</p>
-            </section>
 
-            <section id="opensearchserveroutputconnector">
-                <title>OpenSearchServer Output Connection</title>
-                <p>The OpenSearchServer Output Connection allows ManifoldCF to submit documents to an OpenSearchServer instance, via the XML over HTTP API. The connector has been designed
-            	to be as easy to use as possible.</p>
-                <p>After creating an OpenSearchServer output connection, you have to populate the parameters tab. Fill in the fields according to your OpenSearchServer configuration. Each
-            	OpenSearchServer output connector instance works with one index. To work with multiple indexes, just create one output connector for each index.</p>
-                <figure src="images/en_US/opensearchserver-connection-parameters.PNG" alt="OpenSearchServer, parameters tab" width="80%"/>
-                <p>The parameters are:</p><br/>
-                <ul>
-                      <li>Server location: A URL that references your OpenSearchServer instance. The default value (http://localhost:8080) is valid if your OpenSearchServer instance runs
-                          on the same server as the ManifoldCF instance.</li>
-                      <li>Index name: The connector will populate the index defined here.</li>
-                      <li>User name and API Key: The credentials required to connect to the OpenSearchServer instance. These can be left empty if no user has been created. The next figure shows
-                          where to find the user's information in the OpenSearchServer user interface.</li>
-                </ul>
-                <figure src="images/en_US/opensearchserver-user.PNG" alt="OpenSearchServer, user configuration" width="80%"/>
-                <p>Once you have created a new job and selected the OpenSearchServer output connector, the OpenSearchServer tab becomes available. This tab lets you specify:</p><br/>
-                <ul>
-                      <li>The maximum size of a document that will be indexed. The value is in bytes. The default value is 16MB.</li>
-                      <li>The allowed mime types. Warning: this does not work with all repository connectors.</li>
-                      <li>The allowed file extensions. Warning: this does not work with all repository connectors.</li>
-                </ul>
-                <figure src="images/en_US/opensearchserver-job-parameters.PNG" alt="OpenSearchServer, job parameters" width="80%"/>
-                <p>In the history report you will be able to monitor all the activities. The connector supports three activities: document ingestion (Indexation), document deletion and
-                    index optimization. The targeted index is automatically optimized when the job ends.</p>
-                <figure src="images/en_US/opensearchserver-history-report.PNG" alt="OpenSearchServer, history report" width="80%"/>
-                <p>You may also refer to the <a href="http://www.open-search-server.com/documentation">OpenSearchServer's user documentation</a>.</p>
-            </section>
-            
             <section id="elasticsearchoutputconnector">
                 <title>ElasticSearch Output Connection</title>
                 <p>The ElasticSearch Output Connection allows ManifoldCF to submit documents to an ElasticSearch instance, via the XML over HTTP API. The connector has been designed
@@ -755,6 +625,37 @@
 curl -XGET http://localhost:9200/index/_mapping
                 </source>
             </section>
+
+            <section id="filesystemoutputconnector">
+                <title>File System Output Connection</title>
+                <p>The File System output connection type allows ManifoldCF to store documents in a local filesystem, using the conventions established by the
+                    Unix utility called <em>wget</em>.  Documents stored by this connection type will not include any metadata or security information, but instead
+                    consist solely of a binary file.</p>
+                <p>The connection configuration information for the File System output connection type includes no additional tabs.  There is an additional job tab,
+                    however, called "Output Path".  The tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/filesystem-job-output-path.PNG" alt="File System Specification, Output Path tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the path you want the connection type to use to write the documents to.  Then, click the "Save" button.</p>
+            </section>
+
+            <section id="hdfsoutputconnector">
+                <title>HDFS Output Connection</title>
+                <p>The HDFS output connection type allows ManifoldCF to store documents in HDFS, using the conventions established by the
+                    Unix utility called <em>wget</em>.  Documents stored by this connection type will not include any metadata or security information, but instead
+                    consist solely of a binary file.</p>
+                <p>The connection configuration information for the HDFS output connection type includes one additional tab: the "Server" tab.  This tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/hdfs-configure-server.PNG" alt="HDFS Output Configuration, Server tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the name node URI and the user name.  Both are required.</p>
+                <p>For the HDFS output connection type, there is an additional job tab called "Output Path".  The tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/hdfs-job-output-path.PNG" alt="HDFS Output Specification, Output Path tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the path you want the connection type to use to write the documents to.  Then, click the "Save" button.</p>
+            </section>
+
             
             <section id="gtsoutputconnector">
                 <title>MetaCarta GTS Output Connection</title>
@@ -773,221 +674,198 @@ curl -XGET http://localhost:9200/index/_
                 <p>The null output connection type simply logs indexing and deletion requests, and does nothing else.  It does not have any special configuration tabs, nor does it
                        contribute tabs to jobs defined that use it.</p>
             </section>
-            
-        </section>
 
-        <section id="mappingconnectiontypes">
-            <title>User Mapping Connection Types</title>
+            <section id="opensearchserveroutputconnector">
+                <title>OpenSearchServer Output Connection</title>
+                <p>The OpenSearchServer Output Connection allows ManifoldCF to submit documents to an OpenSearchServer instance, via the XML over HTTP API. The connector has been designed
+            	to be as easy to use as possible.</p>
+                <p>After creating an OpenSearchServer output connection, you have to populate the parameters tab. Fill in the fields according to your OpenSearchServer configuration. Each
+            	OpenSearchServer output connector instance works with one index. To work with multiple indexes, just create one output connector for each index.</p>
+                <figure src="images/en_US/opensearchserver-connection-parameters.PNG" alt="OpenSearchServer, parameters tab" width="80%"/>
+                <p>The parameters are:</p><br/>
+                <ul>
+                      <li>Server location: A URL that references your OpenSearchServer instance. The default value (http://localhost:8080) is valid if your OpenSearchServer instance runs
+                          on the same server as the ManifoldCF instance.</li>
+                      <li>Index name: The connector will populate the index defined here.</li>
+                      <li>User name and API Key: The credentials required to connect to the OpenSearchServer instance. These can be left empty if no user has been created. The next figure shows
+                          where to find the user's information in the OpenSearchServer user interface.</li>
+                </ul>
+                <figure src="images/en_US/opensearchserver-user.PNG" alt="OpenSearchServer, user configuration" width="80%"/>
+                <p>Once you have created a new job and selected the OpenSearchServer output connector, the OpenSearchServer tab becomes available. This tab lets you specify:</p><br/>
+                <ul>
+                      <li>The maximum size of a document that will be indexed. The value is in bytes. The default value is 16MB.</li>
+                      <li>The allowed mime types. Warning: this does not work with all repository connectors.</li>
+                      <li>The allowed file extensions. Warning: this does not work with all repository connectors.</li>
+                </ul>
+                <figure src="images/en_US/opensearchserver-job-parameters.PNG" alt="OpenSearchServer, job parameters" width="80%"/>
+                <p>In the history report you will be able to monitor all the activities. The connector supports three activities: document ingestion (Indexation), document deletion and
+                    index optimization. The targeted index is automatically optimized when the job ends.</p>
+                <figure src="images/en_US/opensearchserver-history-report.PNG" alt="OpenSearchServer, history report" width="80%"/>
+                <p>You may also refer to the <a href="http://www.open-search-server.com/documentation">OpenSearchServer's user documentation</a>.</p>
+            </section>
             
-            <section id="regexpmapper">
-                <title>Regular Expression User Mapping Connection</title>
-                <p>The Regular Expression user mapping connection type is very helpful for rote user name conversions of all sorts.  For example, it can easily be configured to map the standard "user@domain" form
-                       of an Active Directory user name to (say) a LiveLink equivalent, e.g. "domain\user".  Since many repositories establish such rote conversions, the Regular Expression user mapping connection
-                       type is often all that you will ever need.</p>
-                <br/>
-                <p>A Regular Expression user mapping connection type has one special tab in the user mapping connection editing screen: "User Mapping".  This
-                       tab looks like this:</p>
+            <section id="solroutputconnector">
+                <title>Solr Output Connection</title>
+                <p>The Solr output connection type is designed to allow ManifoldCF to submit documents to either an appropriate Apache Solr instance,
+                       via the Solr HTTP API, or alternatively to a Solr Cloud cluster.  The configuration parameters are initially set to appropriate default
+                       values for a stand-alone Solr instance.</p>
+                <p>When you create a Solr output connection, multiple configuration tabs appear.  The first tab is the "Solr type" tab.  Here you select
+                       whether you want your connection to communicate to a standalone Solr instance, or to a Solr Cloud cluster:</p>
                 <br/><br/>
-                <figure src="images/en_US/regexp-mapping-user-mapping.PNG" alt="Regexp User Mapping, User Mapping tab" width="80%"/>
+                <figure src="images/en_US/solr-configure-solr-type.PNG" alt="Solr Configuration, Solr type tab" width="80%"/>
                 <br/><br/>
-                <p>The mapping consists of a match expression, which is a regular expression where parentheses ("(" and ")") mark sections you are interested in, and a
-                       replace string.  The sections marked with parentheses are called "groups" in regular expression parlance.  The replace string consists of constant text plus
-                       substitutions of the groups from the match, perhaps modified.  For example, "$(1)" refers to the first group within the match, while "$(1l)" refers to the first
-                       match group mapped to lower case.  Similarly, "$(1u)" refers to the same characters, but mapped to upper case.</p>
-                <p>For example, a match expression of <code>^(.*)\@([A-Z|a-z|0-9|_|-]*)\.(.*)$</code> with a replace string of <code>$(2)\$(1l)</code> would convert
-                      an Active Directory username of <code>MyUserName@subdomain.domain.com</code> into the user name
-                      <code>subdomain\myusername</code>.</p>
-                <p>When you are done, click the "Save" button.  When you do, a connection summary and status screen will be presented, which may look something like this:</p>
+                <p>Select which kind of Solr installation you want to communicate with.  Based on your selection, you can proceed to either the "Server"
+                       tab (if a standalone instance) or to the "ZooKeeper" tab (if a SolrCloud cluster).</p>
+                <p>The "Server" tab allows you to configure the HTTP parameters appropriate for communicating with a standalone Solr instance:</p>
                 <br/><br/>
-                <figure src="images/en_US/regexp-mapping-status.PNG" alt="Regexp User Mapping Status" width="80%"/>
+                <figure src="images/en_US/solr-configure-server.PNG" alt="Solr Configuration, Server tab" width="80%"/>
                 <br/><br/>
-            </section>
-        </section>
-
-        <section id="authorityconnectiontypes">
-            <title>Authority Connection Types</title>
-            
-            <section id="adauthority">
-                <title>Active Directory Authority Connection</title>
-                <p>An active directory authority connection is essential for enforcing security for documents from Windows shares, Microsoft SharePoint (in ActiveDirectory mode), and IBM FileNet repositories.
-                       This connection type needs to be provided with information about how to log into an appropriate Windows domain controller, with a user that has sufficient privileges to
-                       be able to look up any user's ID and group relationships.</p>
-                <br/>
-                <p>An Active Directory authority connection type has two special tabs in the authority connection editing screen: "Domain Controller", and "Cache".  The "Domain Controller"
-                       tab looks like this:</p>
+                <p>If your Solr setup is a standalone instance, fill in the fields according to your Solr configuration.  The Solr connection type supports
+                       only basic authentication at this time; if you have this enabled, supply the credentials as requested on the bottom part of the form.</p>
+                <p>The "Zookeeper" tab allows you to configure the connection type to communicate with a Solr Cloud cluster:</p>
                 <br/><br/>
-                <figure src="images/en_US/ad-configure-dc.PNG" alt="AD Configuration, Domain Controller tab" width="80%"/>
+                <figure src="images/en_US/solr-configure-zookeeper.PNG" alt="Solr Configuration, Zookeeper tab" width="80%"/>
                 <br/><br/>
-                <p>As you can see, the Active Directory authority allows you to configure multiple connections to different, but presumably related, domain controllers.  The choice of
-                       which domain controller will be accessed is determined by traversing the list of configured domain controllers from top to bottom, and finding the first one that
-                       matches the domain suffix field specified.  Note that a blank value for the domain suffix will match <strong>all</strong> users.</p>
-                <p>To add a domain controller to the end of the list, fill in the requested values.  Note that the "Administrative user name" field usually requires no domain suffix, but
-                       depending on the details of how the domain controller is configured, may sometimes only accept the "name@domain" format.  When you have completed your
-                       entry, click the "Add to end" button to add the domain controller rule to the end of the list.  Later, when other domain controllers are present in the list, you can
-                       click a different button at an appropriate spot to insert the domain controller record into the list where you want it to go.</p>
-                <p>The Active Directory authority connection type also has a "Cache" tab, for managing the caching of individual user responses:</p>
+                <p>Here, add each ZooKeeper instance in the SolrCloud cluster to the list of ZooKeeper instances.  The connection comes preconfigured with
+                       "localhost" as being a ZooKeeper instance.  You may delete this if it is not the case.</p>
+                <p>The next tab is the "Schema" tab, which allows you to specify the names of various Solr fields into which the Solr connection type will
+                       place built-in document metadata:</p>
                 <br/><br/>
-                <figure src="images/en_US/ad-configure-cache.PNG" alt="AD Configuration, Cache tab" width="80%"/>
+                <figure src="images/en_US/solr-configure-schema.PNG" alt="Solr Configuration, Schema tab" width="80%"/>
                 <br/><br/>
-                <p>Here you can control how many individual users will be cached, and for how long.</p>
-                <p>When you are done, click the "Save" button.  When you do, a connection summary and status screen will be presented, which may look something like this:</p>
+                <p>The most important of these is the document identifier field, which MUST be present for the connection type to function.  This field will
+                       be used to uniquely identify the document within Solr, and will contain the document's URL.  The Solr connection type will treat this field as being
+                       a unique key for locating the indexed document for further modification or deletion.  The other Solr fields are optional, and largely self-
+                       explanatory.</p>
+                <p>The next tab is the "Arguments" tab, which allows you to specify arbitrary arguments to be sent to Solr:</p>
                 <br/><br/>
-                <figure src="images/en_US/ad-status.PNG" alt="AD Status" width="80%"/>
+                <figure src="images/en_US/solr-configure-arguments.PNG" alt="Solr Configuration, Arguments tab" width="80%"/>
                 <br/><br/>
-                <p>Note that in this example, the Active Directory connection is not responding, which is leading to an error status message instead of "Connection working".</p>
-            </section>
-
-            <section id="ldapauthority">
-                <title>LDAP Authority Connection</title>
-                <p>An LDAP authority connection can be used to provide document security in situations where there is no native document security
-                      model in place.  Examples include Samba shares, Wiki pages,  RSS feeds, etc.</p>
-                <p>The LDAP authority works by providing user or group names from an LDAP server as access tokens.  These access tokens can
-                      be used by any repository connection type that provides for access tokens entered on a per-job basis, or by the JCIFs connection type,
-                      which has explicit user/group name support built in, meant for Samba shares.</p>
-                <p>This connection type needs to be provided with information about how to log into an appropriate LDAP server, as well as search
-                      expressions needed to look up user and group records.  An LDAP authority connection type has a single special tab in the
-                      authority connection editing screen: the "LDAP" tab:</p>
+                <p>Fill in the argument name and value, and click the "Add" button.  Bear in mind that if you add an argument with the same name as an existing one, it will replace the
+                       existing one with the new specified value.  You can delete existing arguments by clicking the "Delete" button next to the argument you want to delete.</p>
+                <p>Use this tab to specify any and all desired Solr update request parameters.  You can, for instance, add
+                       <a href="http://wiki.apache.org/solr/UpdateRequestProcessor">update.chain=myChain</a> to select a specific document processing pipeline/chain to
+                       use for processing documents. See the Solr documentation for more valid arguments.</p>
+                <p>The next tab is the "Documents" tab, which allows you to do document filtering based on size and mime types. By specifying a maximum document
+                       length in bytes, you can filter out documents which exceed that size (e.g. 10485760 which is equivalent to 10 MB). If you only want to add
+                       documents with specific mime types, you can enter them into the "included mime types" field (e.g. "text/html" for filtering out all documents but HTML).
+                       The "excluded mime types" field is for excluding documents with specific mime types (e.g. "image/jpeg" for filtering out JPEG images). The tab looks like:</p>
+                <figure src="images/en_US/solr-configure-documents.PNG" alt="Solr Configuration, Documents tab" width="80%"/>
                 <br/><br/>
-                <figure src="images/en_US/ldap-configure-ldap.PNG" alt="LDAP Configuration, LDAP tab" width="80%"/>
+                <p>The fifth tab is the "Commits" tab, which allows you to control the commit strategies. As well as committing documents at the end of every job, an
+                       option which is enabled by default, you may also commit each document within a certain time in milliseconds (e.g. "10000" for committing within
+                       10 seconds). The <a href="http://wiki.apache.org/solr/CommitWithin">commit within</a> strategy will leave the responsibility to Solr instead
+                       of ManifoldCF. The tab looks like:</p>
+                <figure src="images/en_US/solr-configure-commits.PNG" alt="Solr Configuration, Commits tab" width="80%"/>
                 <br/><br/>
-                <p>Fill in the requested values.  Note that the "Server base" field contains the LDAP domain specification you want to search.  For
-                      example, if you have an LDAP domain for "people.myorg.com", the server based might be "dc=com,dc=myorg,dc=people".</p>
-                <p>When you are done, click the "Save" button.  When you do, a connection
-                       summary and status screen will be presented, which
-                       may look something like this:</p>
+                <p>When you are done, don't forget to click the "Save" button to save your changes!  When you do, a connection summary and status screen will be
+                       presented, which may look something like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/ldap-status.PNG" alt="LDAP Status" width="80%"/>
+                <figure src="images/en_US/solr-status.PNG" alt="Solr Status" width="80%"/>
                 <br/><br/>
-                <p>Note that in this example, the LDAP connection is not responding, which is leading to an error status message instead of "Connection working".</p>
+                <p>Note that in this example, the Solr connection is not responding, which is leading to an error status message instead of "Connection working".</p>
+                <p>When you configure a job to use a Solr-type output connection, the Solr connection type provides a tab called "Field Mapping".  The purpose of this tab
+                       is to allow you to map metadata fields as fetched by the job's connection type to fields that Solr is set up to receive.  This is necessary because
+                       the names of the metadata items are often determined by the repository, with no alignment to fields defined in the Solr schema.  You may also
+                       suppress specific metadata items from being sent to the index using this tab.  The tab looks like this:</p>
                 <br/><br/>
-				<p>Example configuration for ActiveDirectory server to fetch user groups:</p>
-				<ul>
-				  <li>Server: [xxx.yyy.zzz.ttt]</li>
-				  <li>Port: 389</li>
-				  <li>Server base: [DC=domain,DC=name]</li>
-				  <li>Bind as user: [user@domain.name]</li>
-				  <li>Bind with password: [password for that user]</li>
-				  <li>User search base: CN=Users</li>
-				  <li>User search filter: sAMAccountName={0}</li>
-				  <li>User name attribute: sAMAccountName</li>
-				  <li>Group search base: CN=Users</li>
-				  <li>Group search filter: (member:1.2.840.113556.1.4.1941:={0})</li>
-				  <li>Group name attribute: sAMAccountName</li>
-				  <li>Member attribute is DN: yes (tick the checkbox)</li>
-				</ul>
-				<p><code>member:1.2.840.113556.1.4.1941:</code> gives you recursive check for nested groups</p>
+                <figure src="images/en_US/solr-job-field-mapping.PNG" alt="Solr Specification, Field Mapping tab" width="80%"/>
+                <br/><br/>
+                <p>Add a new mapping by filling in the "source" with the name of the metadata item from the repository, and "target" as the name of the output field in
+                       Solr, and click the "Add" button.  Leaving the "target" field blank will result in all metadata items of that name not being sent to Solr.</p>
             </section>
 
-            <section id="livelinkauthority">
-                <title>OpenText LiveLink Authority Connection</title>
-                <p>A LiveLink authority connection is needed to enforce security for documents retrieved from LiveLink repositories.</p>
-                <p>In order to function, this connection type needs to be provided with information about the name of the LiveLink server, and credentials appropriate
-                    for retrieving a user's ACLs from that machine.  Since LiveLink operates with its own list of users, you may also want to specify a rule-based
-                    mapping between an Active Directory user and the corresponding LiveLink user.  The authority type allows you to specify such a mapping using
-                    regular expressions.</p>
-                <p>A LiveLink authority connection has three special tabs you will need to configure: the "Server" tab, the "User Mapping" tab, and the "Cache" tab.</p>
-                <p>The "Server" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/livelink-authority-server.PNG" alt="LiveLink Authority, Server tab" width="80%"/>
-                <br/><br/>
-                <p>Select the manner you want the connection to use to communicate with LiveLink.  Your options are:</p>
-                <ul>
-                  <li>Internal (native LiveLink protocol)</li>
-                  <li>HTTP (communication with LiveLink through the IIS web server)</li>
-                  <li>HTTPS (communication with LiveLink through IIS using SSL)</li>
-                </ul>
-                <p>Also, you need to enter the name of the desired LiveLink server, the LiveLink port, and the LiveLink server credentials.  If you have selected communication
-                    using HTTP or HTTPS, you must provide a relative CGI path to your LiveLink.  You may also need to provide web server credentials.  Basic authentication
-                    and older forms of NTLM are supported.  In order to use NTLM, specify a non-blank server domain name in the "Server HTTP domain" field, plus a non-
-                    qualified user name and password.  If basic authentication is desired, leave the "Server HTTP domain" field blank, and provide basic auth credentials in the
-                    "Server HTTP NTLM user name" and "Server HTTP NTLM password" fields.  For no web server authentication, leave these fields all blank.</p>
-                <p>For communication using HTTPS, you will also need to upload your authority certificate(s) on the "Server" tab, to tell the connection which certificates to
-                    trust.  Upload your certificate using the browse button, and then click the "Add" button to add it to the trust store.</p>
-                <p>The "User Mapping" tab looks like this:</p>
+
+        </section>
+
+        <section id="mappingconnectiontypes">
+            <title>User Mapping Connection Types</title>
+            
+            <section id="regexpmapper">
+                <title>Regular Expression User Mapping Connection</title>
+                <p>The Regular Expression user mapping connection type is very helpful for rote user name conversions of all sorts.  For example, it can easily be configured to map the standard "user@domain" form
+                       of an Active Directory user name to (say) a LiveLink equivalent, e.g. "domain\user".  Since many repositories establish such rote conversions, the Regular Expression user mapping connection
+                       type is often all that you will ever need.</p>
+                <br/>
+                <p>A Regular Expression user mapping connection type has one special tab in the user mapping connection editing screen: "User Mapping".  This
+                       tab looks like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/livelink-authority-user-mapping.PNG" alt="LiveLink Authority, User Mapping tab" width="80%"/>
+                <figure src="images/en_US/regexp-mapping-user-mapping.PNG" alt="Regexp User Mapping, User Mapping tab" width="80%"/>
                 <br/><br/>
-                <p>The purpose of the "User Mapping" tab is to allow you to map the incoming user name and domain (usually from Active Directory) to its LiveLink equivalent.
-                      This tab predates the addition of the general user mapping functionality, and is provided only for backwards-compatibility reasons.  Please create a regular
-                      expression mapper instead.</p>
                 <p>The mapping consists of a match expression, which is a regular expression where parentheses ("(" and ")") mark sections you are interested in, and a
                        replace string.  The sections marked with parentheses are called "groups" in regular expression parlance.  The replace string consists of constant text plus
                        substitutions of the groups from the match, perhaps modified.  For example, "$(1)" refers to the first group within the match, while "$(1l)" refers to the first
                        match group mapped to lower case.  Similarly, "$(1u)" refers to the same characters, but mapped to upper case.</p>
                 <p>For example, a match expression of <code>^(.*)\@([A-Z|a-z|0-9|_|-]*)\.(.*)$</code> with a replace string of <code>$(2)\$(1l)</code> would convert
-                      an Active Directory username of <code>MyUserName@subdomain.domain.com</code> into the LiveLink user name
+                      an Active Directory username of <code>MyUserName@subdomain.domain.com</code> into the user name
                       <code>subdomain\myusername</code>.</p>
-                <p>The "Cache" tab allows you to configure how the authority connection keeps an individual user's information around:</p>
-                <br/><br/>
-                <figure src="images/en_US/livelink-authority-cache.PNG" alt="LiveLink Authority, Cache tab" width="80%"/>
-                <br/><br/>
-                <p>Here you set the time a user's information is kept around (the "Cache lifetime" field), and how many simultaneous users have their information cached
-                      (the "Cache LRU size" field).</p>
-                <p>When you are done, click the "Save" button.  You will then see a summary and status for the authority connection:</p>
+                <p>When you are done, click the "Save" button.  When you do, a connection summary and status screen will be presented, which may look something like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/livelink-authority-status.PNG" alt="LiveLink Authority Status" width="80%"/>
+                <figure src="images/en_US/regexp-mapping-status.PNG" alt="Regexp User Mapping Status" width="80%"/>
                 <br/><br/>
-                <p>We suggest that you examine the status carefully and correct any reported errors before proceeding.  Note that in this example, the LiveLink server would
-                    not accept connections, which is leading to an error status message instead of "Connection working".</p>
             </section>
+        </section>
+
+        <section id="authorityconnectiontypes">
+            <title>Authority Connection Types</title>
             
-            <section id="jdbcauthority">
-                <title>Generic Database Authority Connection</title>
-                <p>The generic database connection type allows you to generate access tokens from a database table, served by one of the following databases:</p>
-                <br/>
-                <ul>
-                    <li>Postgresql (via a Postgresql JDBC driver)</li>
-                    <li>SQL Server (via the JTDS JDBC driver)</li>
-                    <li>Oracle (via the Oracle JDBC driver)</li>
-                    <li>Sybase (via the JTDS JDBC driver)</li>
-                    <li>MySQL (via the MySQL JDBC driver)</li>
-                </ul>
+            <section id="adauthority">
+                <title>Active Directory Authority Connection</title>
+                <p>An Active Directory authority connection is essential for enforcing security for documents from Windows shares, Microsoft SharePoint (in ActiveDirectory mode), and IBM FileNet repositories.
+                       This connection type needs to be provided with information about how to log into an appropriate Windows domain controller, with a user that has sufficient privileges to
+                       be able to look up any user's ID and group relationships.</p>
                 <br/>
-                <p>This connection type <b>cannot</b> be configured to work with other databases than the ones listed above without software changes.  Depending on your particular installation,
-                       some of the above options may not be available.</p>
-                <p>A generic database authority connection has four special tabs on the repository connection editing screen: the "Database Type" tab, the "Server" tab,
-                      the "Credentials" tab, and the "Queries" tab.  The "Database Type" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/jdbc-authority-configure-database-type.PNG" alt="Generic Database Authority Connection, Database Type tab" width="80%"/>
-                <br/><br/>
-                <p>Select the kind of database you want to connect to, from the pulldown.</p>
-                <p>Also, select the JDBC access method you want from the access method pulldown.  The access method is provided because the JDBC specification has been
-                    recently clarified, and not all JDBC drivers work the same way as far as resultset column name discovery is concerned.  The "by name" option currently works
-                    with all JDBC drivers in the list except for the MySQL driver.  The "by label" works for the current MySQL driver, and may work for some of the others as well.  If
-                    the queries you supply for your generic database jobs do not work correctly, and you see an error message about not being able to find required columns in the
-                    result, you can change your selection on this pulldown and it may correct the problem.</p>
-                <p>The "Server" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/jdbc-authority-configure-server.PNG" alt="Generic Database Authority Connection, Server tab" width="80%"/>
-                <br/><br/>
-                <p>Here you have a choice.  <strong>Either</strong> you can choose to specify the database host and port, and the database name or instance name,
-                      <strong>or</strong> you can provide a raw JDBC connection string that is appropriate for the database type you have chosen.  This latter option
-                      is provided because many JDBC drivers, such as Oracle's, now can connect to an entire cluster of Oracle servers if you specify the appropriate
-                      connection description string.</p>
-                <p>If you choose the second option, just consult your JDBC driver's documentation and supply your string.  If there is anything entered in the raw connection
-                      string field at all, it will take precedence over the database host and database name fields.</p>
-                <p>If you choose the first option, the server name and port must be provided in the "Database host and port" field.  For example, for Oracle, the standard
-                      Oracle installation uses port 1521, so you would enter something like, "my-oracle-server:1521" for this field.  Postgresql uses port 5432 by default, so
-                      "my-postgresql-server:5432" would be required.  SQL Server's standard port is 1433, so use "my-sql-server:1433".</p>
-                <p>The service name or instance name field describes which instance and database to connect to.  For Oracle or Postgresql, provide just the database name.
-                      For SQL Server, use "my-instance-name/my-database-name".  For SQL Server using the default instance, use just the database name.</p>
-                <p>The "Credentials" tab is straightforward:</p>
+                <p>An Active Directory authority connection type has two special tabs in the authority connection editing screen: "Domain Controller", and "Cache".  The "Domain Controller"
+                       tab looks like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/jdbc-authority-configure-credentials.PNG" alt="Generic Database Authority Connection, Credentials tab" width="80%"/>
+                <figure src="images/en_US/ad-configure-dc.PNG" alt="AD Configuration, Domain Controller tab" width="80%"/>
                 <br/><br/>
-                <p>Enter the database user credentials.</p>
-                <p>The "Queries" tab looks like this:</p>
+                <p>As you can see, the Active Directory authority allows you to configure multiple connections to different, but presumably related, domain controllers.  The choice of
+                       which domain controller will be accessed is determined by traversing the list of configured domain controllers from top to bottom, and finding the first one that
+                       matches the domain suffix field specified.  Note that a blank value for the domain suffix will match <strong>all</strong> users.</p>
+                <p>To add a domain controller to the end of the list, fill in the requested values.  Note that the "Administrative user name" field usually requires no domain suffix, but
+                       depending on the details of how the domain controller is configured, may sometimes only accept the "name@domain" format.  When you have completed your
+                       entry, click the "Add to end" button to add the domain controller rule to the end of the list.  Later, when other domain controllers are present in the list, you can
+                       click a different button at an appropriate spot to insert the domain controller record into the list where you want it to go.</p>
+                <p>The Active Directory authority connection type also has a "Cache" tab, for managing the caching of individual user responses:</p>
                 <br/><br/>
-                <figure src="images/en_US/jdbc-authority-configure-queries.PNG" alt="Generic Database Authority Connection, Queries tab" width="80%"/>
+                <figure src="images/en_US/ad-configure-cache.PNG" alt="AD Configuration, Cache tab" width="80%"/>
                 <br/><br/>
-                <p>Here you supply two queries.  The first query looks up the user name to find a user id.  The second query looks up access tokens corresponding to the
-                      user id.  Details of what you supply for these queries will depend on your database schema.</p>
-                <p>After you click the "Save" button, you will see a connection summary screen, which might look something like this:</p>
+                <p>Here you can control how many individual users will be cached, and for how long.</p>
+                <p>When you are done, click the "Save" button.  When you do, a connection summary and status screen will be presented, which may look something like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/jdbc-authority-status.PNG" alt="Generic Database Authority Status" width="80%"/>
+                <figure src="images/en_US/ad-status.PNG" alt="AD Status" width="80%"/>
                 <br/><br/>
-                <p>Note that in this example, the generic database authority connection is not properly authenticated, which is leading to an error status message instead
-                      of "Connection working".</p>
+                <p>Note that in this example, the Active Directory connection is not responding, which is leading to an error status message instead of "Connection working".</p>
+            </section>
+
+            <section id="cmisauthority">
+              <title>CMIS Authority Connection</title>
+              <p>A CMIS authority connection is required for enforcing security for documents retrieved from CMIS repositories.</p>
+              <p>Because the CMIS specification defines authorities only in relation to a specific document, this authority connector is based solely on a regular expression comparator.</p>
+              <p>A CMIS authority connection has the following special tabs you will need to configure: the "Repository" tab and the "User Mapping" tab. The "Repository" tab looks like this:</p>
+              <br/><br/>
+              <figure src="images/en_US/cmis-authority-connection-configuration-repository.png" alt="CMIS Authority, Repository configuration" width="80%"/>
+              <br/><br/>
+              <p>The repository configuration will be used only to track an ID for a specific CMIS repository. No calls will be performed against the CMIS repository.</p>
+              <br/><br/>
+              <p>The second tab that you need to configure is the "User Mapping" tab that allows you to define a regular expression to specify the user mapping.  This tab
+                    predates the addition of user mapping functionality to ManifoldCF.  Please create a user mapping instead.</p>
+              <p>The "User Mapping" tab looks like the following:</p>
+              <br/><br/>
+              <figure src="images/en_US/cmis-authority-connection-configuration-usermapping.png" alt="CMIS Authority, User Mapping configuration" width="80%"/>
+              <br/><br/>
+              <p>The purpose of the "User Mapping" tab is to allow you to map the incoming user name and domain (usually from Active Directory) to its CMIS user equivalent.
+                     The mapping consists of a match expression, which is a regular expression where parentheses ("("
+                     and ")") mark sections you are interested in, and a replace string.  The sections marked with parentheses are called "groups" in regular expression parlance.  The replace string consists of constant text plus
+                     substitutions of the groups from the match, perhaps modified.  For example, "$(1)" refers to the first group within the match, while "$(1l)" refers to the first match group
+                     mapped to lower case.  Similarly, "$(1u)" refers to the same characters, but mapped to upper case.</p>
+              <p>For example, a match expression of <code>^(.*)\@([A-Z|a-z|0-9|_|-]*)\.(.*)$</code> with a replace string of <code>$(2)\$(1l)</code> would convert an
+                   Active Directory username of <code>MyUserName@subdomain.domain.com</code> into the CMIS user name <code>subdomain\myusername</code>.</p>
+              <p>When you are done, click the "Save" button.  You will then see a summary and status for the authority connection:</p>
+              <br/><br/>
+              <figure src="images/en_US/cmis-authority-connection-configuration-save.png" alt="CMIS Authority, saving configuration" width="80%"/>
+              <br/><br/>
             </section>
 
             <section id="documentumauthority">
@@ -1030,93 +908,7 @@ curl -XGET http://localhost:9200/index/_
                 <p>Pay careful attention to the status, and be prepared to correct any
                     problems that are displayed.</p>
             </section>
-            
-            <section id="meridioauthority">
-                <title>Autonomy Meridio Authority Connection</title>
-                <p>A Meridio authority connection is required for enforcing security for documents retrieved from Meridio repositories.</p>
-                <p>This connection type needs to be provided with information about what Document Server to connect to, what Records Server to connect to, and what User Service Server
-                    to connect to.  Also needed are the Meridio credentials that should be used to retrieve a user's ACLs from those machines.</p>
-                <p>Note that the User Service is part of the Meridio Authority, and must be installed somewhere in the Meridio system in order for the Meridio Authority to function correctly.
-                    If you do not know whether this has yet been done, or on what server, please ask your system administrator.</p>
-                <p>A Meridio authority connection has the following special tabs you will need to configure: the "Document Server" tab, the "Records Server" tab, the "User Service Server" tab,
-                    and the "Credentials" tab.  The "Document Server" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/meridio-authority-document-server.PNG" alt="Meridio Authority, Document Server tab" width="80%"/>
-                <br/><br/>
-                <p>Select the correct protocol, and enter the correct server name, port, and location to reference the Meridio document server services.  If a proxy is involved, enter the proxy host
-                    and port.  Authenticated proxies are not supported by this connection type at this time.</p>
-                <p>Note that, in the Meridio system, while it is possible that different services run on different servers, this is not typically the case.  The connection type, on the other hand, makes
-                    no assumptions, and permits the most general configuration.</p>
-                <p>The "Records Server" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/meridio-authority-records-server.PNG" alt="Meridio Authority, Records Server tab" width="80%"/>
-                <br/><br/>
-                <p>Select the correct protocol, and enter the correct server name, port, and location to reference the Meridio records server services.  If a proxy is involved, enter the proxy host
-                    and port.  Authenticated proxies are not supported by this connection type at this time.</p>
-                <p>Note that, in the Meridio system, while it is possible that different services run on different servers, this is not typically the case.  The connection type, on the other hand, makes
-                    no assumptions, and permits the most general configuration.</p>
-                <p>The "User Service Server" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/meridio-authority-user-service-server.PNG" alt="Meridio Authority, User Service Server tab" width="80%"/>
-                <br/><br/>
-                <p>You will require knowledge of where the special Meridio Authority extensions have been installed in order to fill out this tab.</p>
-                <p>Select the correct protocol, and enter the correct server name, port, and location to reference the Meridio user service server services.  If a proxy is involved, enter the proxy host
-                    and port.  Authenticated proxies are not supported by this connection type at this time.</p>
-                <p>Note that, in the Meridio system, while it is possible that different services run on different servers, this is not typically the case.  The connection type, on the other hand, makes
-                    no assumptions, and permits the most general configuration.</p>
-                <p>The "Credentials" tab looks like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/meridio-authority-credentials.PNG" alt="Meridio Authority, Credentials tab" width="80%"/>
-                <br/><br/>
-                <p>Enter the Meridio server credentials needed to access the Meridio system.</p>
-                <p>When you are done, click the "Save" button.  You will then see a screen looking something like this:</p>
-                <br/><br/>
-                <figure src="images/en_US/meridio-authority-status.PNG" alt="Meridio Authority Status" width="80%"/>
-                <br/><br/>
-                <p>In this example, logon has not succeeded because the server on which the Meridio Authority is running is unknown to the Windows domain under which Meridio is running.
-                    This results in an error message, instead of the "Connection working" message that you would see if the authority was working properly.</p>
-                <p>Since Meridio uses Windows IIS for authentication, there are many ways in which the configuration of either IIS or the Windows domain under which Meridio runs can affect
-                    the correct functioning of the Meridio Authority.  It is beyond the scope of this manual to describe the kinds of analysis and debugging techniques that might be required to diagnose connection
-                    and authentication problems.  If you have trouble, you will almost certainly need to involve your Meridio IT personnel.  Debugging tools may include (but are not limited to):</p>
-                <br/>
-                <ul>
-                    <li>Windows security event logs</li>
-                    <li>ManifoldCF logs (see below)</li>
-                    <li>Packet captures (using a tool such as WireShark)</li>
-                </ul>
-                <br/>
-                <p>If you need specific ManifoldCF logging information, contact your system integrator.</p>
-            </section>
-            
-            <section id="cmisauthority">
-              <title>CMIS Authority Connection</title>
-              <p>A CMIS authority connection is required for enforcing security for documents retrieved from CMIS repositories.</p>
-              <p>The CMIS specification includes the concept of authorities only depending on a specific document, this authority connector is only based on a regular expression comparator.</p>
-              <p>A CMIS authority connection has the following special tabs you will need to configure: the "Repository" tab and the "User Mapping" tab. The "Repository" tab looks like this:</p>
-              <br/><br/>
-              <figure src="images/en_US/cmis-authority-connection-configuration-repository.png" alt="CMIS Authority, Repository configuration" width="80%"/>
-              <br/><br/>
-              <p>The repository configuration will be only used to track an ID for a specific CMIS repository. No calls will be performed against the CMIS repository.</p>
-              <br/><br/>
-              <p>The second tab that you need to configure is the "User Mapping" tab that allows you to define a regular expression to specify the user mapping.  This tab
-                    predates the addition of user mapping functionality to ManifoldCF.  Please create a user mapping instead.</p>
-              <p>The "User Mapping" tab looks like the following:</p>
-              <br/><br/>
-              <figure src="images/en_US/cmis-authority-connection-configuration-usermapping.png" alt="CMIS Authority, User Mapping configuration" width="80%"/>
-              <br/><br/>
-              <p>The purpose of the "User Mapping" tab is to allow you to map the incoming user name and domain (usually from Active Directory) to its CMIS user equivalent.
-                     The mapping consists of a match expression, which is a regular expression where parentheses ("("
-                     and ")") mark sections you are interested in, and a replace string.  The sections marked with parentheses are called "groups" in regular expression parlance.  The replace string consists of constant text plus
-                     substitutions of the groups from the match, perhaps modified.  For example, "$(1)" refers to the first group within the match, while "$(1l)" refers to the first match group
-                     mapped to lower case.  Similarly, "$(1u)" refers to the same characters, but mapped to upper case.</p>
-              <p>For example, a match expression of <code>^(.*)\@([A-Z|a-z|0-9|_|-]*)\.(.*)$</code> with a replace string of <code>$(2)\$(1l)</code> would convert an
-                   Active Directory username of <code>MyUserName@subdomain.domain.com</code> into the CMIS user name <code>subdomain\myusername</code>.</p>
-              <p>When you are done, click the "Save" button.  You will then see a summary and status for the authority connection:</p>
-              <br/><br/>
-              <figure src="images/en_US/cmis-authority-connection-configuration-save.png" alt="CMIS Authority, saving configuration" width="80%"/>
-              <br/><br/>
-            </section>
-            
+
              <section id="genericauthority">
               <title>Generic Authority</title>
               <p>Generic authority is intended to be used with Generic Connector and provide authentication tokens based on generic API. The idea is that you can use it and implement only the API which is designed
@@ -1151,636 +943,610 @@ curl -XGET http://localhost:9200/index/_
               <p><code>exists</code> attribute is required and it carries information whether user is valid or not.</p>
               <br/><br/>
             </section>
-       </section>
-        
-        <section id="repositoryconnectiontypes">
-            <title>Repository Connection Types</title>
 
-            <section id="filesystemrepository">
-                <title>Generic WGET-Compatible File System Repository Connection</title>
-                <p>The generic file system repository connection type was developed in part as an example, demonstration, and testing tool, which reads simple
-                       files in directory paths, and partly as ManifoldCF support for the Unix utility called <em>wget</em>.  In the latter mode, the File System Repository Connector
-                       will parse file names that were created by <em>wget</em>, or by the wget-compatible File System Output Connector, and turn these back
-                       into full URLs to external web content.</p>
-                <p>This connection type has no support for any kind of document security, except for hand-entered access tokens provided on a per-job basis.</p>
-                <p>The File System repository connection type provides no configuration tabs beyond the standard ones.  However, please consider setting a "Maximum connections per
-                       JVM" value on the "Throttling" tab to at least one per worker thread, or 30, for best performance.</p>
-                <p>Jobs created using a file-system-type repository connection
-                       have two tabs in addition to the standard repertoire: the "Hop Filters" tab, and the "Repository Paths" tab.</p>
-                <p>The "Hop Filters" tab allows you to restrict the document set by the number of child hops from the path root.  While this is not terribly interesting in the case of a file
-                       system, the same basic functionality is also used in the Web connection type, where it is a more important feature.  The file system connection type gives you a way to see
-                       how this feature works, in a more predictable environment:</p>
+            <section id="jdbcauthority">
+                <title>Generic Database Authority Connection</title>
+                <p>The generic database authority connection type generates access tokens from a database table served by one of the following databases:</p>
+                <br/>
+                <ul>
+                    <li>Postgresql (via a Postgresql JDBC driver)</li>
+                    <li>SQL Server (via the JTDS JDBC driver)</li>
+                    <li>Oracle (via the Oracle JDBC driver)</li>
+                    <li>Sybase (via the JTDS JDBC driver)</li>
+                    <li>MySQL (via the MySQL JDBC driver)</li>
+                </ul>
+                <br/>
+                <p>This connection type <b>cannot</b> be configured to work with databases other than those listed above without software changes.  Depending on your particular installation,
+                       some of the above options may not be available.</p>
+                <p>A generic database authority connection has four special tabs on the authority connection editing screen: the "Database Type" tab, the "Server" tab,
+                      the "Credentials" tab, and the "Queries" tab.  The "Database Type" tab looks like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/filesystem-job-hopcount.PNG" alt="File System Connection, Hop Filters tab" width="80%"/>
+                <figure src="images/en_US/jdbc-authority-configure-database-type.PNG" alt="Generic Database Authority Connection, Database Type tab" width="80%"/>
                 <br/><br/>
-                <p>In the case of the File System connection type, there is only one variety of relationship between documents, which is called a "child" relationship.  If you want to
-                       restrict the document set by how far away a document is from the path root, enter the maximum allowed number of hops in the text box.  Leaving the box blank
-                       indicates that no such filtering will take place.</p>
-                <p>On this same tab, you can tell the Framework what to do should there be changes in the distance from the root to a document.  The choice "Delete unreachable
-                       documents" requires the Framework to recalculate the distance to every potentially affected document whenever a change takes place.  This may require
-                       expensive bookkeeping, however, so you also have the option of  ignoring such changes.  There are two varieties of this latter option - you can ignore the changes
-                       for now, with the option of turning back on the aggressive bookkeeping at a later time, or you can decide not to ever allow changes to propagate, in which case
-                       the Framework will discard the necessary bookkeeping information permanently.</p>
-                <p>The "Repository Paths" tab looks like this:</p>
+                <p>Select the kind of database you want to connect to, from the pulldown.</p>
+                <p>Also, select the JDBC access method you want from the access method pulldown.  The access method is provided because the JDBC specification has been
+                    recently clarified, and not all JDBC drivers work the same way as far as resultset column name discovery is concerned.  The "by name" option currently works
+                    with all JDBC drivers in the list except for the MySQL driver.  The "by label" option works for the current MySQL driver, and may work for some of the others as well.  If
+                    the queries you supply for your generic database jobs do not work correctly, and you see an error message about not being able to find required columns in the
+                    result, you can change your selection on this pulldown and it may correct the problem.</p>
+                <p>The "Server" tab looks like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/filesystem-job-paths.PNG" alt="File System Connection, Repository Paths tab" width="80%"/>
+                <figure src="images/en_US/jdbc-authority-configure-server.PNG" alt="Generic Database Authority Connection, Server tab" width="80%"/>
                 <br/><br/>
-                <p>This tab allows you to type in a set of paths which function as the roots of the crawl.  For each desired path, type in the path, select whether the root should
-                       behave as a WGET repository or not, and click the "Add" button to add it to the list.  The form of the path you type in obviously needs to be meaningful
-                       for the operating system the Framework is running on.</p>
-                <p>Each root path has a set of rules which determines whether a document is included or not in the set for the job.  Once you have added the root path to the list, you
-                       may then add rules to it.  Each rule has a match expression, an indication of whether the rule is intended to match files or directories, and an action (include or exclude).
-                       Rules are evaluated from top to bottom, and the first rule that matches the file name is the one that is chosen.  To add a rule, select the desired pulldowns, type in 
-                       a match file specification (e.g. "*.txt"), and click the "Add" button.</p>
-            </section>
-            
-            <section id="hdfsrepository">
-                <title>HDFS Repository Connection (WGET compatible)</title>
-                <p>The HDFS repository connection operates much like the File System Repository Connection, except it reads data from the Hadoop File System rather than a
-                       local disk.  It, too, is capable of understanding directories written in the manner of the Unix utility called <em>wget</em>.  In the latter mode, the HDFS Repository Connector
-                       will parse file names that were created by <em>wget</em>, or by the wget-compatible HDFS Output Connector, and turn these back
-                       into full URLs pointing to external web content.</p>
-                <p>This connection type has no support for any kind of document security, except for hand-entered access tokens provided on a per-job basis.</p>
-                <p>The HDFS repository connection type has an additional configuration tab above and beyond the standard ones, called "Server".  This is what it looks like:</p>
+                <p>Here you have a choice.  <strong>Either</strong> you can choose to specify the database host and port, and the database name or instance name,
+                      <strong>or</strong> you can provide a raw JDBC connection string that is appropriate for the database type you have chosen.  This latter option
+                      is provided because many JDBC drivers, such as Oracle's, now can connect to an entire cluster of Oracle servers if you specify the appropriate
+                      connection description string.</p>
+                <p>If you choose the second option, just consult your JDBC driver's documentation and supply your string.  If there is anything entered in the raw connection
+                      string field at all, it will take precedence over the database host and database name fields.</p>
+                <p>If you choose the first option, the server name and port must be provided in the "Database host and port" field.  For example, for Oracle, the standard
+                      Oracle installation uses port 1521, so you would enter something like "my-oracle-server:1521" for this field.  Postgresql uses port 5432 by default, so
+                      you would enter "my-postgresql-server:5432".  SQL Server's standard port is 1433, so use "my-sql-server:1433".</p>
+                <p>The service name or instance name field describes which instance and database to connect to.  For Oracle or Postgresql, provide just the database name.
+                      For SQL Server, use "my-instance-name/my-database-name".  For SQL Server using the default instance, use just the database name.</p>
+                <p>The "Credentials" tab is straightforward:</p>
                 <br/><br/>
-                <figure src="images/en_US/hdfs-repository-configure-server.PNG" alt="HDFS Connection, Server tab" width="80%"/>
+                <figure src="images/en_US/jdbc-authority-configure-credentials.PNG" alt="Generic Database Authority Connection, Credentials tab" width="80%"/>
                 <br/><br/>
-                <p>Enter the HDFS name node URI, and the user name, and click the "Save" button.</p>
-                <p>Jobs created using an HDFS repository connection type
-                       have two tabs in addition to the standard repertoire: the "Hop Filters" tab, and the "Repository Paths" tab.</p>
-                <p>The "Hop Filters" tab allows you to restrict the document set by the number of child hops from the path root.  This is what it looks like:</p>
+                <p>Enter the database user credentials.</p>
+                <p>The "Queries" tab looks like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/hdfs-job-hopcount.PNG" alt="HDFS Connection, Hop Filters tab" width="80%"/>
+                <figure src="images/en_US/jdbc-authority-configure-queries.PNG" alt="Generic Database Authority Connection, Queries tab" width="80%"/>
                 <br/><br/>
-                <p>In the case of the HDFS connection type, there is only one variety of relationship between documents, which is called a "child" relationship.  If you want to
-                       restrict the document set by how far away a document is from the path root, enter the maximum allowed number of hops in the text box.  Leaving the box blank
-                       indicates that no such filtering will take place.</p>
-                <p>On this same tab, you can tell the Framework what to do should there be changes in the distance from the root to a document.  The choice "Delete unreachable
-                       documents" requires the Framework to recalculate the distance to every potentially affected document whenever a change takes place.  This may require
-                       expensive bookkeeping, however, so you also have the option of  ignoring such changes.  There are two varieties of this latter option - you can ignore the changes
-                       for now, with the option of turning back on the aggressive bookkeeping at a later time, or you can decide not to ever allow changes to propagate, in which case
-                       the Framework will discard the necessary bookkeeping information permanently.</p>
-                <p>The "Repository Paths" tab looks like this:</p>
+                <p>Here you supply two queries.  The first query looks up the user name to find a user id.  The second query looks up access tokens corresponding to the
+                      user id.  Details of what you supply for these queries will depend on your database schema.</p>
+                <p>After you click the "Save" button, you will see a connection summary screen, which might look something like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/hdfs-job-paths.PNG" alt="HDFS Connection, Repository Paths tab" width="80%"/>
+                <figure src="images/en_US/jdbc-authority-status.PNG" alt="Generic Database Authority Status" width="80%"/>
                 <br/><br/>
-                <p>This tab allows you to type in a set of paths which function as the roots of the crawl.  For each desired path, type in the path, select whether the root should
-                       behave as a WGET repository or not, and click the "Add" button to add it to the list.</p>
-                <p>Each root path has a set of rules which determines whether a document is included or not in the set for the job.  Once you have added the root path to the list, you
-                       may then add rules to it.  Each rule has a match expression, an indication of whether the rule is intended to match files or directories, and an action (include or exclude).
-                       Rules are evaluated from top to bottom, and the first rule that matches the file name is the one that is chosen.  To add a rule, select the desired pulldowns, type in 
-                       a match file specification (e.g. "*.txt"), and click the "Add" button.</p>
+                <p>Note that in this example, the generic database authority connection is not properly authenticated, which is leading to an error status message instead
+                      of "Connection working".</p>
             </section>
 
-            <section id="rssrepository">
-                <title>Generic RSS Repository Connection</title>
-                <p>The RSS connection type is specifically designed to crawl RSS feeds.  While the Web connection type can also extract links from RSS feeds, the RSS connection type
-                       differs in the following ways:</p>
-                <br/>
-                <ul>
-                    <li>Links are <b>only</b> extracted from feeds</li>
-                    <li>Feeds themselves are not indexed</li>
-                    <li>There is fine-grained control over how often feeds are refetched, and they are treated distinctly from documents in this regard</li>
-                    <li>The RSS connection type knows how to carry certain data down from the feeds to individual documents, as metadata</li>
-                </ul>
-                <br/>
-                <p>Many users of the RSS connection type set up their jobs to run continuously, configuring their jobs to never refetch documents, but rather to expire them after some 30 days.
-                       This model works reasonably well for news, which is what RSS is often used for.</p>
-                <p>This connection type has no support for any kind of document security, except for hand-entered access tokens provided on a per-job basis.</p>
-                <p>An RSS connection has the following special tabs: "Email", "Robots", "Bandwidth", and "Proxy".  The "Email" tab looks like this:</p>
+
+            <section id="ldapauthority">
+                <title>LDAP Authority Connection</title>
+                <p>An LDAP authority connection can be used to provide document security in situations where there is no native document security
+                      model in place.  Examples include Samba shares, Wiki pages,  RSS feeds, etc.</p>
+                <p>The LDAP authority works by providing user or group names from an LDAP server as access tokens.  These access tokens can
+                      be used by any repository connection type that provides for access tokens entered on a per-job basis, or by the JCIFS connection type,
+                      which has explicit user/group name support built in, meant for Samba shares.</p>
+                <p>This connection type needs to be provided with information about how to log into an appropriate LDAP server, as well as search
+                      expressions needed to look up user and group records.  An LDAP authority connection has a single special tab in the
+                      authority connection editing screen: the "LDAP" tab:</p>
                 <br/><br/>
-                <figure src="images/en_US/rss-configure-email.PNG" alt="RSS Connection, Email tab" width="80%"/>
+                <figure src="images/en_US/ldap-configure-ldap.PNG" alt="LDAP Configuration, LDAP tab" width="80%"/>
                 <br/><br/>
-                <p>Enter an email address.  This email address will be included in all requests made by the RSS connection, so that webmasters can report any difficulties that their
-                       sites experience as the result of improper throttling, etc.</p>
-                <p>This field is mandatory.  While an RSS connection makes no effort to validate the correctness of the email
-                       field, you will probably want to remain a good web citizen and provide a valid email address.  Remember that it is very easy for a webmaster to block access to
-                       a crawler that does not seem to be behaving in a polite manner.</p>
-                <p>The "Robots" tab looks like this:</p>
+                <p>Fill in the requested values.  Note that the "Server base" field contains the LDAP domain specification you want to search.  For
+                      example, if you have an LDAP domain for "people.myorg.com", the server base might be "dc=people,dc=myorg,dc=com".</p>
+                <p>When you are done, click the "Save" button.  When you do, a connection
+                       summary and status screen will be presented, which
+                       may look something like this:</p>
                 <br/><br/>
-                <figure src="images/en_US/rss-configure-robots.PNG" alt="RSS Connection, Robots tab" width="80%"/>
+                <figure src="images/en_US/ldap-status.PNG" alt="LDAP Status" width="80%"/>
                 <br/><br/>
-                <p>Select how the connection will interpret robots.txt.  Remember that you have an interest in crawling people's sites as politely as is possible.</p>
-                <p>The "Bandwidth" tab looks like this:</p>
+                <p>Note that in this example, the LDAP connection is not responding, which is leading to an error status message instead of "Connection working".</p>
                 <br/><br/>
-                <figure src="images/en_US/rss-configure-bandwidth.PNG" alt="RSS Connection, Bandwidth tab" width="80%"/>
+				<p>Example configuration for an Active Directory server to fetch user groups:</p>
+				<ul>
+				  <li>Server: [xxx.yyy.zzz.ttt]</li>
+				  <li>Port: 389</li>
+				  <li>Server base: [DC=domain,DC=name]</li>
+				  <li>Bind as user: [user@domain.name]</li>
+				  <li>Bind with password: [password for that user]</li>
+				  <li>User search base: CN=Users</li>
+				  <li>User search filter: sAMAccountName={0}</li>
+				  <li>User name attribute: sAMAccountName</li>
+				  <li>Group search base: CN=Users</li>
+				  <li>Group search filter: (member:1.2.840.113556.1.4.1941:={0})</li>
+				  <li>Group name attribute: sAMAccountName</li>
+				  <li>Member attribute is DN: yes (tick the checkbox)</li>
+				</ul>
+				<p>The <code>member:1.2.840.113556.1.4.1941:</code> matching rule (Active Directory's LDAP_MATCHING_RULE_IN_CHAIN) makes the group search recursive, so nested group memberships are included.</p>
+            </section>
+
+            <section id="livelinkauthority">
+                <title>OpenText LiveLink Authority Connection</title>
+                <p>A LiveLink authority connection is needed to enforce security for documents retrieved from LiveLink repositories.</p>
+                <p>In order to function, this connection type needs to be provided with information about the name of the LiveLink server, and credentials appropriate
+                    for retrieving a user's ACLs from that machine.  Since LiveLink operates with its own list of users, you may also want to specify a rule-based
+                    mapping between an Active Directory user and the corresponding LiveLink user.  The authority type allows you to specify such a mapping using
+                    regular expressions.</p>
+                <p>A LiveLink authority connection has three special tabs you will need to configure: the "Server" tab, the "User Mapping" tab, and the "Cache" tab.</p>
+                <p>The "Server" tab looks like this:</p>
                 <br/><br/>
-                <p>This tab allows you to control the <b>maximum</b> rate at which the connection fetches data, on a per-server basis, as well as the <b>maximum</b> fetches
-                       per minute, also per-server.  Finally, the maximum number of socket connections made per server at any one time is also controllable by this tab.</p>
-                <p>The screen shot displays parameters that are considered reasonably polite.  The default values for this table are all blank, meaning that, by default, there is no
-                       throttling whatsoever!  Please do not make the mistake of crawling other people's sites without adequate politeness parameters in place.</p>
-                <p>The "Throttle group" parameter allows you to treat multiple RSS-type connections together, for the purposes of throttling.  All RSS-type connections that
-                       have the same throttle group name will use the same pool for throttling purposes.</p>
-                <p>The "Bandwidth" tab is related to the throttles that you can set on the "Throttling" tab in the following ways:</p>
-                <br/>
+                <figure src="images/en_US/livelink-authority-server.PNG" alt="LiveLink Authority, Server tab" width="80%"/>
+                <br/><br/>
+                <p>Select how you want the connection to communicate with LiveLink.  Your options are:</p>
                 <ul>
-                    <li>The "Bandwidth" tab sets the <b>maximum</b> values, while the "Throttling" tab sets the <b>average</b> values.</li>
-                    <li>The "Bandwidth" tab does not affect how documents are scheduled in the queue; it simply blocks documents until it is safe to go ahead, which will use up a
-                          crawler thread for the entire period that both the wait and the fetch take place.  The "Throttling" tab affects how often documents are scheduled, so it does
-                          not waste threads.</li>
+                  <li>Internal (native LiveLink protocol)</li>
+                  <li>HTTP (communication with LiveLink through the IIS web server)</li>
+                  <li>HTTPS (communication with LiveLink through IIS using SSL)</li>
                 </ul>
-                <br/>

[... 2109 lines stripped ...]
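
The "User Mapping" text in the CMIS authority section of the diff above describes a match expression plus a replace string with "$(1)", "$(1l)" and "$(1u)" substitutions. As a rough illustration only, here is a minimal Python sketch of that substitution scheme. The helper names expand_replace and map_user are hypothetical and are not ManifoldCF APIs, and returning unmatched names unchanged is an assumption; the actual implementation is Java and may differ in detail.

```python
import re


def expand_replace(replace: str, match: re.Match) -> str:
    """Expand $(N), $(Nl) and $(Nu) references against a regex match."""
    def substitute(ref: re.Match) -> str:
        text = match.group(int(ref.group(1)))
        if ref.group(2) == "l":
            return text.lower()   # $(Nl): group N mapped to lower case
        if ref.group(2) == "u":
            return text.upper()   # $(Nu): group N mapped to upper case
        return text               # $(N): group N unchanged
    # Everything outside a $(...) reference is kept as constant text.
    return re.sub(r"\$\((\d+)([lu]?)\)", substitute, replace)


def map_user(match_expr: str, replace: str, user: str) -> str:
    """Hypothetical helper: apply one match/replace mapping to a user name."""
    match = re.match(match_expr, user)
    if match is None:
        return user  # assumption: names that do not match pass through
    return expand_replace(replace, match)


# Match expression, replace string and user name taken from the text above.
mapped = map_user(
    r"^(.*)\@([A-Z|a-z|0-9|_|-]*)\.(.*)$",
    r"$(2)\$(1l)",
    "MyUserName@subdomain.domain.com",
)
print(mapped)
```

With the documentation's own inputs this prints subdomain\myusername, matching the result stated in the text.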

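The "Queries" tab of the generic database authority (diff above) takes two queries: one that maps a user name to a user id, and one that maps that id to access tokens. The following Python sketch shows that two-step lookup against an in-memory SQLite database. The schema, table names, column names and the access_tokens helper are all hypothetical, chosen only for illustration; the queries you actually supply depend on your own database schema and on the syntax the connector expects.

```python
import sqlite3

# Hypothetical schema: none of these names come from ManifoldCF.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT UNIQUE);
    CREATE TABLE user_tokens (user_id INTEGER, token TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO user_tokens VALUES
        (1, 'engineering'), (1, 'staff'), (2, 'staff');
""")


def access_tokens(conn, user_name):
    """Return the user's access tokens, or None if the user is unknown."""
    # First query: look up the user name to find a user id.
    row = conn.execute(
        "SELECT id FROM users WHERE login = ?", (user_name,)
    ).fetchone()
    if row is None:
        return None
    # Second query: look up the access tokens for that user id.
    rows = conn.execute(
        "SELECT token FROM user_tokens WHERE user_id = ? ORDER BY token",
        (row[0],),
    )
    return [token for (token,) in rows]


print(access_tokens(conn, "alice"))
```

Keeping the id lookup separate from the token lookup mirrors the two fields on the tab, and makes the "unknown user" case distinct from "known user with no tokens".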
