incubator-connectors-commits mailing list archives

From kwri...@apache.org
Subject svn commit: r934476 [2/2] - in /incubator/lcf/site: publish/ publish/images/ src/documentation/content/xdocs/ src/documentation/resources/images/
Date Thu, 15 Apr 2010 16:35:11 GMT
Propchange: incubator/lcf/site/publish/images/list-repository-connections.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/output-throttling.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/output-throttling.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/output-throttling.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/repository-throttling-with-throttle.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/repository-throttling-with-throttle.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/repository-throttling-with-throttle.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/repository-throttling.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/repository-throttling.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/repository-throttling.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/solr-configure-arguments.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/solr-configure-arguments.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/solr-configure-arguments.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/solr-configure-server.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/solr-configure-server.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/solr-configure-server.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/solr-status.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/solr-status.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/solr-status.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/publish/images/welcome-screen.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/images/welcome-screen.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/images/welcome-screen.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml?rev=934476&r1=934475&r2=934476&view=diff
==============================================================================
--- incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml (original)
+++ incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml Thu Apr 15 16:35:09 2010
@@ -412,17 +412,54 @@
             
             <section id="solroutputconnector">
                 <title>Solr Output Connection</title>
-                <p>More here later</p>
+                <p>The Solr output connection type allows Lucene Connectors Framework to submit documents to an appropriate Solr pipeline via the Solr
+                       HTTP ingestion API.  The configuration parameters default to the standard Solr values; you can change them, since Solr's own configuration can be changed.
+                       The Solr output connector makes no judgment about whether a given document is indexable: it accepts every document and passes it
+                       on to the pipeline, where the configured pipeline presumably decides whether the document should be rejected.  (All of that happens without the Solr connector
+                       being aware of it in any way.)</p>
+                <p>Unfortunately, this lack of specificity comes at a cost.  Unless you take care to filter documents properly in each job, large movie files or other opaque
+                       content may well be picked up and sent to Solr for indexing, which will greatly increase the wasted load on the overall system.  It is therefore a good idea to review
+                       all crawls that involve the Solr connector while they are underway, to be sure there is no misconfiguration of this kind.</p>
+                <p>When you create a Solr output connection, two configuration tabs appear.  The "Server" tab allows you to configure the HTTP target of the connector:</p>
+                <br/><br/>
+                <figure src="images/solr-configure-server.PNG" alt="Solr Configuration, Server tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the fields according to your Solr configuration.  The Solr connector supports only basic authentication at this time; if your Solr server requires it, supply the credentials
+                       as requested in the bottom part of the form.</p>
+                <p>The second tab is the "Arguments" tab, which allows you to specify arbitrary arguments to be sent to Solr.  Arguments are a common way of telling Solr how to handle
+                       specific documents, so the connector allows you to add arguments to each Solr indexing request:</p>
+                <br/><br/>
+                <figure src="images/solr-configure-arguments.PNG" alt="Solr Configuration, Arguments tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the argument name and value, and click the "Add" button.  Bear in mind that if you add an argument with the same name as an existing one, the new value replaces the
+                       existing one.  You can delete an existing argument by clicking the "Delete" button next to it.</p>
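The replace-on-duplicate behavior of the "Arguments" tab amounts to a map from each argument name to a single value. The minimal Python sketch below illustrates that model only; it is not the connector's implementation, and the class `SolrArguments`, its `request_url` helper, the endpoint URL, and the argument names are all hypothetical.

```python
from urllib.parse import urlencode


class SolrArguments:
    """Hypothetical model of the "Arguments" tab: at most one value per name."""

    def __init__(self):
        # A dict keeps one value per name and preserves insertion order.
        self._args = {}

    def add(self, name, value):
        # Adding a name that already exists replaces the old value,
        # mirroring the described behavior of the "Add" button.
        self._args[name] = value

    def delete(self, name):
        # Mirrors the "Delete" button; deleting an unknown name is a no-op.
        self._args.pop(name, None)

    def request_url(self, base_url):
        # Append every configured argument to one indexing request URL.
        if not self._args:
            return base_url
        return base_url + "?" + urlencode(self._args)


args = SolrArguments()
args.add("commit", "false")
args.add("commit", "true")  # same name: replaces "false"
print(args.request_url("http://localhost:8983/solr/update/extract"))
# http://localhost:8983/solr/update/extract?commit=true
```

Using a plain dict keeps the "last value wins" rule implicit in the data structure rather than in extra checking code.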
+                <p>When you are done, don't forget to click the "Save" button to save your changes!  When you do, a connection summary and status screen will be presented, which
+                       may look something like this:</p>
+                <br/><br/>
+                <figure src="images/solr-status.PNG" alt="Solr Status" width="80%"/>
+                <br/><br/>
+                <p>Note that in this example, the Solr connection is not responding, which leads to an error status message instead of "Connection working".</p>
+                <p>When you configure a job to use a Solr-type output connection, no Solr-specific tabs appear at this time.  This may well change in the future, and job-specific
+                       tabs could be added, most likely to allow job-specific arguments to be added to each indexing request.  If functionality of this kind seems important to your application,
+                       please do not hesitate to contact the Lucene Connectors Framework team with your request.</p>
             </section>
             
             <section id="gtsoutputconnector">
-                <title>GTS Output Connection</title>
+                <title>MetaCarta GTS Output Connection</title>
+                <p>The MetaCarta GTS output connection type is designed to allow Lucene Connectors Framework to submit documents to an appropriate MetaCarta GTS search
+                       appliance, via the appliance's HTTP Ingestion API.</p>
+                <p>The connector implicitly understands that GTS can handle only text, HTML, XML, RTF, PDF, and Microsoft Office documents.  All other document types are
+                       considered unindexable.  This helps prevent jobs based on a GTS-type output connection from fetching data that is large but of no particular relevance.</p>
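An indexability check of this kind can be pictured as a filter on the document's MIME type. The Python sketch below is a hypothetical illustration, not the connector's code: the function name, the exact MIME types standing in for "text, HTML, XML, RTF, PDF, and Microsoft Office", and the choice to treat an unknown type as unindexable are all assumptions.

```python
# Assumed MIME types for the document kinds listed above; the real list may differ.
GTS_INDEXABLE_MIME_TYPES = frozenset({
    "text/plain",
    "text/html",
    "text/xml",
    "application/xml",
    "text/rtf",
    "application/rtf",
    "application/pdf",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
})


def is_indexable(mime_type):
    """Return True if a document of this MIME type should be sent to GTS."""
    if mime_type is None:
        return False  # assumption: an unknown type is treated as unindexable
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    return base in GTS_INDEXABLE_MIME_TYPES
```

Because the check runs before content is sent, a job can skip large irrelevant files early instead of relying on the appliance to reject them.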
+                <p>When you configure a job to use a GTS-type output connection, two additional tabs are presented: "Collections" and "Document Templates".  These
+                       tabs allow per-job specification of these GTS-specific features.</p>
                 <p>More here later</p>
             </section>
             
             <section id="nulloutputconnector">
                 <title>Null Output Connection</title>
-                <p>More here later</p>
+                <p>The null output connection type is meant primarily as an aid for people writing repository connectors.  It is not expected to be useful in practice.</p>
+                <p>The null output connection type simply logs indexing and deletion requests, and does nothing else.  It has no special configuration tabs, nor does it
+                       contribute any tabs to the jobs that use it.</p>
             </section>
             
         </section>
@@ -432,7 +469,28 @@
             
             <section id="adauthority">
                 <title>Active Directory Authority Connection</title>
-                <p>More here later</p>
+                <p>An Active Directory authority connection is essential for enforcing security for documents from Microsoft SharePoint, Autonomy Meridio, and IBM FileNet repositories.
+                       The connector needs information about how to log into an appropriate Windows domain controller, using an account with sufficient privileges to
+                       look up any user's ID and group memberships.  While the connector has some known limitations, it should function well for most straightforward Windows
+                       security architectures.  The cases in which it may not be adequate include:</p>
+                <br/>
+                <ul>
+                    <li>when child domains are present</li>
+                    <li>when the expected number of requests per second is fairly high</li>
+                </ul>
+                <br/>
+                <p>The Active Directory authority connection type adds a single tab to the authority connection editing screen, the "Domain Controller" tab:</p>
+                <br/><br/>
+                <figure src="images/ad-configure-dc.PNG" alt="AD Configuration, Domain Controller tab" width="80%"/>
+                <br/><br/>
+                <p>Fill in the requested values.  Note that the "Administrative user name" field usually requires no domain suffix, but depending on how the domain
+                       controller is configured, it may sometimes accept only the "name@domain" format.  When you are done, click the "Save" button.  When you do, a connection
+                       summary and status screen will be presented, which
+                       may look something like this:</p>
+                <br/><br/>
+                <figure src="images/ad-status.PNG" alt="AD Status" width="80%"/>
+                <br/><br/>
+                <p>Note that in this example, the Active Directory connection is not responding, which leads to an error status message instead of "Connection working".</p>
             </section>
             
             <section id="livelinkauthority">
@@ -459,11 +517,99 @@
         
         <section id="repositoryconnectiontypes">
             <title>Repository Connection Types</title>
-            <p>More here later</p>
 
             <section id="jcifsrepository">
                 <title>Windows Share/DFS Repository Connection</title>
-                <p>More here later</p>
+                <p>The Windows Share connection type allows you to access content stored on Windows shares, even from non-Windows systems.  Samba and various
+                       third-party Network Attached Storage servers are also supported.</p>
+                <p>DFS nodes and referrals are fully supported, provided the referral machine names can be resolved via DNS on the server where the Framework is
+                       running.  For each document, the Windows Share connection type generates an identifier that is either a "file:" IRI or a mapped "http:" URI, depending on how it is
+                       configured.  This allows a great deal of flexibility in deployment environments, but may also require some work to set up properly.</p>
+                <p>In particular, if you intend to use file IRIs as your identifiers, check with your system integrator to be sure they are handled properly by the search component of your
+                       system.  When you use a browser such as Internet Explorer to view a document whose Windows path is <code>\\servername\sharename\dir1\filename.txt</code>,
+                       the browser converts that path to an IRI that looks something like this: <code>file://///servername/sharename/dir1/filename.txt</code>.
+                       While this seems simple, major complexities arise when the underlying file name contains special characters, such as spaces, "#" symbols, or, worse still, non-ASCII
+                       characters.  Unfortunately, every version of Internet Explorer handles these situations somewhat differently, so there is no fully correct way for the Windows
+                       Share connection type to convert file names to IRIs.  Instead, the connector always uses a standard canonical form, and expects the search results display component of the system to know how to form
+                       the right IRI for the browser or client being used.</p>
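To see why special characters complicate matters, here is a minimal Python sketch of one possible UNC-path-to-IRI conversion. It follows the browser-style "file://///" form shown above and percent-encodes each path segment as UTF-8. This is an illustration under that assumption only: it is not the canonical form the Windows Share connector actually uses, and the function name is hypothetical.

```python
from urllib.parse import quote


def unc_to_file_iri(unc_path):
    """Convert a UNC path (two leading backslashes) into a file:///// IRI."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("expected a UNC path that starts with two backslashes")
    # Drop the leading backslashes, then split into server, share, and folders.
    segments = unc_path[2:].split("\\")
    # Percent-encode each segment on its own so spaces and '#' cannot be
    # mistaken for IRI syntax; non-ASCII letters are also encoded (as UTF-8)
    # here, which is one choice among several.
    encoded = [quote(segment, safe="") for segment in segments]
    return "file://///" + "/".join(encoded)


print(unc_to_file_iri(r"\\servername\sharename\dir1\filename.txt"))
# file://///servername/sharename/dir1/filename.txt
print(unc_to_file_iri(r"\\servername\sharename\My Docs\report #2.txt"))
# file://///servername/sharename/My%20Docs/report%20%232.txt
```

A browser that decodes or encodes these characters differently would produce a different IRI for the same file, which is exactly the ambiguity described above.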
+                <p>If you want to enforce security for documents crawled with a Windows Share repository connection, you must first configure an authority connection
+                       of the Active Directory type to control access to these documents.</p>
+                <p>The Windows Share connection type adds a single tab to the repository connection editing screen, the "Server" tab:</p>
+                <br/><br/>
+                <figure src="images/jcifs-configure-server.PNG" alt="Windows Share Connection, Server tab" width="80%"/>
+                <br/><br/>
+                <p>Enter the name of the server to connect to in the "Server" field.  This can be either an actual machine name or a domain name (if you intend
+                       to connect to a Windows domain-based DFS root).  If you supply an actual machine name, it is usually best to give the server name in unqualified
+                       form, and to provide a fully-qualified domain name in the "Domain name" field.  The user name should also usually be unqualified, e.g. "Administrator" rather than
+                       "Administrator@mydomain.com".  It may sometimes work to leave the "Domain name" field blank and instead supply a fully-qualified machine name in the "Server"
+                       field.  It never works to supply both a domain name <b>and</b> a fully-qualified server name.</p>
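The naming rules above can be expressed as a small configuration check. This hypothetical Python sketch is not part of the Framework; in particular, treating any server name that contains a dot as fully qualified, and any user name containing "@" as qualified, is a simplifying assumption (an IP address, for instance, would be misclassified).

```python
def check_server_tab(server, domain, user):
    """Return warnings for a Windows Share "Server" tab configuration."""
    warnings = []
    if not server:
        warnings.append("The Server field is required.")
        return warnings
    # Assumption: a dot in the server name means it is fully qualified.
    server_is_qualified = "." in server
    if domain and server_is_qualified:
        # The documented rule: this combination never works.
        warnings.append("Do not supply both a domain name and a fully-qualified server name.")
    if "@" in user:
        # Usually "Administrator" rather than "Administrator@mydomain.com".
        warnings.append("The user name should usually be unqualified.")
    return warnings


print(check_server_tab("fileserver", "mydomain.com", "Administrator"))  # []
```

Only the "both qualified" case is a hard error in the text above; the user-name check reflects the word "usually" and is best read as advice.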
+                <p>After you click the "Save" button, you will see a connection summary screen, which might look something like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-status.PNG" alt="Windows Share Status" width="80%"/>
+                <br/><br/>
+                <p>Note that in this example, the Windows Share connection is not responding, which leads to an error status message instead of "Connection working".</p>
+                <p>When you configure a job to use a repository connection of the Windows Share type, several additional tabs are presented.  These are, in order, "Paths", "Security",
+                       "Metadata", "Content Length", "File Mapping", and "URL Mapping".</p>
+                <p>The "Paths" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-paths.PNG" alt="Windows Share Job, Paths tab" width="80%"/>
+                <br/><br/>
+                <p>This tab allows you to construct starting-point paths by drilling down, and then add the constructed paths to a list, or remove existing paths from the list.  Without any
+                       starting paths, your job includes zero documents.</p>
+                <p>Make sure your connection has a status of "Connection working" before you open this tab, or you will see an error message, and you will not be able to build
+                       any paths.</p>
+                <p>For each included path, a list of rules is displayed that determines which folders and documents the job includes.  These rules
+                       are evaluated in order, from top to bottom.  The first rule that matches a given path is the one used for that path.</p>
+                <p>Each rule describes the path-matching criteria: the file specification (e.g. "*.txt"), whether the path is a file name or a folder name, and whether the file is
+                       considered indexable by the output connection.  The rule also describes the action to take when it matches: include or exclude.</p>
+                <p>To add a rule for a starting path, select the desired values of all the pulldowns, type in the desired file criteria, and click the "Add" button.  You may also insert
+                       a new rule above any existing rule by using one of the "Insert" buttons.</p>
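The first-match-wins evaluation described above can be sketched in a few lines of Python. The `Rule` class, the `evaluate` function, case-insensitive matching, the "don't care" setting for indexability, and excluding paths that match no rule are all assumptions made for illustration; the connector's actual matching logic may differ.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    action: str                       # "include" or "exclude"
    kind: str                         # "file" or "folder"
    spec: str                         # file specification, e.g. "*.txt"
    indexable: Optional[bool] = None  # None: rule ignores indexability (assumption)


def evaluate(rules, name, is_folder, indexable=True):
    """Return the action of the first rule that matches, scanning top to bottom."""
    kind = "folder" if is_folder else "file"
    for rule in rules:
        if rule.kind != kind:
            continue
        if rule.indexable is not None and rule.indexable != indexable:
            continue
        # Assumption: matching is case-insensitive, as Windows file names are.
        if fnmatchcase(name.lower(), rule.spec.lower()):
            return rule.action  # first match wins; later rules are not consulted
    return "exclude"  # assumption: a path that matches no rule is excluded


rules = [
    Rule("exclude", "file", "*.mov"),
    Rule("include", "file", "*"),
    Rule("include", "folder", "*"),
]
print(evaluate(rules, "trailer.MOV", is_folder=False))  # exclude
print(evaluate(rules, "notes.txt", is_folder=False))    # include
```

Because evaluation stops at the first match, a narrow "exclude" rule has to sit above a broad "include" rule to have any effect; the "Insert" buttons exist for exactly this kind of reordering.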
+                <p>The "Security" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-security.PNG" alt="Windows Share Job, Security tab" width="80%"/>
+                <br/><br/>
+                <p>The "Security" tab lets you control three things: file security, share security, and (if security is off) the security tokens attached to all documents indexed by the job.</p>
+                <p><b>File security</b> is the security Windows applies to individual files.  This kind of security is supported by practically all Windows-compatible NAS-type servers,
+                       so you can use this feature without concern.</p>
+                <p><b>Share security</b> is the security Windows applies to Windows shares.  This is an older kind of security that is no longer prevalent in most enterprise organizations.
+                       Many modern NAS systems, as well as Samba, do not support this security model.  If you enable this kind of security in your job while crawling a system that
+                       does not support it, your job will not run correctly: the first document access will cause an error, and the job will abort.</p>
+                <p>If you turn off file security, you have the option of adding your own index access tokens to all documents crawled by the job.  These tokens must, of course, be
+                       in a form appropriate for the governing authority connection.  Type the token into the box and click the "Add" button.  This feature is rarely used other
+                       than for demonstrations.</p>
+                <p>The "Metadata" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-metadata.PNG" alt="Windows Share Job, Metadata tab" width="80%"/>
+                <br/><br/>
+                <p>This tab allows you to ingest a document's path, as modified by a set of regular expression rules, as a piece of document metadata.  Enter the metadata name you want
+                       in the "Path attribute name" field.  Then add the rules you want to the list of rules.  Each rule has a match expression, which is a regular expression in which parentheses ("("
+                       and ")") mark the sections you are interested in.  These sections are called "groups" in regular expression parlance.  The replace string consists of constant text plus
+                       substitutions of the groups from the match, optionally modified.  For example, "$(1)" refers to the first group within the match, while "$(1l)" refers to the first match group
+                       mapped to lower case.  Similarly, "$(1u)" refers to the same characters, mapped to upper case.</p>
+                <p>For example, suppose you had a rule with ".*/(.*)/(.*)/.*" as the match expression, and "$(1) $(2)" as the replace string.  If presented with the path
+                       <code>Project/Folder_1/Folder_2/Filename</code>, it would output the string <code>Folder_1 Folder_2</code>.</p>
+                <p>If more than one rule is present, the rules are executed in sequence.  That is, the output of the first rule is modified by the second rule, and so on.</p>
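The "$(n)", "$(nl)", and "$(nu)" substitutions and the sequential execution of rules can be mirrored with Python's `re` module. This is a hypothetical sketch, not the Framework's code: it assumes the match expression is searched for anywhere in the input, that the output of a matching rule is just the expanded replace string, and that input which does not match a rule passes through unchanged. Python's regular expression dialect also differs from Java's in some details.

```python
import re

# Matches "$(1)", "$(1l)", "$(1u)", "$(2)", ... in a replace string.
GROUP_REF = re.compile(r"\$\((\d+)([lu]?)\)")


def apply_rule(value, match_expr, replace):
    match = re.search(match_expr, value)
    if match is None:
        return value  # assumption: a non-matching rule leaves the value unchanged

    def substitute(ref):
        text = match.group(int(ref.group(1))) or ""  # unmatched group -> ""
        if ref.group(2) == "l":
            return text.lower()
        if ref.group(2) == "u":
            return text.upper()
        return text

    return GROUP_REF.sub(substitute, replace)


def apply_rules(value, rules):
    # Rules run in sequence: each rule works on the previous rule's output.
    for match_expr, replace in rules:
        value = apply_rule(value, match_expr, replace)
    return value


print(apply_rules("Project/Folder_1/Folder_2/Filename",
                  [(r".*/(.*)/(.*)/.*", "$(1) $(2)")]))
# Folder_1 Folder_2
```

With the replacement "$(1l)-$(2u)" instead, the same path would yield "folder_1-FOLDER_2".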
+                <p>The "Content Length" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-content-length.PNG" alt="Windows Share Job, Content Length tab" width="80%"/>
+                <br/><br/>
+                <p>This tab allows you to set a maximum content length cutoff value, to avoid having the job try to index exceptionally large documents.  Enter the desired maximum value.
+                       A blank value indicates an unlimited cutoff length.</p>
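The cutoff rule can be stated as a simple predicate. In this hypothetical Python sketch the function name is invented, the document length is assumed to be measured in the same unit as the field value, and accepting a document exactly at the cutoff is an assumption.

```python
def passes_length_cutoff(content_length, cutoff_field):
    """Decide whether a document is short enough to be indexed.

    cutoff_field is the raw text of the "Content Length" tab; a blank
    value means there is no limit.
    """
    text = (cutoff_field or "").strip()
    if not text:
        return True  # blank value: unlimited cutoff length
    # Assumption: a document exactly at the cutoff is still accepted.
    return content_length <= int(text)
```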
+                <p>The "File Mapping" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-file-mapping.PNG" alt="Windows Share Job, File Mapping tab" width="80%"/>
+                <br/><br/>
+                <p>The mappings specified here are similar in all respects to the path attribute mapping setup described above.  The mappings are applied to change the actual file path
+                       discovered by the crawler into a different file path.  This can sometimes be useful if some kind of conversion process maps raw documents to
+                       parallel data files that contain extracted data.</p>
+                <p>The "URL Mapping" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/jcifs-job-url-mapping.PNG" alt="Windows Share Job, URL Mapping tab" width="80%"/>
+                <br/><br/>
+                <p>The mappings specified here are similar in all respects to the path attribute mapping setup described above.  If no mappings are present, the file path is converted
+                       to a canonical file IRI.  If mappings are present, the conversion is presumed to produce a valid URL, which can be used to access the document through some
+                       kind of HTTP server that exposes the Windows share.</p>
+
             </section>
 
             <section id="filenetrepository">

Added: incubator/lcf/site/src/documentation/resources/images/ad-configure-dc.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/ad-configure-dc.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/ad-configure-dc.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/ad-status.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/ad-status.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/ad-status.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-configure-server.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-configure-server.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-configure-server.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-content-length.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-content-length.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-content-length.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-file-mapping.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-file-mapping.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-file-mapping.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-metadata.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-metadata.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-metadata.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-paths.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-paths.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-paths.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-security.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-security.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-security.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-job-url-mapping.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-job-url-mapping.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-job-url-mapping.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/jcifs-status.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/jcifs-status.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/jcifs-status.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/solr-configure-arguments.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/solr-configure-arguments.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/solr-configure-arguments.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/solr-configure-server.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/solr-configure-server.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/solr-configure-server.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/lcf/site/src/documentation/resources/images/solr-status.PNG
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/resources/images/solr-status.PNG?rev=934476&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/src/documentation/resources/images/solr-status.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream


