incubator-connectors-commits mailing list archives

From kwri...@apache.org
Subject svn commit: r935786 [3/3] - in /incubator/lcf/site: publish/end-user-documentation.html publish/end-user-documentation.pdf src/documentation/content/xdocs/end-user-documentation.xml
Date Tue, 20 Apr 2010 01:18:58 GMT
Modified: incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml?rev=935786&r1=935785&r2=935786&view=diff
==============================================================================
--- incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml (original)
+++ incubator/lcf/site/src/documentation/content/xdocs/end-user-documentation.xml Tue Apr 20 01:18:58 2010
@@ -8,7 +8,7 @@
         <section id="overview">
             <title>Overview</title>
             <p>This manual is intended for an end-user of Lucene Connectors Framework.
 It is assumed that the Framework has been properly installed, either by you or by a system
integrator,
-                   with all required services running and desired connectors properly registered.
 If you think you need to know how to do that yourself, please visit the "Developer Resources"
page.
+                   with all required services running and desired connection types properly
registered.  If you think you need to know how to do that yourself, please visit the "Developer
Resources" page.
             </p>
             <p>Most of this manual describes how to use the Lucene Connectors Framework
user interface.  On a standard Lucene Connectors Framework deployment, you would reach that
interface by giving your browser
                   a URL something like this: <code>http://my-server-name:8080/lcf-crawler-ui</code>.
 This will, of course, differ from system to system.  Please contact your system administrator
@@ -45,8 +45,8 @@
                 <br/><br/>
                 <figure src="images/add-new-output-connection-type.PNG" alt="Add New Output
Connection, select Type" width="80%"/>
                 <br/><br/>
-                <p>The list of output connection types in the pulldown box, and what
they are each called, is determined by your system integrator.  Each selection represents
your choice of a different
-                       output connector.  The configuration tabs for each different kind
of output connector are described in separate sections below.</p>
+                <p>The list of output connection types in the pulldown box, and what
they are each called, is determined by your system integrator.  The configuration tabs for
each different kind of output connection
+                       type are described in separate sections below.</p>
                 <p>After you choose an output connection type, click the "Continue"
button at the bottom of the pane.  You will then see all the tabs appropriate for that kind
of connection appear, and a
                        "Save" button will also appear at the bottom of the pane.  You <b>must</b>
click the "Save" button when you are done in order to create your connection.  If you click
"Cancel" instead, the new connection
                        will not be created.  (The same thing will happen if you click on
any of the navigation links in the left-hand pane.)</p>
@@ -91,8 +91,8 @@
                 <br/><br/>
                 <figure src="images/add-new-authority-connection-type.PNG" alt="Add New
Authority Connection, select Type" width="80%"/>
                 <br/><br/>
-                <p>The list of authority connection types in the pulldown box, and
what they are each called, is determined by your system integrator.  Each selection represents
your choice of a different
-                       authority connector.  The configuration tabs for each different kind
of authority connector are described in separate sections below.</p>
+                <p>The list of authority connection types in the pulldown box, and
what they are each called, is determined by your system integrator.  The configuration tabs
for each different kind of authority connection
+                       type are described in separate sections below.</p>
                 <p>After you choose an authority connection type, click the "Continue"
button at the bottom of the pane.  You will then see all the tabs appropriate for that kind
of connection appear, and a
                        "Save" button will also appear at the bottom of the pane.  You <b>must</b>
click the "Save" button when you are done in order to create your connection.  If you click
"Cancel" instead, the new connection
                        will not be created.  (The same thing will happen if you click on
any of the navigation links in the left-hand pane.)</p>
@@ -139,11 +139,11 @@
                 <br/><br/>
                 <figure src="images/add-new-repository-connection-type.PNG" alt="Add New
Repository Connection, select Type" width="80%"/>
                 <br/><br/>
-                <p>The list of repository connection types in the pulldown box, and
what they are each called, is determined by your system integrator.  Each selection represents
your choice of a different
-                       repository connector.  The configuration tabs for each different kind
of repository connector are described in separate sections below.</p>
+                <p>The list of repository connection types in the pulldown box, and
what they are each called, is determined by your system integrator.  The configuration tabs
for each different kind of repository connection
+                       type are described in separate sections below.</p>
                 <p>You may also at this point select the authority connection to secure
all documents fetched from this repository with.  Bear in mind that only some authority connection
types are compatible with any
                        given repository connection types.  Read the details of your desired
repository or authority connection type to understand its intentions, and how it is expected
to be used.</p>
-                <p>After you choose a repository connection type and an authority connection,
click the "Continue" button at the bottom of the pane.  You will then see all the tabs appropriate
for that kind of connection appear, and a
+                <p>After you choose the desired repository connection type and an authority
connection, click the "Continue" button at the bottom of the pane.  You will then see all
the tabs appropriate for that kind of connection appear, and a
                        "Save" button will also appear at the bottom of the pane.  You <b>must</b>
click the "Save" button when you are done in order to create or update your connection.  If
you click "Cancel" instead, the new connection
                        will not be created.  (The same thing will happen if you click on
any of the navigation links in the left-hand pane.)</p>
                 <p>Every repository connection has a "Throttling" tab.  The tab looks
like this:</p>
@@ -155,13 +155,13 @@
                        value is 10, which may not be optimal for all types of repository
connections.  Please refer to the section of the manual describing your authority connection
type for more precise
                        recommendations.  The second specifies how rapidly, on average, the
crawler will fetch documents via this connection.
                 </p>
-                <p>Each connection type has its own notion of "throttling bin".  A
throttling bin is the name of a resource whose access needs to be throttled.  For example,
the Web Connector uses a
+                <p>Each connection type has its own notion of "throttling bin".  A
throttling bin is the name of a resource whose access needs to be throttled.  For example,
the Web connection type uses a
                        document's server name as the throttling bin associated with the document,
since (presumably) it will be access to each individual server that will need to be throttled
independently.
                 </p>
                 <p>On the repository connection "Throttling" tab, you can specify an
unrestricted number of throttling descriptions.  Each throttling description consists of a
regular expression that describes
                        a family of throttling bins, plus a helpful description, plus an average
number of fetches per minute for each of the throttling bins that matches the regular expression.
 If a given
                        throttling bin matches more than one throttling description, the most
conservative fetch rate is chosen.</p>
-                <p>The simplest regular expression you can use is the empty regular
expression.  This will match all of the connection type's throttle bins, and thus will allow
you to specify a default
+                <p>The simplest regular expression you can use is the empty regular
expression.  This will match all of the connection's throttle bins, and thus will allow you
to specify a default
                        throttling policy for the connection.  Set the desired average fetch
rate, and click the "Add" button.  The throttling tab will then appear something like this:</p>
                 <br/><br/>
                 <figure src="images/repository-throttling-with-throttle.PNG" alt="Repository
Connection Throttling With Throttle" width="80%"/>
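
Note on the hunk above: the throttling text says that each throttling description pairs a regular expression with an average fetch rate, that the empty regular expression matches every bin, and that the most conservative rate wins when several descriptions match. A minimal sketch of that selection rule follows. It is not the Framework's code: the names `THROTTLES` and `effective_fetch_rate`, the sample patterns and rates, and the use of unanchored "search" matching (chosen because it lets the empty expression match everything) are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical throttling descriptions:
# (bin regular expression, description, average fetches per minute).
THROTTLES: List[Tuple[str, str, float]] = [
    ("", "default policy for every bin", 60.0),
    (r"\.example\.com$", "go easy on example.com servers", 12.0),
]


def effective_fetch_rate(
    bin_name: str,
    throttles: List[Tuple[str, str, float]] = THROTTLES,
) -> Optional[float]:
    """Return the lowest matching fetch rate, or None if nothing matches.

    Assumes unanchored matching, so the empty pattern matches any bin name.
    """
    rates = [
        rate
        for pattern, _description, rate in throttles
        if re.search(pattern, bin_name)
    ]
    # "Most conservative" is read here as the smallest fetches-per-minute value.
    return min(rates) if rates else None
```

Under these assumptions, a bin named `www.example.com` matches both descriptions and gets the lower rate, while any other server name falls back to the default set by the empty expression.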
@@ -178,8 +178,8 @@
             <section id="jobs">
                 <title>Creating Jobs</title>
                 <p>A "job" in Lucene Connectors Framework is a description of a set
of documents.  The Framework's job is to fetch this set of documents come from a specific
repository connection, and
-                       send them to a specific output connection.  The repository connection
type that is associated with the job will determine exactly how this set of documents is described,
and to some
-                       degree how they are indexed.  The output connection type associated
with the job can also affect how each document is indexed.</p>
+                       send them to a specific output connection.  The repository connection
that is associated with the job will determine exactly how this set of documents is described,
and to some
+                       degree how they are indexed.  The output connection associated with
the job can also affect how each document is indexed.</p>
                 <p>Every job is expected to be run more than once.  Each time a job
is run, it is responsible not only for sending new or changed documents to the output connection,
but also for
                        notifying the output connection of any documents that are no longer
part of the set.  Note that there are two ways for a document to no longer be part of the
included set of documents:
                        Either the document may have been deleted from the repository, or
the document may no longer be included in the allowed set of documents.  The Framework handles
each case properly.</p>
@@ -253,7 +253,7 @@
                 <br/><br/>
                 <p>The example shows a schedule where crawls are run on Saturday and
Sunday nights at 2 AM, and run for no more than 4 hours.</p>
                 <p>The rest of the job tabs depend on the types of the connections
you selected.  Please refer to the section of the manual
-                       describing your chosen repository and output connection types for
a description of the job tabs appropriate for those connection types.</p>
+                       describing the appropriate connection types corresponding to your
chosen repository and output connections for a description of the job tabs that will appear
for those connections.</p>
             </section>
             <section id="executing">
                 <title>Executing Jobs</title>
@@ -398,10 +398,10 @@
             
             <section id="credentials">
                 <title>A Note About Credentials</title>
-                <p>If any of your selected connection type require credentials, you
may find it necessary to approach your system administrator to obtain an appropriate set.
 System administrators
+                <p>If any of your selected connection types require credentials, you
may find it necessary to approach your system administrator to obtain an appropriate set.
 System administrators
                        are often reluctant to provide accounts and credentials that have
any more power than is utterly necessary, and sometimes not even that.  Great care has been
taken in the
                        development of all connection types to be sure they require no more
privilege than is utterly necessary.  If a security-related warning appears when you view
a connection's
-                       status, you must inform the system administrator that the credentials
are inadequate to allow the connector to accomplish its task, and work with him/her to correct
the problem.
+                       status, you must inform the system administrator that the credentials
are inadequate to allow the connection to accomplish its task, and work with him/her to correct
the problem.
                 </p>
             </section>
 
@@ -414,20 +414,20 @@
                 <title>Solr Output Connection</title>
                 <p>The Solr output connection type is designed to allow Lucene Connectors
Framework to submit documents to an appropriate Solr pipeline, via the Solr
                        HTTP ingestion API.  The configuration parameters are set to the default
Solr values, which can be changed (since Solr's configuration can be changed).
-                       The Solr output connector furthermore makes no judgment as to whether
a given document is indexable or not - it accepts everything, and passes all documents
-                       on to the pipeline, where presumably the configured pipeline will
decide if a document should be rejected or not.  (All of that happens without the Solr connector
+                       The Solr output connection type furthermore makes no judgment as to
whether a given document is indexable or not - it accepts everything, and passes all documents
+                       on to the pipeline, where presumably the configured pipeline will
decide if a document should be rejected or not.  (All of that happens without a Solr connection
                        being aware of it in any way.)</p>
                 <p>Unfortunately, this lack of specificity comes at a cost.  Unless
you take care to filter documents properly in each job, large movie files or other opaque
                        content may well be picked up and sent to Solr for indexing, which
will greatly increase the dead load on the overall system.  It is therefore a good idea to
review
-                       all crawls that involve the Solr connector while they are underway,
to be sure there isn't a misconfiguration of this kind.</p>
-                <p>When you create a Solr output connection, two configuration tabs
appear.  The "Server" tab allows you to configure the HTTP target of the connector:</p>
+                       all crawls done through a Solr connection while they are underway,
to be sure there isn't a misconfiguration of this kind.</p>
+                <p>When you create a Solr output connection, two configuration tabs
appear.  The "Server" tab allows you to configure the HTTP target of the connection:</p>
                 <br/><br/>
                 <figure src="images/solr-configure-server.PNG" alt="Solr Configuration,
Server tab" width="80%"/>
                 <br/><br/>
-                <p>Fill in the fields according to your Solr configuration.  The Solr
connector supports only basic authentication at this time; if you have this enabled, supply
the credentials
+                <p>Fill in the fields according to your Solr configuration.  The Solr
connection type supports only basic authentication at this time; if you have this enabled,
supply the credentials
                        as requested on the bottom part of the form.</p>
                 <p>The second tab is the "Arguments" tab, which allows you to specify
arbitrary arguments to be sent to Solr.  This is a popular way of telling Solr how to handle
-                       specific documents, so the connector allows you to add arguments to
each Solr indexing request:</p>
+                       specific documents, so the connection type allows you to add arguments
to each Solr indexing request:</p>
                 <br/><br/>
                 <figure src="images/solr-configure-arguments.PNG" alt="Solr Configuration,
Arguments tab" width="80%"/>
                 <br/><br/>
@@ -448,8 +448,8 @@
                 <title>MetaCarta GTS Output Connection</title>
                 <p>The MetaCarta GTS output connection type is designed to allow Lucene
Connectors Framework to submit documents to an appropriate MetaCarta GTS search
                        appliance, via the appliance's HTTP Ingestion API.</p>
-                <p>The connector implicitly understands that GTS can only handle text,
HTML, XML, RTF, PDF, and Microsoft Office documents.  All other document types will be
-                       considered to be unindexable.  This helps prevent jobs based on a
GTS-type output connector from fetching data that is large, but of no particular relevance.</p>
+                <p>The connection type implicitly understands that GTS can only handle
text, HTML, XML, RTF, PDF, and Microsoft Office documents.  All other document types will
be
+                       considered to be unindexable.  This helps prevent jobs based on a
GTS-type output connection from fetching data that is large, but of no particular relevance.</p>
                 <p>When you configure a job to use a GTS-type output connection, two
additional tabs will be presented to the user: "Collections" and "Document Templates".  These
                        tabs allow per-job specification of these GTS-specific features.</p>
                 <p>More here later</p>
@@ -457,7 +457,7 @@
             
             <section id="nulloutputconnector">
                 <title>Null Output Connection</title>
-                <p>The null output connection type is meant primarily to function as
an aid for people writing repository connectors.  It is not expected to be useful in practice.</p>
+                <p>The null output connection type is meant primarily to function as
an aid for people writing repository connection types.  It is not expected to be useful in
practice.</p>
                 <p>The null output connection type simply logs indexing and deletion
requests, and does nothing else.  It does not have any special configuration tabs, nor does
it
                        contribute tabs to jobs defined that use it.</p>
             </section>
@@ -470,8 +470,8 @@
             <section id="adauthority">
                 <title>Active Directory Authority Connection</title>
                 <p>An active directory authority connection is essential for enforcing
security for documents from Microsoft SharePoint, Autonomy Meridio, and IBM FileNet repositories.
-                       The connector needs to be provided with information about how to log
into an appropriate Windows domain controller, with a user that has sufficient privileges
to
-                       be able to look up any user's ID and group relationships.  While the
connector has some known limitations, it should function well for most straightforward Windows
+                       This connection type needs to be provided with information about how
to log into an appropriate Windows domain controller, with a user that has sufficient privileges
to
+                       be able to look up any user's ID and group relationships.  While the
connection type has some known limitations, it should function well for most straightforward
Windows
                        security architecture situations.  The cases in which it may not be
adequate include:</p>
                 <br/>
                 <ul>
@@ -479,7 +479,7 @@
                     <li>when the expected number of requests per second is fairly high</li>
                 </ul>
                 <br/>
-                <p>The active directory authority connection type provides a single
additional tab to the authority connection editing screen: the "Domain Controller" tab:</p>
+                <p>An active directory authority connection type has a single special
tab in the authority connection editing screen: the "Domain Controller" tab:</p>
                 <br/><br/>
                 <figure src="images/ad-configure-dc.PNG" alt="AD Configuration, Domain
Controller tab" width="80%"/>
                 <br/><br/>
@@ -528,7 +528,7 @@
                 <p>Jobs created using a file-system-type repository connection
                        have two tabs in addition to the standard repertoire: the "Hop Filters"
tab, and the "Paths" tab.</p>
                 <p>The "Hop Filters" tab allows you to restrict the document set by
the number of child hops from the path root.  While this is not terribly interesting in the
case of a file
-                       system, the same basic functionality is also used in the web connector,
where it is a more important feature.  The file system connection type gives you a way to
see
+                       system, the same basic functionality is also used in the Web connection
type, where it is a more important feature.  The file system connection type gives you a way
to see
                        how this feature works, in a more predictable environment:</p>
                 <br/><br/>
                 <figure src="images/filesystem-job-hopcount.PNG" alt="File System Connection,
Hop Filters tab" width="80%"/>
@@ -556,7 +556,7 @@
 
             <section id="rssrepository">
                 <title>Generic RSS Repository Connection</title>
-                <p>The RSS connection type is specifically designed to crawl RSS feeds.
 While the web connection type can also extract links from RSS feeds, the RSS connection type
+                <p>The RSS connection type is specifically designed to crawl RSS feeds.
 While the Web connection type can also extract links from RSS feeds, the RSS connection type
                        differs in the following ways:</p>
                 <br/>
                 <ul>
@@ -568,13 +568,13 @@
                 <br/>
                 <p>Many users of the RSS connection type set up their jobs to run continuously,
configuring their jobs to never refetch documents, but rather to expire them after some 30
days.
                        This model works reasonably well for news, which is what RSS is often
used for.</p>
-                <p>A connection of the RSS connection type has the following special
tabs: "Email", "Robots", "Bandwidth", and "Proxy".  The "Email" tab looks like this:</p>
+                <p>An RSS connection has the following special tabs: "Email", "Robots",
"Bandwidth", and "Proxy".  The "Email" tab looks like this:</p>
                 <br/><br/>
                 <figure src="images/rss-configure-email.PNG" alt="RSS Connection, Email
tab" width="80%"/>
                 <br/><br/>
                 <p>Enter an email address.  This email address will be included in
all requests made by the RSS connection, so that webmasters can report any difficulties that
their
                        sites experience as the result of improper throttling, etc.</p>
-                <p>This field is mandatory.  While the RSS connection type makes no
effort to validate the correctness of the email
+                <p>This field is mandatory.  While an RSS connection makes no effort
to validate the correctness of the email
                        field, you will probably want to remain a good web citizen and provide
a valid email address.  Remember that it is very easy for a webmaster to block access to
                        a crawler that does not seem to be behaving in a polite manner.</p>
                 <p>The "Robots" tab looks like this:</p>
@@ -602,7 +602,7 @@
                 </ul>
                 <br/>
                 <p>Because of the above, we suggest that you configure your RSS connection
using <b>both</b> the "Bandwidth" <b>and</b> the "Throttling" tabs.
 Select maximum
-                       values on the "Bandwidth" tab, and corresponding average values estimates
on the "Throttling" tab.  Remember that a document identifier with the RSS connection type
is the
+                       values on the "Bandwidth" tab, and corresponding average values estimates
on the "Throttling" tab.  Remember that a document identifier for an RSS connection is the
                        document's URL, and the bin name for that URL is the server name.
 Also, please note that the "Maximum number of connections per JVM" field's default value
of 10 is
                        unlikely to be correct for connections of the RSS type; you should
have at least one available connection per worker thread, for best performance.  Since the
                        default number of worker threads is 30, you should set this parameter
to at least a value of 30 for normal operation.</p>
@@ -662,7 +662,7 @@
                     <tr><td>Bad feed refetch time</td><td>How long
to wait before trying to refetch a feed that contains parsing errors (in minutes, empty is
infinity)</td></tr>
                 </table>
                 <p>The "Security" tab allows you to assign access tokens to the documents
indexed with this job.  In order to use it, you must first decide what authority connection
to use
-                       to secure these documents, and what the access tokens from that authority
connection type look like.  The tab itself looks like this:</p>
+                       to secure these documents, and what the access tokens from that authority
connection look like.  The tab itself looks like this:</p>
                 <br/><br/>
                 <figure src="images/rss-job-security.PNG" alt="RSS job, Security tab"
width="80%"/>
                 <br/><br/>
@@ -718,14 +718,14 @@
                        reason, we strongly encourage you to consider using the RSS connection
type for all applications where it might reasonably apply.</p>
                 <p>Many users of the Web connection type set up their jobs to run continuously,
configuring their jobs to occasionally refetch documents, or to not refetch documents
                        ever, and expire them after some period of time.</p>
-                <p>A connection of the Web connection type has the following special
tabs: "Email", "Robots", "Bandwidth", "Access Credentials", and "Certificates".  The "Email"
tab
+                <p>A Web connection has the following special tabs: "Email", "Robots",
"Bandwidth", "Access Credentials", and "Certificates".  The "Email" tab
                        looks like this:</p>
                 <br/><br/>
                 <figure src="images/web-configure-email.PNG" alt="Web Connection, Email
tab" width="80%"/>
                 <br/><br/>
                 <p>Enter an email address.  This email address will be included in
all requests made by the Web connection, so that webmasters can report any difficulties that
their
                        sites experience as the result of improper throttling, etc.</p>
-                <p>This field is mandatory.  While the Web connection type makes no
effort to validate the correctness of the email
+                <p>This field is mandatory.  While a Web connection makes no effort
to validate the correctness of the email
                        field, you will probably want to remain a good web citizen and provide
a valid email address.  Remember that it is very easy for a webmaster to block access to
                        a crawler that does not seem to be behaving in a polite manner.</p>
                 <p>The "Robots" tab looks like this:</p>
@@ -752,11 +752,11 @@
                 </ul>
                 <br/>
                 <p>Because of the above, we suggest that you configure your Web connection
using <b>both</b> the "Bandwidth" <b>and</b> the "Throttling" tabs.
 Select maximum
-                       values on the "Bandwidth" tab, and corresponding average values estimates
on the "Throttling" tab.  Remember that a document identifier with the Web connection type
is the
+                       values on the "Bandwidth" tab, and corresponding average values estimates
on the "Throttling" tab.  Remember that a document identifier for a Web connection is the
                        document's URL, and the bin name for that URL is the server name.
 Also, please note that the "Maximum number of connections per JVM" field's default value
of 10 is
                        unlikely to be correct for connections of the Web type; you should
have at least one available connection per worker thread, for best performance.  Since the
                        default number of worker threads is 30, you should set this parameter
to at least a value of 30 for normal operation.</p>
-                <p>The Web connection type's "Access Credentials" tab describes how
pages get authenticated.  There is support on this tab for both page-based authentication
(e.g.
+                <p>The Web connection's "Access Credentials" tab describes how pages
get authenticated.  There is support on this tab for both page-based authentication (e.g.
                        basic auth or all forms of NTLM), as well as session-based authentication
(which involves the fetch of many pages to establish a logged-in session).  The initial
                        appearance of the "Access Credentials" tab shows both kinds of authentication:</p>
                 <br/><br/>
@@ -775,8 +775,8 @@
                     <li>How to fill in the appropriate forms within the login sequence
with appropriate login information</li>
                 </ul>
                 <br/>
-                <p>The Web connection type labels pages that are part of the login
sequence "login pages", and pages that are protected site content "content pages".  The Web
-                       connection type will not attempt to index login pages.  They are special
pages that have but one purpose: establishing an authenticated session.</p>
+                <p>A Web connection labels pages that are part of the login sequence
"login pages", and pages that are protected site content "content pages".  A Web
+                       connection will not attempt to index login pages.  They are special
pages that have but one purpose: establishing an authenticated session.</p>
                 <p>If all this is not complicated enough, your research also has to
cover two very different cases: when you are first entering the site anew, and second when
you try to fetch
                        a content page and you are no longer logged in, because your session
has expired.  In both cases, the session authentication rule must be able to properly log
in and
                        fetch content, because you cannot control when a page will be fetched
or refetched by the Framework.</p>
@@ -785,7 +785,7 @@
                        want either the redirection, or the login screen, to be considered
content pages.  The correct way to handle such a setup would be to declare one kind of login
page to consist
                        of a redirection to the login screen URL, and another kind of login
page to consist of the login screen URL with the appropriate form.  Furthermore, you would
want to supply
                        the correct login data for the form, and allow the form to be submitted,
and so the login form's target may also need to be declared as a login page.</p>
-                <p>The kinds of content that the Web connection type can recognize
as a login page are the following:</p>
+                <p>The kinds of content that a Web connection can recognize as a login
page are the following:</p>
                 <br/>
                 <ul>
                     <li>A redirection to a specific URL, as described by a regular
expression</li>
@@ -809,12 +809,12 @@
                 <p>Form data that is not specified will be posted with the default
value determined by the HTML of the page.  The Web connection type is unable, at this time,
to execute
                        Javascript, and therefore you may need to fill out some form values
that are filled in by Javascript in order to get the form to post in a useful way.  If you
have a form
                        that relies heavily on Javascript to post properly, you may need considerable
effort and web programming skills to figure out how to get these forms to post properly
-                       with the Web Connector.  Luckily, such obfuscated login screens are
still rare.</p>
+                       with a Web connection.  Luckily, such obfuscated login screens are
still rare.</p>
                 <p>A series of login pages form a "login page sequence" for the site.
 For each login page, the Web connection decides what page to fetch next by what you specified
for
                        the login page criteria.  So, for a redirection to a specific URL,
the next page to be fetched will be that redirected URL.  For a form, the next page fetched
will be the
                        action page indicated by the specified form.  For a link to a target,
the next page fetched will be the target URL.  When the login page sequence ends, the next
page
                        fetched after that will be the original content page that the Web
connection was trying to fetch when the login sequence started.</p>
-                <p>Debugging session authentication problems is best done by looking
at a Simple History report for your Web connection.  The Web connection type records several
+                <p>Debugging session authentication problems is best done by looking
at a Simple History report for your Web connection.  A Web connection records several
                        types of events which, between them, can give a very strong picture
of what is happening.  These event types are as follows:</p>
                 <br/>
                 <table>
@@ -848,18 +848,18 @@
                 <p>The Windows Share connection type allows you to access content stored
on Windows shares, even from non-Windows systems.  Also supported are Samba and various
                        third-party Network Attached Storage servers.</p>
                 <p>DFS nodes and referrals are fully supported, provided the referral
machine names can be looked up properly via DNS on the server where the Framework is
-                       running.  For each document, the Windows Share connection type generates
identifiers that can be either "file:" IRI's, or mapped "http:" URI's, depending on how it
is
-                       configured.  This allows for a great deal of flexibility in deployment
environments, but also may require some work to properly set up.</p>
-                <p>In particular, if you intend to use file IRI's as your identifiers,
you should check with your system integrator to be sure these are being handled properly by
the search component of your
+                       running.  For each document, a Windows Share connection creates an
index identifier that can be either a "file:" IRI's, or a mapped "http:" URI's, depending
on how it is
+                       configured.  This allows for a great deal of flexibility in deployment
environments, but also may require some work to properly set up.
+                       In particular, if you intend to use file IRI's as your identifiers,
you should check with your system integrator to be sure these are being handled properly by
the search component of your
                        system.  When you use a browser such as Internet Explorer to view
a document from a Windows file system called <code>\\servername\sharename\dir1\filename.txt</code>,
                        the browser converts that to an IRI that looks something like this:
<code>file://///servername/sharename/dir1/filename.txt</code>.
                        While this seems simple, major complexities arise when the underlying
file name has special characters in it, such as spaces, "#" symbols, or worse still, non-ASCII
                        characters.  Unfortunately, every version of Internet Explorer handles
these situations somewhat differently, so there is not any fully correct way for the Windows
-                       Share connection type to convert file names to IRI's.  Instead, the
connector always uses a standard canonical form, and expects the search results display system
component to know how to properly form
+                       Share connection type to convert file names to IRI's.  Instead, the
connection always uses a standard canonical form, and expects the search results display system
component to know how to properly form
                        the right IRI for the browser or client being used.</p>
-                <p>If you are interested in enforcing security for documents crawled
with a Windows Share repository connection type, you will need to first configure an authority
connection
+                <p>If you are interested in enforcing security for documents crawled
with a Windows Share repository connection, you will need to first configure an authority
connection
                        of the Active Directory type to control access to these documents.</p>
-                <p>The Windows Share connection type provides a single additional tab
to the repository connection editing screen: the "Server" tab:</p>
+                <p>A Windows Share connection has a single special tab on the repository
connection editing screen: the "Server" tab:</p>
                 <br/><br/>
                 <figure src="images/jcifs-configure-server.PNG" alt="Windows Share Connection,
Server tab" width="80%"/>
                 <br/><br/>
@@ -953,12 +953,12 @@
                     <li>Sybase (via the JTDS JDBC driver)</li>
                 </ul>
                 <br/>
-                <p>This connection type <b>cannot</b> be configured to
work with other databases as well without software changes.  Depending on your particular
installation,
-                       some of these options may not be available.</p>
+                <p>This connection type <b>cannot</b> be configured to
work with other databases without software changes.  Depending on your particular installation,
+                       some of the above options may not be available.</p>
                 <p>The generic database connection type currently has no per-document
notion of security.  It is possible to set document security for all documents specified by
a
                        given job.  Since this form of security requires you to know what
the actual access tokens are, you must have detailed knowledge of the authority connection
you
                        intend to use, and what sorts of access tokens it produces.</p>
-                <p>The generic database connection type provides three additional tabs
to the repository connection editing screen: the "Database Type" tab, the "Server" tab, and
the
+                <p>A generic database connection has three special tabs on the repository
connection editing screen: the "Database Type" tab, the "Server" tab, and the
                        "Credentials" tab.  The "Database Type" tab looks like this:</p>
                 <br/><br/>
                 <figure src="images/jdbc-configure-database-type.PNG" alt="Generic Database
Connection, Database Type tab" width="80%"/>
@@ -1006,9 +1006,9 @@
                 <p>If you want your database connection to function in an incremental
manner, you must also come up with the format of a "version string".  This string is used
by the 
                        Framework to determine if a document has changed.  It must change
whenever anything that might affect the document's indexing changes.  (It is not a problem
if
                        it changes for other reasons, as long as it fulfills that principle
criteria.)</p>
-                <p>The queries you provide get substituted before they are used by
the connector.  The example queries, which are present when the queries tab is first opened
for a
+                <p>The queries you provide get substituted before they are used by
the connection.  The example queries, which are present when the queries tab is first opened
for a
                        new job, show many of these substitutions in roughly the manner in
which they are intended to be used.  For example, "$(IDCOLUMN)" will substitute a column
-                       name expected by the connector to contain the document identifier
into the query.  The list of substitution strings are as follows:</p>
+                       name expected by the connection to contain the document identifier
into the query.  The list of substitution strings are as follows:</p>
                 <br/>
                 <table>
                     <tr><td><b>String name</b></td><td><b>Meaning/use</b></td></tr>


