manifoldcf-commits mailing list archives

From kwri...@apache.org
Subject svn commit: r1462937 - in /manifoldcf/trunk: ./ site/src/documentation/content/xdocs/en_US/ site/src/documentation/resources/images/en_US/
Date Sun, 31 Mar 2013 12:45:29 GMT
Author: kwright
Date: Sun Mar 31 12:45:28 2013
New Revision: 1462937

URL: http://svn.apache.org/r1462937
Log:
Fix for CONNECTORS-670.  Update end-user documentation for RSS connector.

Added:
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-exclusions.PNG   (with props)
Modified:
    manifoldcf/trunk/CHANGES.txt
    manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-bandwidth.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-email.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-proxy.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-robots.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-canonicalization.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-dechromed-content.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-mappings.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-metadata.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-security.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-time-values.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-urls.PNG
    manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-status.PNG

Modified: manifoldcf/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/CHANGES.txt?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
--- manifoldcf/trunk/CHANGES.txt (original)
+++ manifoldcf/trunk/CHANGES.txt Sun Mar 31 12:45:28 2013
@@ -3,6 +3,9 @@ $Id$
 
 ======================= 1.2-dev =====================
 
+CONNECTORS-670: Update end-user documentation for RSS connector.
+(Karl Wright)
+
 CONNECTORS-642: Add Elastic Search plugin.
 (Simon Willnauer, Piergiorgio Lucidi, Karl Wright)
 

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original)
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Sun Mar 31 12:45:28 2013
@@ -912,25 +912,25 @@
                 <br/><br/>
                 <figure src="images/en_US/rss-configure-bandwidth.PNG" alt="RSS Connection, Bandwidth tab" width="80%"/>
                 <br/><br/>
-                <p>This tab allows you to control the <b>maximum</b> rate at which the connection fetches data, on a per-server basis, as well as the <b>maximum</b> fetches per minute,
-                       also per-server.  Finally, the maximum number of socket connections made per server at any one time is also controllable by this tab.</p>
-                <p>The screen shot displays parameters that are
-                       considered reasonably polite.  The default values for this table are all blank, meaning that, by default, there is no throttling whatsoever!  Please do not make the mistake
-                       of crawling other people's sites without adequate politeness parameters in place.</p>
-                <p>The "Throttle group" parameter allows you to treat multiple RSS-type connections together, for the purposes of throttling.  All RSS-type connections that have the same
-                       throttle group name will use the same pool for throttling purposes.</p>
+                <p>This tab allows you to control the <b>maximum</b> rate at which the connection fetches data, on a per-server basis, as well as the <b>maximum</b> fetches
+                       per minute, also per-server.  Finally, the maximum number of socket connections made per server at any one time is also controllable by this tab.</p>
+                <p>The screen shot displays parameters that are considered reasonably polite.  The default values for this table are all blank, meaning that, by default, there is no
+                       throttling whatsoever!  Please do not make the mistake of crawling other people's sites without adequate politeness parameters in place.</p>
+                <p>The "Throttle group" parameter allows you to treat multiple RSS-type connections together, for the purposes of throttling.  All RSS-type connections that
+                       have the same throttle group name will use the same pool for throttling purposes.</p>
                 <p>The "Bandwidth" tab is related to the throttles that you can set on the "Throttling" tab in the following ways:</p>
                 <br/>
                 <ul>
                     <li>The "Bandwidth" tab sets the <b>maximum</b> values, while the "Throttling" tab sets the <b>average</b> values.</li>
-                    <li>The "Bandwidth" tab does not affect how documents are scheduled in the queue; it simply blocks documents until it is safe to go ahead, which will use up a crawler thread
-                           for the entire period that both the wait and the fetch take place.  The "Throttling" tab affects how often documents are scheduled, so it does not waste threads.</li>
+                    <li>The "Bandwidth" tab does not affect how documents are scheduled in the queue; it simply blocks documents until it is safe to go ahead, which will use up a
+                          crawler thread for the entire period that both the wait and the fetch take place.  The "Throttling" tab affects how often documents are scheduled, so it does
+                          not waste threads.</li>
                 </ul>
                 <br/>
                 <p>Because of the above, we suggest that you configure your RSS connection using <b>both</b> the "Bandwidth" <b>and</b> the "Throttling" tabs.  Select maximum
                        values on the "Bandwidth" tab, and corresponding average values estimates on the "Throttling" tab.  Remember that a document identifier for an RSS connection is the
-                       document's URL, and the bin name for that URL is the server name.  Also, please note that the "Maximum number of connections per JVM" field's default value of 10 is
-                       unlikely to be correct for connections of the RSS type; you should have at least one available connection per worker thread, for best performance.  Since the
+                       document's URL, and the bin name for that URL is the server name.  Also, please note that the "Maximum number of connections per JVM" field's default value
+                       of 10 is unlikely to be correct for connections of the RSS type; you should have at least one available connection per worker thread, for best performance.  Since the
                        default number of worker threads is 30, you should set this parameter to at least a value of 30 for normal operation.</p>
                 <p>The "Proxy" tab allows you to specify a proxy that you want to crawl through.  The RSS connection type supports proxies that are secured with all forms of the NTLM
                        authentication method.  This is quite typical of large organizations.  The tab looks like this:</p>
@@ -944,8 +944,8 @@
                 <figure src="images/en_US/rss-status.PNG" alt="RSS Status" width="80%"/>
                 <br/><br/>
                 <p></p>
-                <p>Jobs created using connections of the RSS type have the following additional tabs: "URLs", "Canonicalization", "Mappings", "Time Values", "Security", "Metadata", and
-                       "Dechromed Content".  The URLs tab is where you describe the feeds that are part of the job.  It looks like this:</p>
+                <p>Jobs created using connections of the RSS type have the following additional tabs: "URLs", "Canonicalization", "URL mappings", "Exclusions", "Time Values",
+                       "Security", "Metadata", and "Dechromed Content".  The URLs tab is where you describe the feeds that are part of the job.  It looks like this:</p>
                 <br/><br/>
                 <figure src="images/en_US/rss-job-urls.PNG" alt="RSS job, URLs tab" width="80%"/>
                 <br/><br/>
@@ -976,6 +976,12 @@
                        <code>http://Server/Folder_1/Filename</code>, it would output the string <code>http://Folder_1/Filename</code>.</p>
                 <p>If more than one rule is present, the rules are all executed in sequence.  That is, the output of the first rule is modified by the second rule, etc.</p>
                 <p>To add a rule, fill in the match expression and output string, and click the "Add" button.</p>
+                <p>The "Exclusions" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/rss-job-exclusions.PNG" alt="RSS job, Exclusions tab" width="80%"/>
+                <br/><br/>
+                <p>Here you can enter a set of regular expressions, one per line, which describe which document URLs to exclude from the job.  This can be very helpful if you
+                     are crawling RSS feeds that include a variety of content where you only want to index a subset of the content.</p>
                 <p>The "Time Values" tab looks like this:</p>
                 <br/><br/>
                 <figure src="images/en_US/rss-job-time-values.PNG" alt="RSS job, Time Values tab" width="80%"/>

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-bandwidth.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-bandwidth.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-email.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-email.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-proxy.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-proxy.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-robots.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-configure-robots.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-canonicalization.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-canonicalization.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-dechromed-content.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-dechromed-content.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-exclusions.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-exclusions.PNG?rev=1462937&view=auto
==============================================================================
Binary file - no diff available.

Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-exclusions.PNG
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-mappings.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-mappings.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-metadata.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-metadata.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-security.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-security.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-time-values.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-time-values.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-urls.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-job-urls.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.

Modified: manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-status.PNG
URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/resources/images/en_US/rss-status.PNG?rev=1462937&r1=1462936&r2=1462937&view=diff
==============================================================================
Binary files - no diff available.
