manifoldcf-commits mailing list archives

Subject svn commit: r1637310 - in /manifoldcf/trunk: CHANGES.txt site/src/documentation/content/xdocs/en_US/end-user-documentation.xml site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
Date Fri, 07 Nov 2014 08:26:28 GMT
Author: kwright
Date: Fri Nov  7 08:26:27 2014
New Revision: 1637310

Fix for CONNECTORS-1096.

  (with props)

Modified: manifoldcf/trunk/CHANGES.txt
--- manifoldcf/trunk/CHANGES.txt (original)
+++ manifoldcf/trunk/CHANGES.txt Fri Nov  7 08:26:27 2014
@@ -3,6 +3,9 @@ $Id$
 ======================= 2.0-dev =====================
+CONNECTORS-1096: Document boilerplate removal options.
+(Karl Wright)
 CONNECTORS-1095: Use https for downloading everywhere.
 (Aeham Abushwashi)

Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
--- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml
+++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Fri Nov  7 08:26:27 2014
@@ -953,6 +953,8 @@ curl -XGET http://localhost:9200/index/_
             <section id="nulltransformer">
                 <title>Null Transformer</title>
+                <p>The null transformer does nothing other than record activity through the transformer.  It is thus useful primarily as a coding model, and a diagnostic
+                      aid.  It requires no non-standard configuration information, and provides no tabs for a job that includes it.</p>
             <section id="tikaextractor">
@@ -964,8 +966,8 @@ curl -XGET http://localhost:9200/index/_
                 <p>As with all document transformers,  more than one Tika Content Extractor transformation filter can be used in a single pipeline.  In the case
                       of the Tika Content Extractor, this does not seem to be of much utility.</p>
                 <p>The Tika Content Extractor transformation connection type does not require anything other than standard configuration information.</p>
-                <p>The Tika Content Extractor transformation connection type contributes two tabs to a job definition.  These are the "Field mapping" tab, and the "Exceptions" tab.
-                      The "Field mapping" tab looks like this:</p>
+                <p>The Tika Content Extractor transformation connection type contributes three tabs to a job definition.  These are the "Field mapping" tab, the "Exceptions" tab,
+                      and the "Boilerplate" tab.  The "Field mapping" tab looks like this:</p>
                 <figure src="images/en_US/tika-job-field-mapping.PNG" alt="Tika Content Extractor specification, Field Mapping tab" width="80%"/>
@@ -976,6 +978,12 @@ curl -XGET http://localhost:9200/index/_
                 <figure src="images/en_US/tika-job-exceptions.PNG" alt="Tika Content Extractor specification, Exceptions tab" width="80%"/>
                 <p>Uncheck the checkbox to allow indexing of document metadata even when Tika fails to extract content from the document.</p>
+                <p>The "Boilerplate" tab looks like this:</p>
+                <br/><br/>
+                <figure src="images/en_US/tika-job-boilerplate.PNG" alt="Tika Content Extractor specification, Boilerplate tab" width="80%"/>
+                <br/><br/>
+                <p>Select the HTML boilerplate removal option you want.  These are implementations provided by the "Boilerpipe" project; they are lightly documented,
+                      so you will need to experiment with your particular application to find the one most appropriate for your application.</p>

Added: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
Binary file - no diff available.

Propchange: manifoldcf/trunk/site/src/documentation/resources/images/en_US/tika-job-boilerplate.PNG
    svn:mime-type = application/octet-stream
