jena-commits mailing list archives

From a...@apache.org
Subject svn commit: r1581907 - /jena/site/trunk/content/documentation/query/text-query.mdtext
Date Wed, 26 Mar 2014 16:19:41 GMT
Author: andy
Date: Wed Mar 26 16:19:40 2014
New Revision: 1581907

URL: http://svn.apache.org/r1581907
Log:
JENA-654 Documentation patch

Modified:
    jena/site/trunk/content/documentation/query/text-query.mdtext

Modified: jena/site/trunk/content/documentation/query/text-query.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/query/text-query.mdtext?rev=1581907&r1=1581906&r2=1581907&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/query/text-query.mdtext (original)
+++ jena/site/trunk/content/documentation/query/text-query.mdtext Wed Mar 26 16:19:40 2014
@@ -37,6 +37,7 @@ the actual label.  More details are give
 -   [Query with SPARQL](#query-with-sparql)
 -   [Configuration](#configuration)
     -   [Text Dataset Assembler](#text-dataset-assembler)
+    -   [Configuring an Analyzer](#configuring-an-analyzer)
     -   [Configuration by Code](#configuration-by-code)
     -   [Graph-specific Indexing](#graph-specific-indexing)
 - [Working with Fuseki](#working-with-fuseki)
@@ -237,6 +238,36 @@ the text dataset, one for the base data.
 needs to identify the text dataset by its URI
 `http://localhost/jena_example/#text_dataset`.
 
+### Configuring an Analyzer
+
+Text to be indexed is passed through a text analyzer that divides it into tokens
+and may perform other transformations such as eliminating stop words.  If a Lucene
+text index is used, then by default a `StandardAnalyzer` is used.  If a Solr text
+index is used, the analyzer is determined by the Solr configuration.
+
+It is possible to configure an alternative analyzer for each field indexed in a
+Lucene index.  For example:
+
+    <#entMap> a text:EntityMap ;
+        text:entityField      "uri" ;
+        text:defaultField     "text" ;
+        text:map (
+             [ text:field "text" ; 
+               text:predicate rdfs:label ;
+               text:analyzer [
+                   a text:StandardAnalyzer ;
+                   text:stopWords ("a" "an" "and" "but")
+               ]
+             ]
+             ) .
+             
+will configure the index to analyze values of the 'text' field
+using a `StandardAnalyzer` with the given list of stop words.
+
+Other analyzer types that may be specified are `SimpleAnalyzer` and `KeywordAnalyzer`, 
+neither of which has any configuration parameters.  See the Lucene documentation
+for details of what these analyzers do.
+
 ### Configuration by Code
 
 A text dataset can also be constructed in code as might be done for a



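The analyzer step described in this patch (split text into tokens, then drop stop words) can be sketched in plain Java. This is a hypothetical illustration only: the class `AnalyzerSketch` and its `analyze` method are invented for this note and are not part of Jena or Lucene. Lucene's real `StandardAnalyzer` uses Unicode word-boundary rules rather than the simple regex split below, so its token output can differ. The stop-word list mirrors the `text:stopWords ("a" "an" "and" "but")` example in the assembler fragment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, NOT Lucene's StandardAnalyzer: it only mimics the general
// shape of an analyzer chain (tokenize -> lowercase -> remove stop words).
public class AnalyzerSketch {

    // Split on runs of characters that are neither letters nor digits,
    // lowercase each token, and discard tokens that appear in stopWords.
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                // split() can yield an empty string for empty input or a leading separator
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same stop words as the text:stopWords list in the assembler example.
        Set<String> stopWords = Set.of("a", "an", "and", "but");
        System.out.println(analyze("An apple and a pear, but no plums", stopWords));
    }
}
```

Running `main` prints `[apple, pear, no, plums]`: "An", "and", "a" and "but" are removed because, after lowercasing, they match the configured stop words.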