jena-commits mailing list archives

From codefer...@apache.org
Subject svn commit: r1834749 - /jena/site/trunk/content/documentation/query/text-query.mdtext
Date Sat, 30 Jun 2018 16:19:03 GMT
Author: codeferret
Date: Sat Jun 30 16:19:03 2018
New Revision: 1834749

URL: http://svn.apache.org/viewvc?rev=1834749&view=rev
Log:
two missed updates

Modified:
    jena/site/trunk/content/documentation/query/text-query.mdtext

Modified: jena/site/trunk/content/documentation/query/text-query.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/query/text-query.mdtext?rev=1834749&r1=1834748&r2=1834749&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/query/text-query.mdtext (original)
+++ jena/site/trunk/content/documentation/query/text-query.mdtext Sat Jun 30 16:19:03 2018
@@ -1324,11 +1324,11 @@ The first situation arises when entering
 
 There are several such languages of interest: Chinese, Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and ideographic variants.
 
-Encodings may not normalized when inserting triples for a variety of reasons. A principle one is that the `rdf:langString` object often must be entered in the same encoding that it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desireable to preserve the original form.
+Encodings may not be normalized when inserting triples for a variety of reasons. A principle one is that the `rdf:langString` object often must be entered in the same encoding that it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desireable to preserve the original form.
 
 The second situation arises to provide simple support for phonetic or other forms of lossy search at the time that triples are indexed directly in the Lucene system.
 
-To handle the first situation a `text` assembler predicate, `text:searchFor`, is introduced that specifies a list of language tags that provides a list of language variants that should be searched whenever a query string of a given encoding (language tag) is used. For example, the following `text:TextIndexLucene/text:defineAnalyzers` fragment :
+To handle the first situation a `text` assembler predicate, `text:searchFor`, is introduced that specifies a list of language tags that provides a list of language variants that should be searched whenever a query string of a given encoding (language tag) is used. For example, the following `text:defineAnalyzers` fragment :
 
         [ text:addLang "bo" ; 
           text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
@@ -1370,8 +1370,46 @@ which reflects the underlying Tibetan Un
 
 This support simplifies applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It's all done under the covers in Lucene.
 
-Solving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the `text:TextIndexLucene/text:defineAnalyzers`. For example, the following fragment
+Solving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the `text:defineAnalyzers`. For example, the following fragment:
 
+        [ text:defineAnalyzer :hanzAnalyzer ; 
+          text:analyzer [ 
+            a text:GenericAnalyzer ;
+            text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
+            text:params (
+                [ text:paramName "profile" ;
+                  text:paramValue "TC2SC" ]
+                [ text:paramName "stopwords" ;
+                  text:paramValue false ]
+                [ text:paramName "filterChars" ;
+                  text:paramValue 0 ]
+                )
+            ] ; 
+          ]  
+        [ text:defineAnalyzer :han2pinyin ; 
+          text:analyzer [ 
+            a text:GenericAnalyzer ;
+            text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
+            text:params (
+                [ text:paramName "profile" ;
+                  text:paramValue "TC2PYstrict" ]
+                [ text:paramName "stopwords" ;
+                  text:paramValue false ]
+                [ text:paramName "filterChars" ;
+                  text:paramValue 0 ]
+                )
+            ] ; 
+          ]
+        [ text:defineAnalyzer :pinyin ; 
+          text:analyzer [ 
+            a text:GenericAnalyzer ;
+            text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
+            text:params (
+                [ text:paramName "profile" ;
+                  text:paramValue "PYstrict" ]
+                )
+            ] ; 
+          ]
         [ text:addLang "zh-hans" ; 
           text:searchFor ( "zh-hans" "zh-hant" ) ;
           text:auxIndex ( "zh-aux-han2pinyin" ) ;


