jena-commits mailing list archives

From build...@apache.org
Subject svn commit: r1031934 - in /websites/staging/jena/trunk/content: ./ documentation/query/text-query.html
Date Sat, 30 Jun 2018 16:19:58 GMT
Author: buildbot
Date: Sat Jun 30 16:19:58 2018
New Revision: 1031934

Log:
Staging update by buildbot for jena

Modified:
    websites/staging/jena/trunk/content/   (props changed)
    websites/staging/jena/trunk/content/documentation/query/text-query.html

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Jun 30 16:19:58 2018
@@ -1 +1 @@
-1834748
+1834749

Modified: websites/staging/jena/trunk/content/documentation/query/text-query.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/query/text-query.html (original)
+++ websites/staging/jena/trunk/content/documentation/query/text-query.html Sat Jun 30 16:19:58 2018
@@ -1531,9 +1531,9 @@ indexing and search.</p>
 </ul>
 <p>The first situation arises when entering triples that include languages with multiple
encodings that for various reasons are not normalized to a single encoding. In this situation
it is helpful to be able to retrieve appropriate result sets without regard for the encodings
used at the time that the triples were inserted into the dataset.</p>
 <p>There are several such languages of interest: Chinese, Tibetan, Sanskrit, Japanese
and Korean. There are various Romanizations and ideographic variants.</p>
-<p>Encodings may not normalized when inserting triples for a variety of reasons. A
principle one is that the <code>rdf:langString</code> object often must be entered
in the same encoding that it occurs in some physical text that is being catalogued. Another
is that metadata may be imported from sources that use different encoding conventions and
it is desireable to preserve the original form.</p>
+<p>Encodings may not be normalized when inserting triples for a variety of reasons.
A principle one is that the <code>rdf:langString</code> object often must be entered
in the same encoding that it occurs in some physical text that is being catalogued. Another
is that metadata may be imported from sources that use different encoding conventions and
it is desireable to preserve the original form.</p>
 <p>The second situation arises to provide simple support for phonetic or other forms
of lossy search at the time that triples are indexed directly in the Lucene system.</p>
-<p>To handle the first situation a <code>text</code> assembler predicate,
<code>text:searchFor</code>, is introduced that specifies a list of language tags
that provides a list of language variants that should be searched whenever a query string
of a given encoding (language tag) is used. For example, the following <code>text:TextIndexLucene/text:defineAnalyzers</code>
fragment :</p>
+<p>To handle the first situation a <code>text</code> assembler predicate,
<code>text:searchFor</code>, is introduced that specifies a list of language tags
that provides a list of language variants that should be searched whenever a query string
of a given encoding (language tag) is used. For example, the following <code>text:defineAnalyzers</code>
fragment :</p>
 <div class="codehilite"><pre>    <span class="p">[</span> <span
class="n">text</span><span class="p">:</span><span class="n">addLang</span>
&quot;<span class="n">bo</span>&quot; <span class="p">;</span>

       <span class="n">text</span><span class="p">:</span><span
class="n">searchFor</span> <span class="p">(</span> &quot;<span
class="n">bo</span>&quot; &quot;<span class="n">bo</span><span
class="o">-</span><span class="n">x</span><span class="o">-</span><span
class="n">ewts</span>&quot; &quot;<span class="n">bo</span><span
class="o">-</span><span class="n">alalc97</span>&quot; <span class="p">)</span>
<span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span
class="n">analyzer</span> <span class="p">[</span> 
@@ -1575,8 +1575,46 @@ indexing and search.</p>
 
 <p>which reflects the underlying Tibetan Unicode term encoding. During <code>IndexSearcher.search</code>
all documents with one of the three fields in the index for term, "རྗེ", will
be returned even though the value in the fields <code>label_bo-x-ewts</code> and
<code>label_bo-alalc97</code> for the returned documents will be the original
value "rje".</p>
 <p>This support simplifies applications by permitting encoding independent retrieval
without additional layers of transcoding and so on. It's all done under the covers in Lucene.</p>
-<p>Solving the second situation simplifies applications by adding appropriate fields
and indexing via configuration in the <code>text:TextIndexLucene/text:defineAnalyzers</code>.
For example, the following fragment</p>
-<div class="codehilite"><pre>    <span class="p">[</span> <span
class="n">text</span><span class="p">:</span><span class="n">addLang</span>
&quot;<span class="n">zh</span><span class="o">-</span><span
class="n">hans</span>&quot; <span class="p">;</span> 
+<p>Solving the second situation simplifies applications by adding appropriate fields
and indexing via configuration in the <code>text:defineAnalyzers</code>. For
example, the following fragment:</p>
+<div class="codehilite"><pre>    <span class="p">[</span> <span
class="n">text</span><span class="p">:</span><span class="n">defineAnalyzer</span>
<span class="p">:</span><span class="n">hanzAnalyzer</span> <span
class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span
class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span
class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span
class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span
class="p">.</span><span class="n">zh</span><span class="p">.</span><span
class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> &quot;<span class="n">TC2SC</span>&quot;
<span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">stopwords</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> <span class="n">false</span> <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">filterChars</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> 0 <span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>  
+    <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">defineAnalyzer</span> <span class="p">:</span><span
class="n">han2pinyin</span> <span class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span
class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span
class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span
class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span
class="p">.</span><span class="n">zh</span><span class="p">.</span><span
class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> &quot;<span class="n">TC2PYstrict</span>&quot;
<span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">stopwords</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> <span class="n">false</span> <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">filterChars</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> 0 <span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>
+    <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">defineAnalyzer</span> <span class="p">:</span><span
class="n">pinyin</span> <span class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span
class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span
class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span
class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span
class="p">.</span><span class="n">zh</span><span class="p">.</span><span
class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span
class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">paramName</span> &quot;<span
class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span
class="n">paramValue</span> &quot;<span class="n">PYstrict</span>&quot;
<span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>
+    <span class="p">[</span> <span class="n">text</span><span
class="p">:</span><span class="n">addLang</span> &quot;<span class="n">zh</span><span
class="o">-</span><span class="n">hans</span>&quot; <span class="p">;</span>

       <span class="n">text</span><span class="p">:</span><span
class="n">searchFor</span> <span class="p">(</span> &quot;<span
class="n">zh</span><span class="o">-</span><span class="n">hans</span>&quot;
&quot;<span class="n">zh</span><span class="o">-</span><span
class="n">hant</span>&quot; <span class="p">)</span> <span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span
class="n">auxIndex</span> <span class="p">(</span> &quot;<span
class="n">zh</span><span class="o">-</span><span class="n">aux</span><span
class="o">-</span><span class="n">han2pinyin</span>&quot; <span
class="p">)</span> <span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span
class="n">analyzer</span> <span class="p">[</span>


