jackrabbit-commits mailing list archives

From chet...@apache.org
Subject svn commit: r1802900 - /jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
Date Tue, 25 Jul 2017 08:35:38 GMT
Author: chetanm
Date: Tue Jul 25 08:35:38 2017
New Revision: 1802900

URL: http://svn.apache.org/viewvc?rev=1802900&view=rev
Log:
OAK-6471 - Support adding or updating index definitions via oak-run

Modified:
    jackrabbit/site/live/oak/docs/query/oak-run-indexing.html

Modified: jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/oak-run-indexing.html?rev=1802900&r1=1802899&r2=1802900&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/oak-run-indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/oak-run-indexing.html Tue Jul 25 08:35:38 2017
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-07-18 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-07-25 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20170718" />
+    <meta name="Date-Revision-yyyymmdd" content="20170725" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; <a name="oak-run-indexing"></a>
Oak Run Indexing</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -131,7 +131,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2017-07-18<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2017-07-25<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.8-SNAPSHOT</li>
         </ul>
@@ -282,6 +282,10 @@
 <li><a href="#online-indexing-perform-reindex">Step 2 - Perform reindexing</a></li>
       </ul></li>
       
+<li><a href="#index-definition-updates">Updating or Adding New Index Definitions</a></li>
+      
+<li><a href="#json-file-format">JSON File Format</a></li>
+      
 <li><a href="#tika-setup">Tika Setup</a></li>
     </ul></li>
   </ul></li>
@@ -372,24 +376,30 @@
 <p>If the index being reindexed involves a fulltext index and the repository has binary
content, it is recommended to first perform <a href="pre-extract-text.html">text pre-extraction</a>.
This ensures that the costly text extraction is done before the actual indexing, so that
indexing does not perform text extraction in the critical path.</p></div>
 <div class="section">
 <h4><a name="Step_2_-_Create_Checkpoint"></a><a name="out-of-band-create-checkpoint"></a>Step
2 - Create Checkpoint</h4>
-<p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with lifetime
of 1 month. &#xab;TBD&#xbb;</p></div>
+<p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with a sufficiently long
lifetime, such as 10 days. To do this, invoke <tt>CheckpointMBean#createCheckpoint</tt>
with 864000000 (10 days in milliseconds) as the lifetime argument.</p></div>
 <div class="section">
 <h4><a name="Step_3_-_Perform_Reindex"></a><a name="out-of-band-perform-reindex"></a>
Step 3 - Perform Reindex</h4>
 <p>In this step we perform the actual indexing via oak-run where it connects to repository
in read only mode. </p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint"> java -jar oak-run*.jar index --fds-path=/path/to/datastore
 /path/to/segmentstore/ --reindex --index-paths=/oak:index/indexName
+<div class="source"><pre class="prettyprint"> java -jar oak-run*.jar index --reindex
\
+ --index-paths=/oak:index/indexName \
+ --checkpoint=0fd2a388-de87-47d3-8f30-e86b1cf0a081 \
+ --fds-path=/path/to/datastore  /path/to/segmentstore/ 
 </pre></div></div>
 <p>Here following options can be used</p>
 
 <ul>
   
-<li><tt>--pre-extracted-text-dir</tt> - Directory path containing pre extracted
text generated via step #1</li>
+<li><tt>--pre-extracted-text-dir</tt> - Directory path containing pre extracted
text generated via step #1 (optional)</li>
+  
+<li><tt>--index-paths</tt> - This command requires an explicit set of index
paths which need to be indexed (required)</li>
   
-<li><tt>--index-paths</tt> - This command requires an explicit set of index
paths which need to be indexed</li>
+<li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated
when indexing in read-only mode. For testing purposes, it can be set to &#x2018;head&#x2019;
to indicate that the head state should be used. (required)</li>
   
-<li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated,
when indexing in read only mode. For  testing purpose, it can be set to &#x2018;head&#x2019;
to indicate that the head state should be used.</li>
-</ul></div>
+<li><tt>--index-definitions-file</tt> - Path of the json file which contains
updated index definitions</li>
+</ul>
+<p>If the index does not support fulltext indexing, you can omit the BlobStore
details.</p></div>
 <div class="section">
 <h4><a name="Step_4_-_Import_the_index"></a><a name="out-of-band-import-reindex"></a>Step
4 - Import the index</h4>
 <p>As a last step we need to import the index back in the repository. This can be done
in one of the following ways</p>
@@ -425,6 +435,90 @@
 <div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --reindex
--index-paths=/oak:index/lucene --read-write --fds-path=/path/to/datastore /path/to/segmentstore
 </pre></div></div></div></div>
 <div class="section">
+<h3><a name="Updating_or_Adding_New_Index_Definitions"></a><a name="index-definition-updates"></a>
Updating or Adding New Index Definitions</h3>
+<p><tt>@since Oak 1.7.5</tt></p>
+<p>The index tooling supports updating existing index definitions and adding new ones to existing setups.
This is done by passing the path of a json file which contains the index definitions</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">java -jar oak-run*.jar index
--reindex --index-paths=/oak:index/newAssetIndex \
+--index-definitions-file=index-definitions.json \
+--fds-path=/path/to/datastore /path/to/segmentstore  
+</pre></div></div>
+<p>Where index-definitions.json has the following structure</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">{
+  &quot;/oak:index/newAssetIndex&quot;: {
+    &quot;evaluatePathRestrictions&quot;: true,
+    &quot;compatVersion&quot;: 2,
+    &quot;type&quot;: &quot;lucene&quot;,
+    &quot;async&quot;: &quot;async&quot;,
+    &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
+    &quot;indexRules&quot;: {
+      &quot;jcr:primaryType&quot;: &quot;nt:unstructured&quot;,
+      &quot;dam:Asset&quot;: {
+        &quot;jcr:primaryType&quot;: &quot;nt:unstructured&quot;,
+        &quot;properties&quot;: {
+          &quot;jcr:primaryType&quot;: &quot;nt:unstructured&quot;,
+          &quot;valid&quot;: {
+            &quot;name&quot;: &quot;valid&quot;,
+            &quot;propertyIndex&quot;: true,
+            &quot;jcr:primaryType&quot;: &quot;nt:unstructured&quot;,
+            &quot;notNullCheckEnabled&quot;: true
+          },
+          &quot;mimetype&quot;: {
+            &quot;name&quot;: &quot;mimetype&quot;,
+            &quot;analyzed&quot;: true,
+            &quot;jcr:primaryType&quot;: &quot;nt:unstructured&quot;
+          }
+        }
+      }
+    }
+  }
+}
+</pre></div></div>
+<p>Some points to note about this json file</p>
+
+<ul>
+  
+<li>Each key of the top-level object refers to the index path</li>
+  
+<li>The value of each such key is the complete index definition</li>
+  
+<li>If the index path is not present in the existing repository, a new index is created</li>
+  
+<li>For a new index, the parent path structure must already exist in the repository.
So if a new index is being created at <tt>/content/en/oak:index/contentIndex</tt>
then the path up to <tt>/content/en/oak:index</tt> should already exist in the repository</li>
+</ul>
+<p>You can also use the json file generated from <a class="externalLink" href="http://oakutils.appspot.com/generate/index">Oakutils</a>.
It needs to be modified to conform to the above structure, i.e. the whole definition must be enclosed under
the intended index path key.</p>
+<p>In general, index definitions do not need any special encoding of values, as
index definitions in Oak mostly use only String, Long and Double types. However, if the index
refers to binary config such as a Tika config, then the binary data needs to be encoded. Refer
to the next section for more details.</p>
+<p>This option is supported in both online and out-of-band indexing.</p>
+<p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-6471">OAK-6471</a></p></div>
+<div class="section">
+<h3><a name="JSON_File_Format"></a><a name="json-file-format"></a>
JSON File Format</h3>
+<p>Some of the standard types used in Oak, such as names and blobs, are not directly supported by JSON.
They need to be encoded in a specific format.</p>
+<p>Below are the encoding rules</p>
+
+<dl>
+<dt>LONG</dt>
+<dd>No encoding required</dd>
+<dd><i>&#x201c;compatVersion&#x201d;: 2</i></dd>
+<dt>BOOLEAN</dt>
+<dd>No encoding required</dd>
+<dd><i>&#x201c;propertyIndex&#x201d;: true</i></dd>
+<dt>DOUBLE</dt>
+<dd>No encoding required</dd>
+<dd><i>&#x201c;weight&#x201d;: 1.5</i></dd>
+<dt>STRING</dt>
+<dd>Prefix the value with <tt>str:</tt></dd>
+<dd>Generally the value need not be encoded. Encoding is only required if the string
starts with three letters followed by a colon</dd>
+<dd><i>&#x201c;pathPropertyName&#x201d;: &#x201c;str:jcr:path&#x201d;</i></dd>
+<dt>DATE</dt>
+<dd>Prefix the value with <tt>dat:</tt>. The value is ISO8601 formatted
date string</dd>
+<dd><i>&#x201c;created&#x201d;: &#x201c;dat:2017-07-20T13:23:21.196+05:30&#x201d;</i></dd>
+<dt>NAME</dt>
+<dd>Prefix the value with <tt>nam:</tt>.</dd>
+<dd>For <tt>jcr:primaryType</tt> and <tt>jcr:mixinTypes</tt> no
encoding is required. Any property with these names is converted to the NAME type</dd>
+<dd><i>&#x201c;nodetype&#x201d;: &#x201c;nam:nt:base&#x201d;</i></dd>
+<dt>PATH</dt>
+<dd>Prefix the value with <tt>pat:</tt></dd>
+<dd><i>&#x201c;imagePath&#x201d;: &#x201c;pat:/content/assets/book.jpg&#x201d;</i></dd>
+<dt>URI</dt>
+<dd>Prefix the value with <tt>uri:</tt></dd>
+<dd><i>&#x201c;serverURI&#x201d;: &#x201c;uri:http://foo.example.com&#x201d;</i></dd>
+<dt>BINARY</dt>
+<dd>By default the binary values are encoded as Base64 string if the binary is less
than 1 MB size. The encoded value is  prefixed with <tt>:blobId:</tt></dd>
+<dd><i>&#x201c;jcr:data&#x201d;: &#x201c;:blobId:axygz&#x201d;</i></dd>
+</dl></div>
+<div class="section">
 <h3><a name="Tika_Setup"></a><a name="tika-setup"></a> Tika
Setup</h3>
 <p>If the indexes being reindexed have fulltext indexing enabled, then you need to include
the Tika library in the classpath. This is required even if pre-extraction is used, to ensure
that any new binary added after pre-extraction can still be indexed.</p>
 <p>First download the <a class="externalLink" href="https://tika.apache.org/download.html">tika-app</a>
jar from Tika downloads. You should be able to use 1.15 version with Oak 1.7.4 jar.</p>
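
Note on the index definitions file described in the diff above: a bare index definition has to be enclosed under its intended index path key, and for a new index the parent path must already exist in the repository. The following is only a minimal sketch of that preparation step, not part of oak-run. The helper names `wrap_definition` and `required_parent` and the sample definition are hypothetical and chosen for illustration.

```python
import json
import posixpath


def wrap_definition(index_path, definition):
    """Enclose a bare index definition under its index path key."""
    if not index_path.startswith("/"):
        raise ValueError("index path must be absolute: " + index_path)
    return {index_path: definition}


def required_parent(index_path):
    """Return the parent path that must already exist in the repository."""
    return posixpath.dirname(index_path)


# Hypothetical bare definition, for example one exported by a generator tool.
bare_definition = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "async": "async",
    "compatVersion": 2,
}

index_path = "/content/en/oak:index/contentIndex"
definitions = wrap_definition(index_path, bare_definition)

# This string is what would be saved as index-definitions.json.
definitions_json = json.dumps(definitions, indent=2)
print(definitions_json)
print("parent path that must exist:", required_parent(index_path))
```

The resulting JSON could then be saved to a file and passed via `--index-definitions-file`, as in the command shown in the diff.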

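
Note on the JSON File Format rules in the diff above: the prefix rules can be illustrated with a small helper. This is a sketch written from the documented rules only, not Oak's own serializer. The function name `encode_value` is hypothetical, and BINARY values are deliberately left out because their encoding depends on the blob store.

```python
import re

# Prefixes taken from the encoding rules listed in the diff above.
TYPE_PREFIXES = {"DATE": "dat:", "NAME": "nam:", "PATH": "pat:", "URI": "uri:"}

# A plain string only needs the str: prefix when it starts with
# three letters followed by a colon, for example "jcr:path".
THREE_LETTERS_THEN_COLON = re.compile(r"^[A-Za-z]{3}:")


def encode_value(oak_type, value):
    """Encode a value for the JSON file according to the documented rules."""
    if oak_type in ("LONG", "BOOLEAN", "DOUBLE"):
        return value  # native JSON types, no encoding required
    if oak_type == "STRING":
        if THREE_LETTERS_THEN_COLON.match(value):
            return "str:" + value
        return value
    if oak_type in TYPE_PREFIXES:
        return TYPE_PREFIXES[oak_type] + value
    raise ValueError("type not covered by this sketch: " + oak_type)


print(encode_value("STRING", "jcr:path"))
print(encode_value("NAME", "nt:base"))
print(encode_value("PATH", "/content/assets/book.jpg"))
```

Strings such as `lucene` pass through unchanged, which matches the note that most index definition values need no encoding.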

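
Note on the checkpoint step in the diff above: the lifetime value 864000000 corresponds to 10 days expressed in milliseconds. A short sketch of the arithmetic, where the helper name `lifetime_millis` is hypothetical:

```python
from datetime import timedelta


def lifetime_millis(days):
    """Convert a checkpoint lifetime in days to milliseconds."""
    return timedelta(days=days) // timedelta(milliseconds=1)


ten_days = lifetime_millis(10)
print(ten_days)
```

The same helper can be used to derive the argument for a different lifetime before invoking `CheckpointMBean#createCheckpoint`.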
