jackrabbit-commits mailing list archives

From chet...@apache.org
Subject svn commit: r1802238 - in /jackrabbit/site/live/oak/docs/query: indexing.html lucene.html
Date Tue, 18 Jul 2017 05:15:11 GMT
Author: chetanm
Date: Tue Jul 18 05:15:10 2017
New Revision: 1802238

URL: http://svn.apache.org/viewvc?rev=1802238&view=rev
Log:
Updated to refer to new pre-extraction links

Modified:
    jackrabbit/site/live/oak/docs/query/indexing.html
    jackrabbit/site/live/oak/docs/query/lucene.html

Modified: jackrabbit/site/live/oak/docs/query/indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/indexing.html?rev=1802238&r1=1802237&r2=1802238&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/indexing.html Tue Jul 18 05:15:10 2017
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-06-23 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-07-17 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20170623" />
+    <meta name="Date-Revision-yyyymmdd" content="20170717" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Indexing</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -131,7 +131,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2017-06-23<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2017-07-17<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.8-SNAPSHOT</li>
         </ul>
@@ -301,7 +301,14 @@
       </ul></li>
     </ul></li>
     
-<li><a href="#reindexing">Reindexing</a></li>
+<li><a href="#reindexing">Reindexing</a>
+    
+<ul>
+      
+<li><a href="#reduce-reindexing-times">Reducing reindexing times</a></li>
+      
+<li><a href="#abort-reindex">How to Abort Reindexing</a></li>
+    </ul></li>
   </ul></li>
 </ul>
 <div class="section">
@@ -649,7 +656,10 @@ Removing corrupt flag from index [/oak:i
 </pre></div></div>
 <p>Once reindexing is complete, the <tt>reindex</tt> flag is set to <tt>false</tt> automatically.</p>
 <div class="section">
-<h3><a name="How_to_Abort_Reindexing"></a>How to Abort Reindexing</h3>
+<h3><a name="Reducing_reindexing_times"></a><a name="reduce-reindexing-times"></a>Reducing reindexing times</h3>
+<p>If the index being reindexed has full-text extraction configured, reindexing can take a long time, because most of that time is spent extracting text. In such cases it is recommended to use text <a href="pre-extract-text.html">pre-extraction support</a>. Text pre-extraction can be performed before the actual reindexing starts, so that no time is spent on text extraction during reindexing; this can considerably reduce the time taken to reindex such an index.</p></div>
+<div class="section">
+<h3><a name="How_to_Abort_Reindexing"></a><a name="abort-reindex"></a>How to Abort Reindexing</h3>
 <p>Building an index can be slow. It can be aborted (stopped before it is finished), for example if you detect there is an error in the index definition. Reindexing and building a new index can be aborted when using asynchronous indexes. For synchronous indexes, it can be stopped if it was started using the <tt>PropertyIndexAsyncReindexMBean</tt>. To do this, use the respective <tt>IndexStats</tt> JMX bean (for example, <tt>async</tt>, <tt>fulltext-async</tt>, or <tt>async-reindex</tt>), and call the operation <tt>abortAndPause()</tt>. Then, either set the <tt>reindex</tt> flag to <tt>false</tt> (for an existing index), remove the index definition (for a new index), or change the index type to <tt>disabled</tt>. Store the change. Finally, call the operation <tt>resume()</tt> so that regular indexing operations can continue.</p></div></div>
         </div>
       </div>
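The abort procedure described above (call <tt>abortAndPause()</tt>, store the index definition change, then call <tt>resume()</tt>) can be scripted over JMX. The following is a minimal sketch under stated assumptions: it invokes the two operations by name through a standard <tt>MBeanServerConnection</tt>, so no Oak classes are needed on the client, and it runs the definition change (setting <tt>reindex</tt> to <tt>false</tt>, removing the definition, or setting the type to <tt>disabled</tt>) as a caller-supplied callback that is not shown here. The <tt>StubIndexStats</tt> bean and the ObjectName <tt>org.apache.jackrabbit.oak:type=IndexStats,name=async</tt> are assumptions made only so the example runs without Oak; check the real name of the <tt>IndexStats</tt> bean for your lane in a JMX console before pointing this at a repository.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class AbortReindexSketch {

    /** Hypothetical stub standing in for Oak's IndexStats bean, so the sketch runs locally. */
    public interface StubIndexStatsMBean {
        String abortAndPause();
        void resume();
        boolean isPaused();
    }

    public static class StubIndexStats implements StubIndexStatsMBean {
        private volatile boolean paused;
        @Override public String abortAndPause() { paused = true; return "Aborted and paused"; }
        @Override public void resume() { paused = false; }
        @Override public boolean isPaused() { return paused; }
    }

    /**
     * Calls abortAndPause(), runs the caller's index definition change, then calls resume().
     * Operations are invoked by name, so only the JMX API is needed on the client side.
     */
    static void abortFixAndResume(MBeanServerConnection conn, ObjectName indexStats,
                                  Runnable storeDefinitionChange) throws Exception {
        conn.invoke(indexStats, "abortAndPause", new Object[0], new String[0]);
        try {
            storeDefinitionChange.run();
        } finally {
            // Resume even if storing the change failed, so regular indexing is not left paused.
            conn.invoke(indexStats, "resume", new Object[0], new String[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed ObjectName; look up the actual IndexStats bean name for your async lane.
        ObjectName name = new ObjectName("org.apache.jackrabbit.oak:type=IndexStats,name=async");
        StubIndexStats stub = new StubIndexStats();
        server.registerMBean(stub, name);

        AtomicBoolean pausedDuringChange = new AtomicBoolean();
        abortFixAndResume(server, name, () -> pausedDuringChange.set(stub.isPaused()));

        System.out.println("paused during change: " + pausedDuringChange.get());
        System.out.println("paused after resume: " + stub.isPaused());
    }
}
```

Against a live repository, the local <tt>MBeanServer</tt> would be replaced by a remote connection (for example one obtained through <tt>JMXConnectorFactory</tt>), and the callback would persist the definition change; the <tt>finally</tt> block is a design choice so that a failed change does not leave indexing paused.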

Modified: jackrabbit/site/live/oak/docs/query/lucene.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene.html?rev=1802238&r1=1802237&r2=1802238&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene.html Tue Jul 18 05:15:10 2017
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-07-03 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2017-07-17 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20170703" />
+    <meta name="Date-Revision-yyyymmdd" content="20170717" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -131,7 +131,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2017-07-03<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2017-07-17<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.8-SNAPSHOT</li>
         </ul>
@@ -568,6 +568,7 @@
   - notNullCheckEnabled (boolean) = false
   - nullCheckEnabled (boolean) = false
   - excludeFromAggregation (boolean) = false
+  - weight (long) = -1
 </pre></div></div>
 <p>Following are the details about the above mentioned config options which can be
defined at the property definition level</p>
 
@@ -655,7 +656,13 @@
 <dt>excludeFromAggregation</dt>
 <dd>Since 1.0.27, 1.2.11</dd>
 <dd>if set to true the property would be excluded from aggregation <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3981">OAK-3981</a></dd>
+<dt><a name="weight"></a></dt>
+<dt>weight</dt>
+<dd>Since 1.6.3</dd>
+<dd>At times, property definitions are added to support dense results right out of the index (e.g. <tt>contains(*, 'foo') AND [bar]='baz'</tt>). In such cases, the added property definition &#x201c;might&#x201d; not be the best one to answer queries that have only the property restriction (e.g. only <tt>[bar]='baz'</tt>). This can happen when that index specifies some exclude paths and hence does not index all <tt>bar</tt> properties.</dd>
 </dl>
+<p>For such cases, set <tt>weight</tt> to <tt>0</tt> for those properties. The IndexPlanner then does not use those property definitions to determine whether that index can answer the query, but it still uses them if some other index entry causes that index to be selected for evaluating the query.</p>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5899">OAK-5899</a> for more details.</p>
 <p><a name="property-names"></a><b>Property Names</b></p>
 <p>Property name can be one of following</p>
 
@@ -1178,48 +1185,7 @@ Copied 8.5 MB in 218.7 ms
 <p>From the Luke UI shown you can access various details.</p></div>
 <div class="section">
 <h3><a name="Pre-Extracting_Text_from_Binaries"></a><a name="text-extraction"></a>Pre-Extracting Text from Binaries</h3>
-<p><tt>@since Oak 1.0.18, 1.2.3</tt></p>
-<p>Lucene indexing is performed in a single threaded mode. Extracting text from binaries is an expensive operation and slows down the indexing rate considerably. For incremental indexing this mostly works fine but if performing a reindex or creating the index for the first time after migration then it increases the indexing time considerably. </p>
-<p>To speed up the Lucene indexing for such cases i.e. reindexing, we can decouple the text extraction from actual indexing. </p>
-
-<ol style="list-style-type: decimal">
-  
-<li>Extract and store the extracted text from binaries via <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#tika">oak-run tool</a></li>
-  
-<li>Configure a <tt>PreExtractedTextProvider</tt> which can lookup extracted text and  thus avoid text extraction at time of actual indexing</li>
-</ol>
-<p>Below are details around steps required for making using of this feature</p>
-
-<ol style="list-style-type: decimal">
-  
-<li>
-<p>Generate the csv file containing binary file details</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">java -cp tika-app-1.8.jar:oak-run.jar \
-org.apache.jackrabbit.oak.run.Main tika \  
---fds-path /path/to/datastore \
---nodestore /path/to/segmentstore --data-file dump.csv generate
-</pre></div></div></li>
-  
-<li>
-<p>Extract the text </p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">java -cp tika-app-1.8.jar:oak-run.jar \
-org.apache.jackrabbit.oak.run.Main tika \
---data-file binary-stats.csv \
---store-path ./store 
---fds-path /path/to/datastore  extract
-</pre></div></div></li>
-  
-<li>
-<p>Configure the <tt>PreExtractedTextProvider</tt> - Once the extraction is performed configure a <tt>PreExtractedTextProvider</tt> within the application such that Lucene indexer can make use of that to lookup extracted text. </p>
-<p>For this look for OSGi config for <tt>Apache Jackrabbit Oak DataStore PreExtractedTextProvider</tt></p>
-<p><img src="pre-extracted-text-osgi.png" alt="OSGi Configuration" /> </p></li>
-</ol>
-<p>Once <tt>PreExtractedTextProvider</tt> is configured then upon reindexing Lucene indexer would make use of it to check if text needs to be extracted or not. Check <tt>TextExtractionStatsMBean</tt> for various statistics around text extraction and also to validate if <tt>PreExtractedTextProvider</tt> is being used.</p>
-<p>For more details on this feature refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2892">OAK-2892</a></p></div>
+<p>Refer to <a href="pre-extract-text.html">pre-extraction via oak-run</a>.</p></div>
 <div class="section">
 <h3><a name="Advanced_search_features"></a><a name="advanced-search-features"></a>Advanced search features</h3>
 <div class="section">
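The planner rule described for the new <tt>weight</tt> option can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not Oak's IndexPlanner: it ignores cost estimation, exclude paths and everything else the real planner considers, reduces fulltext matching to a boolean, and assumes that any non-zero weight (including the default <tt>-1</tt>) behaves like an ordinary property definition. The names <tt>canAnswer</tt> and <tt>restrictionsHandled</tt> are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WeightSketch {

    /**
     * Toy model: can this index be picked for the query?
     * propertyWeights maps a property name to its configured weight (default -1).
     * fulltextMatches says whether the query's fulltext condition matches this index.
     */
    static boolean canAnswer(Map<String, Long> propertyWeights, Set<String> restrictedProps,
                             boolean fulltextMatches) {
        if (fulltextMatches) {
            return true;
        }
        for (String prop : restrictedProps) {
            Long weight = propertyWeights.get(prop);
            // A weight-0 definition is ignored when deciding whether the index qualifies.
            if (weight != null && weight != 0L) {
                return true;
            }
        }
        return false;
    }

    /** Once the index has been selected, every matching definition is used, weight 0 included. */
    static Set<String> restrictionsHandled(Map<String, Long> propertyWeights,
                                           Set<String> restrictedProps) {
        Set<String> handled = new TreeSet<>();
        for (String prop : restrictedProps) {
            if (propertyWeights.containsKey(prop)) {
                handled.add(prop);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        Map<String, Long> weights = new LinkedHashMap<>();
        weights.put("bar", 0L); // added to serve contains(*, 'foo') AND [bar]='baz'

        // [bar]='baz' alone: the weight-0 definition does not make this index a candidate.
        System.out.println(canAnswer(weights, Set.of("bar"), false));
        // contains(*, 'foo') AND [bar]='baz': selected via fulltext, and bar is still used.
        System.out.println(canAnswer(weights, Set.of("bar"), true) + " "
                + restrictionsHandled(weights, Set.of("bar")));
    }
}
```

Separating "may this index be chosen?" from "which restrictions does it evaluate once chosen?" mirrors the two halves of the documented behavior.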

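The pre-extraction workflow that this commit moves to <tt>pre-extract-text.html</tt> consists of two <tt>oak-run tika</tt> invocations: <tt>generate</tt> writes a CSV describing the binaries, and <tt>extract</tt> reads that CSV and stores the extracted text. The sketch below only assembles those two command lines; the flags are copied from the text removed above and may differ in later oak-run versions, and the helper names and all paths are hypothetical. Passing one variable for the CSV keeps both steps pointing at the same file (the removed text used <tt>dump.csv</tt> in one step and <tt>binary-stats.csv</tt> in the other).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class PreExtractCommands {

    static final String MAIN = "org.apache.jackrabbit.oak.run.Main";

    /** Common prefix: java -cp <classpath> org.apache.jackrabbit.oak.run.Main tika */
    static List<String> tika(String classpath) {
        return new ArrayList<>(List.of("java", "-cp", classpath, MAIN, "tika"));
    }

    /** Step 1: scan the node store and write binary details to a CSV file. */
    static List<String> generate(String classpath, String fdsPath, String nodeStore, String dataFile) {
        List<String> cmd = tika(classpath);
        cmd.addAll(List.of("--fds-path", fdsPath, "--nodestore", nodeStore,
                "--data-file", dataFile, "generate"));
        return cmd;
    }

    /** Step 2: extract text for the binaries listed in the CSV into storePath. */
    static List<String> extract(String classpath, String dataFile, String storePath, String fdsPath) {
        List<String> cmd = tika(classpath);
        cmd.addAll(List.of("--data-file", dataFile, "--store-path", storePath,
                "--fds-path", fdsPath, "extract"));
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical locations; File.pathSeparator is ':' on Unix, as in the original commands.
        String cp = "tika-app-1.8.jar" + File.pathSeparator + "oak-run.jar";
        String csv = "binary-stats.csv";
        List<List<String>> steps = List.of(
                generate(cp, "/path/to/datastore", "/path/to/segmentstore", csv),
                extract(cp, csv, "./store", "/path/to/datastore"));
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
            // To run a step: new ProcessBuilder(step).inheritIO().start().waitFor();
        }
    }
}
```

After extraction, the <tt>PreExtractedTextProvider</tt> still has to be configured so that the indexer looks up the stored text during reindexing.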

