jackrabbit-oak-commits mailing list archives

From alexparvule...@apache.org
Subject svn commit: r1519440 - in /jackrabbit/oak/trunk: oak-doc/src/site/markdown/differences.md oak-doc/src/site/markdown/query.md oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
Date Mon, 02 Sep 2013 13:35:28 GMT
Author: alexparvulescu
Date: Mon Sep  2 13:35:27 2013
New Revision: 1519440

URL: http://svn.apache.org/r1519440
Log:
https://issues.apache.org/jira/browse/OAK-301
 - added some query docs

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
    jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md Mon Sep  2 13:35:27 2013
@@ -71,7 +71,7 @@ Oak does not index content by default as
 necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
 repository will be traversed. That is, the query will still work but probably be very slow.
 
-See TODO for how to create a custom index.
+See the [query overview page](/query/) for how to create a custom index.
 
 Observation
 -----------

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md?rev=1519440&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md Mon Sep  2 13:35:27 2013
@@ -0,0 +1,99 @@
+## Query
+
+Oak does not index content by default as does Jackrabbit 2. You need to create custom indexes when
+necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
+repository will be traversed. That is, the query will still work but probably be very slow.
+
+Query Indices are defined under the `oak:index` node.
+
+### Cost calculation
+
+Each query index is expected to estimate the worst-case cost to query with the given filter.
+The returned value is between 1 (very fast; lookup of a unique node) and the estimated number of entries to traverse, if the cursor would be fully read, and if there could in theory be one network round-trip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory).
+
+The returned value is supposed to be an estimate and doesn't have to be very accurate. Please note this method is called on each index whenever a query is run, so the method should be reasonably fast (not read any data itself, or at least not read too much data).
+
+If an index implementation can not query the data, it has to return `Double.POSITIVE_INFINITY`.
+
+### Property index
+
+To define a property index on a subtree you have to add an index definition node that:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`property`__
+* contains the `propertyNames` property that indicates what properties will be stored in the index.
+
+    `propertyNames` can be a list of properties, and it is optional. In case it is missing, the node name will be used as a property name reference value.
+
+_Optionally_ you can specify
+
+* a uniqueness constraint on a property index by setting the `unique` flag to `true`
+* that the property index only applies to a certain node type by setting the `declaringNodeTypes` property
+* the `reindex` flag which when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+      NodeBuilder index = root.child("oak:index");
+      index.child("uuid")
+        .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+        .setProperty("type", "property")
+        .setProperty("propertyNames", "jcr:uuid")
+        .setProperty("declaringNodeTypes", "mix:referenceable")
+        .setProperty("unique", true)
+        .setProperty("reindex", true);
+    }
+
+or to simplify you can use one of the existing `IndexUtils#createIndexDefinition` helper methods:
+
+    {
+      NodeBuilder index = IndexUtils.getOrCreateOakIndex(root);
+      IndexUtils.createIndexDefinition(index, "myProp", true, false, ImmutableList.of("myProp"), null);
+    }
+
+
+### Node type index
+
+The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on `jcr:primaryType` and `jcr:mixinTypes` to evaluate a node type restriction on the filter.
+The cost for this index is the sum of the costs of the `PropertyIndexLookup` for queries on `jcr:primaryType` and `jcr:mixinTypes`.
+
+
+### Lucene full-text index
+
+The full-text index update is asynchronous via a background thread, see `Oak#withAsyncIndexing`.
+
+This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time it takes to run the diff and to run the text-extraction process. The async update status is now reflected on the `oak:index` node with the help of a few properties, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980).
+
+TODO Node aggregation [OAK-828](https://issues.apache.org/jira/browse/OAK-828)
+
+The index definition node for a lucene-based full-text index:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`lucene`__
+* must contain the `async` property set to the value `async`, this is what sends the index update process to a background thread
+
+_Optionally_ you can add
+
+ * what subset of property types to be included in the index via the `includePropertyTypes` property
+ * a blacklist of property names: what property to be excluded from the index via the `excludePropertyNames` property
+ * the `reindex` flag which when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+      NodeBuilder index = root.child("oak:index");
+      index.child("lucene")
+        .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+        .setProperty("type", "lucene")
+        .setProperty("async", "async")
+        .setProperty(PropertyStates.createProperty("includePropertyTypes", ImmutableSet.of(
+            PropertyType.TYPENAME_STRING, PropertyType.TYPENAME_BINARY), Type.STRINGS))
+        .setProperty(PropertyStates.createProperty("excludePropertyNames", ImmutableSet.of(
+            "jcr:createdBy", "jcr:lastModifiedBy"), Type.STRINGS))
+        .setProperty("reindex", true);
+    }
+
+
+### Solr full-text index
+
+`TODO`
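
Note on the "Cost calculation" and "Node type index" sections of query.md above: the rules they describe can be illustrated with a minimal Java sketch. This is not Oak's `QueryIndex` API. The class `CostSketch`, both method names, and all parameters are hypothetical, and treating a unique index as cost 1 is an assumption drawn from the "lookup of a unique node" wording. The entry count is passed in by the caller, so the estimate itself reads no data.

```java
public class CostSketch {

    /**
     * Hypothetical cost estimate for an index covering a single property.
     * Not Oak's API: all names and parameters here are assumptions.
     */
    public static double propertyCost(String indexedProperty, String queriedProperty,
                                      long estimatedEntries, boolean unique) {
        if (!indexedProperty.equals(queriedProperty)) {
            // This index cannot answer the query at all.
            return Double.POSITIVE_INFINITY;
        }
        if (unique) {
            // Lookup of a unique node: the cheapest possible case.
            return 1.0;
        }
        // Worst case: one read per entry to traverse; never report less than 1.
        return Math.max(1.0, (double) estimatedEntries);
    }

    /** Node type index cost: the sum of the two underlying lookup costs. */
    public static double nodeTypeCost(double primaryTypeCost, double mixinTypesCost) {
        return primaryTypeCost + mixinTypesCost;
    }

    public static void main(String[] args) {
        System.out.println(propertyCost("jcr:uuid", "jcr:uuid", 50_000L, true));  // 1.0
        System.out.println(propertyCost("myProp", "jcr:title", 50_000L, false));  // Infinity
        System.out.println(propertyCost("myProp", "myProp", 50_000L, false));     // 50000.0
        System.out.println(nodeTypeCost(120.0, 30.0));                            // 150.0
    }
}
```

Returning `Double.POSITIVE_INFINITY` for a property the index does not cover mirrors the rule that an index which cannot query the data must report an infinite cost.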

Modified: jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java (original)
+++ jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java Mon Sep  2 13:35:27 2013
@@ -100,16 +100,19 @@ import org.slf4j.LoggerFactory;
  * Under it follows the index definition node that:
  * <ul>
  * <li>must be of type <code>oak:queryIndexDefinition</code></li>
- * <li>must have the <code>type</code> property set to <b><code>lucene</code>
+ * <li>must have the <code>type</code> property set to <b><code>lucene</code></b></li>
+ * <li>must have the <code>async</code> property set to <b><code>async</code></b></li>
  * </b></li>
  * </ul>
  * </p>
- * 
  * <p>
- * Note: <code>reindex<code> is a property that when set to <code>true</code>,
- * triggers a full content reindex.
+ * Optionally you can add
+ * <ul>
+ * <li>what subset of property types to be included in the index via the <code>includePropertyTypes</code> property</li>
+ * <li>a blacklist of property names: what property to be excluded from the index via the <code>excludePropertyNames</code> property</li>
+ * <li>the <code>reindex</code> flag which when set to <code>true</code>, triggers a full content re-index.</li>
+ * </ul>
  * </p>
- * 
  * <pre>
  * <code>
  * {
@@ -117,6 +120,7 @@ import org.slf4j.LoggerFactory;
  *     index.child("lucene")
  *         .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
  *         .setProperty("type", "lucene")
+ *         .setProperty("async", "async")
  *         .setProperty("reindex", "true");
  * }
  * </code>


