Return-Path:
X-Original-To: apmail-jackrabbit-oak-commits-archive@minotaur.apache.org
Delivered-To: apmail-jackrabbit-oak-commits-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id EF4411017B
    for ; Mon, 2 Sep 2013 13:35:55 +0000 (UTC)
Received: (qmail 5028 invoked by uid 500); 2 Sep 2013 13:35:55 -0000
Delivered-To: apmail-jackrabbit-oak-commits-archive@jackrabbit.apache.org
Received: (qmail 4977 invoked by uid 500); 2 Sep 2013 13:35:52 -0000
Mailing-List: contact oak-commits-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: oak-dev@jackrabbit.apache.org
Delivered-To: mailing list oak-commits@jackrabbit.apache.org
Received: (qmail 4962 invoked by uid 99); 2 Sep 2013 13:35:50 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Sep 2013 13:35:50 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Sep 2013 13:35:48 +0000
Received: from eris.apache.org (localhost [127.0.0.1])
    by eris.apache.org (Postfix) with ESMTP id 6D8C623889ED;
    Mon, 2 Sep 2013 13:35:28 +0000 (UTC)
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: svn commit: r1519440 - in /jackrabbit/oak/trunk: oak-doc/src/site/markdown/differences.md oak-doc/src/site/markdown/query.md oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
Date: Mon, 02 Sep 2013 13:35:28 -0000
To: oak-commits@jackrabbit.apache.org
From: alexparvulescu@apache.org
X-Mailer: svnmailer-1.0.9
Message-Id: <20130902133528.6D8C623889ED@eris.apache.org>
X-Virus-Checked: Checked by ClamAV on apache.org

Author: alexparvulescu
Date: Mon Sep 2 13:35:27 2013
New Revision: 1519440

URL: http://svn.apache.org/r1519440
Log:
https://issues.apache.org/jira/browse/OAK-301 - added some query docs

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
    jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md Mon Sep 2 13:35:27 2013
@@ -71,7 +71,7 @@ Oak does not index content by default as
 necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
 repository will be traversed. That is, the query will still work but probably be very slow.
 
-See TODO for how to create a custom index.
+See the [query overview page](/query/) for how to create a custom index.
 
 Observation
 -----------

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md?rev=1519440&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md Mon Sep 2 13:35:27 2013
@@ -0,0 +1,99 @@
+## Query
+
+Oak does not index content by default as does Jackrabbit 2. You need to create custom indexes when
+necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
+repository will be traversed. That is, the query will still work but probably be very slow.
+
+Query Indices are defined under the `oak:index` node.
+
+### Cost calculation
+
+Each query index is expected to estimate the worst-case cost to query with the given filter.
+The returned value is between 1 (very fast; lookup of a unique node) and the estimated number of entries to traverse, if the cursor would be fully read, and if there could in theory be one network round-trip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory).
+
+The returned value is supposed to be an estimate and doesn't have to be very accurate. Please note this method is called on each index whenever a query is run, so the method should be reasonably fast (not read any data itself, or at least not read too much data).
+
+If an index implementation cannot query the data, it has to return `Double.POSITIVE_INFINITY`.
+
+### Property index
+
+To define a property index on a subtree you have to add an index definition node that:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`property`__
+* contains the `propertyNames` property that indicates what properties will be stored in the index.
+
+  `propertyNames` can be a list of properties, and it is optional. If it is missing, the node name will be used as the property name reference value.
+
+_Optionally_ you can specify
+
+* a uniqueness constraint on a property index by setting the `unique` flag to `true`
+* that the property index only applies to a certain node type by setting the `declaringNodeTypes` property
+* the `reindex` flag which, when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+        NodeBuilder index = root.child("oak:index");
+        index.child("uuid")
+            .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+            .setProperty("type", "property")
+            .setProperty("propertyNames", "jcr:uuid")
+            .setProperty("declaringNodeTypes", "mix:referenceable")
+            .setProperty("unique", true)
+            .setProperty("reindex", true);
+    }
+
+or to simplify you can use one of the existing `IndexUtils#createIndexDefinition` helper methods:
+
+    {
+        NodeBuilder index = IndexUtils.getOrCreateOakIndex(root);
+        IndexUtils.createIndexDefinition(index, "myProp", true, false, ImmutableList.of("myProp"), null);
+    }
+
+
+### Node type index
+
+The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on `jcr:primaryType` and `jcr:mixinTypes` to evaluate a node type restriction on the filter.
+The cost for this index is the sum of the costs of the `PropertyIndexLookup` for queries on `jcr:primaryType` and `jcr:mixinTypes`.
+
+
+### Lucene full-text index
+
+The full-text index update is asynchronous via a background thread, see `Oak#withAsyncIndexing`.
+
+This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time it takes to run the diff and to run the text-extraction process. The async update status is now reflected on the `oak:index` node with the help of a few properties, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980)
+
+TODO Node aggregation [OAK-828](https://issues.apache.org/jira/browse/OAK-828)
+
+The index definition node for a lucene-based full-text index:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`lucene`__
+* must contain the `async` property set to the value `async`; this is what sends the index update process to a background thread
+
+_Optionally_ you can add
+
+ * what subset of property types to be included in the index via the `includePropertyTypes` property
+ * a blacklist of property names: what property to be excluded from the index via the `excludePropertyNames` property
+ * the `reindex` flag which, when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+        NodeBuilder index = root.child("oak:index");
+        index.child("lucene")
+            .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+            .setProperty("type", "lucene")
+            .setProperty("async", "async")
+            .setProperty(PropertyStates.createProperty("includePropertyTypes", ImmutableSet.of(
+                PropertyType.TYPENAME_STRING, PropertyType.TYPENAME_BINARY), Type.STRINGS))
+            .setProperty(PropertyStates.createProperty("excludePropertyNames", ImmutableSet.of(
+                "jcr:createdBy", "jcr:lastModifiedBy"), Type.STRINGS))
+            .setProperty("reindex", true);
+    }
+
+
+### Solr full-text index
+
+`TODO`

Modified: jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java (original)
+++ jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java Mon Sep 2 13:35:27 2013
@@ -100,16 +100,19 @@ import org.slf4j.LoggerFactory;
  * Under it follows the index definition node that:
  *
  * must be of type oak:queryIndexDefinition
- * must have the type property set to lucene
+ * must have the type property set to lucene
+ * must have the async property set to async
  *
  *
  *
- *
  *
- * Note: reindex is a property that when set to true,
- * triggers a full content reindex.
+ * Optionally you can add
+ *
+ * what subset of property types to be included in the index via the includePropertyTypes property
+ * a blacklist of property names: what property to be excluded from the index via the excludePropertyNames property
+ * the reindex flag which when set to true, triggers a full content re-index.
+ *
  *
- *
  *
  *
  * {
@@ -117,6 +120,7 @@ import org.slf4j.LoggerFactory;
  *     index.child("lucene")
  *         .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
  *         .setProperty("type", "lucene")
+ *         .setProperty("async", "async")
  *         .setProperty("reindex", "true");
  * }
  *