jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Synchronous Lucene Property Indexes" by ChetanMehrotra
Date Wed, 09 Aug 2017 08:50:04 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Synchronous Lucene Property Indexes" page has been changed by ChetanMehrotra:
https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes?action=diff&rev1=1&rev2=2

  Oak 1.6 added support for Lucene Hybrid Index (OAK-4412), which enables near real time (NRT)
support for Lucene based indexes. It also had limited support for sync indexes. This feature
aims to extend that and enable support for sync property indexes.
  
+ == Synchronous Index Usecases ==
+ 
+ Synchronous indexes are required in the following use cases:
+ 
+ <<Anchor(unique-indexes)>>
+ === Unique Indexes ===
+ 
+ For unique indexes such as the uuid index or the principal name index, it must be ensured that the indexed
value is unique across the whole repository at the time of the commit itself. If the indexed value
already exists, e.g. a principal with the same name already exists, then that commit should fail. To
meet this requirement we need a synchronous index which gets updated as part of the commit itself.
+ 
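The requirement can be illustrated with a minimal Python sketch. This is not Oak code: `UniqueIndex`, `UniqueConstraintViolation` and the `{path: value}` commit shape are hypothetical names chosen for illustration. It only shows the idea that the index is checked and updated inside the commit, so a clashing value fails the whole commit.

```python
class UniqueConstraintViolation(Exception):
    """Raised when a commit would index a value that already exists."""


class UniqueIndex:
    """In-memory stand-in for a synchronous unique index (hypothetical)."""

    def __init__(self):
        self._entries = {}  # indexed value -> path of the node holding it

    def commit(self, changes):
        """Apply {path: value} atomically; reject the whole commit on a clash."""
        staged = dict(self._entries)
        for path, value in changes.items():
            owner = staged.get(value)
            if owner is not None and owner != path:
                raise UniqueConstraintViolation(
                    f"{value!r} already indexed at {owner}")
            staged[value] = path
        self._entries = staged  # only reached when no clash was found

    def lookup(self, value):
        """Return the path that owns the value, or None."""
        return self._entries.get(value)
```

Because the check and the update happen on a staged copy that replaces the index only at the end, a failed commit leaves no partial entries behind.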
+ <<Anchor(property-indexes)>>
+ === Property Indexes ===
+ 
+ Depending on application requirements the query results may be:
+ 
+  1. Eventually Consistent - Any change made gets ''eventually'' reflected in query results.
+  1. Consistent - Any change made gets ''immediately'' reflected in query results.
+ 
+ For most cases, like user driven search, eventually consistent search results work fine and hence
async indexes can be used. With the recent support for NRT indexes (OAK-4412) the user experience
gets better, as changes made by a user are reflected ''very soon'' in search results.
+ 
+ However, some cases need fully consistent search results. For example, assume
there is a component which maintains a cache for nodes of type `app:Component` and uses an observation
listener to listen for changes in nodes of type `app:Component`. Upon finding any change
it rebuilds the cache by querying for all such nodes. For this cache to be correct,
query results must be consistent with the session state associated with the listener.
Otherwise the cache may miss a newly added component, and a later request to the cache for that component
would fail.
+ 
+ For such use cases it is required that indexes are synchronous and that the results provided by the index
are consistent.
+ 
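The cache scenario can be made concrete with a small Python sketch. Everything in it is a toy model under stated assumptions: `Repository`, `run_async_indexer` and `rebuild_cache` are hypothetical names, and the "index" is just a set of paths. It contrasts an index updated by a later indexer run with one updated as part of the commit.

```python
class Repository:
    """Toy repository with one index over nodes of type app:Component."""

    def __init__(self, sync):
        self.sync = sync
        self.nodes = {}      # path -> node type
        self.index = set()   # paths the index currently knows about
        self._pending = []   # changes not yet seen by the async indexer

    def commit(self, path, node_type):
        self.nodes[path] = node_type
        if node_type != "app:Component":
            return
        if self.sync:
            self.index.add(path)        # updated as part of the commit
        else:
            self._pending.append(path)  # picked up by a later indexer run

    def run_async_indexer(self):
        self.index.update(self._pending)
        self._pending.clear()

    def query_components(self):
        return sorted(self.index)


def rebuild_cache(repo):
    """What the observation listener does when it sees a change."""
    return set(repo.query_components())
```

With `sync=False`, a listener that rebuilds the cache right after the commit gets an empty result until `run_async_indexer()` has run; with `sync=True` the new component is visible immediately.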
+ == Drawbacks of current property indexes ==
+ 
+ Oak currently has support for synchronous property indexes, which are used to meet the above
use cases. However, the current implementation has certain drawbacks:
+ 
+   1. Perform poorly over remote storage - The property indexes are stored as normal NodeState
and hence reading them over remote storage like Mongo performs poorly
+   1. Prone to conflicts - The content mirror store strategy is prone to conflicts if the
index content is volatile
+   1. Storage overhead - The storage overhead is large, especially for remote storage, as each
NodeState is mapped to 1 Document.
+ 
+ ----
+ 
+ == Proposal ==
+ 
+ To overcome these drawbacks and still meet the synchronous index requirements we can implement
a hybrid index where the index content is stored using both a property index (for recent entries)
and Lucene indexes (for older entries). At a high level the flow would be:
+ 
+  1. Store recently added index content as a normal property index
+  1. As part of the async indexer run, index the same content in the Lucene index
+  1. Later prune the property index content which has by then been indexed in the Lucene
index
+  1. Any query would return the union of query results from both the property index and the Lucene
indexes (with some caveats)
+ 
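The four steps of the flow can be sketched in Python as follows. This is a simplified model, not the proposed implementation: `HybridIndex` and its method names are hypothetical, both stores are plain dictionaries, only additions are handled, and the caveats on the union are not modelled because the proposal does not spell them out.

```python
class HybridIndex:
    """Toy hybrid index: recent entries in a property index, older ones in Lucene."""

    def __init__(self):
        self.property_index = {}  # value -> set of paths (recent entries)
        self.lucene_index = {}    # value -> set of paths (older entries)

    def on_commit(self, path, value):
        # Step 1: store recently added content in the property index.
        self.property_index.setdefault(value, set()).add(path)

    def run_async_indexer(self):
        # Step 2: index the same content in the Lucene index.
        # Returns the (value, path) pairs that were indexed in this run.
        indexed = []
        for value, paths in self.property_index.items():
            self.lucene_index.setdefault(value, set()).update(paths)
            indexed.extend((value, path) for path in paths)
        return indexed

    def prune(self, indexed):
        # Step 3: drop property index entries that Lucene now covers.
        for value, path in indexed:
            paths = self.property_index.get(value)
            if paths is None:
                continue
            paths.discard(path)
            if not paths:
                del self.property_index[value]

    def query(self, value):
        # Step 4: union of the results from both stores.
        recent = self.property_index.get(value, set())
        older = self.lucene_index.get(value, set())
        return recent | older
```

Pruning only the entries returned by an indexer run means that content committed after that run stays in the property index, so the union still returns it.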
+ === Index Definition ===
+ 
+ The synchronous index support would need to be enabled via the index definition:
+ 
+  * `async` - This needs to have an entry `sync`
+  * Set `sync` to true for each property definition which needs to be indexed synchronously
+ 
+ {{{
+ /oak:index/assetType
+   - jcr:primaryType = "oak:QueryIndexDefinition"
+   - type = "lucene"
+   - async = ["async", "sync"]
+   + indexRules
+     + nt:base
+       + properties
+         + resourceType
+           - propertyIndex = true
+           - name = "assetType"
+           - sync = true
+ }}}
+ 
+ For unique indexes set `unique` to true
+ 
+ {{{
+ /oak:index/uuid
+   - jcr:primaryType = "oak:QueryIndexDefinition"
+   - type = "lucene"
+   - async = ["async", "sync"]
+   + indexRules
+     + nt:base
+       + properties
+         + uuid
+           - propertyIndex = true
+           - name = "jcr:uuid"
+           - unique = true
+ }}}
+ 
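As a rough illustration of how these settings combine, the Python sketch below models an index definition as a nested dictionary and lists the properties that would be indexed synchronously. The helper `sync_properties` and the dictionary layout are hypothetical. Treating `unique = true` as implying synchronous indexing is an assumption read off the second example, which sets `unique` without `sync`.

```python
def sync_properties(definition):
    """Return {property name: "unique" or "sync"} for synchronously indexed properties."""
    # Assumption: nothing is indexed synchronously unless `async` lists `sync`.
    if "sync" not in definition.get("async", []):
        return {}
    result = {}
    for rule in definition.get("indexRules", {}).values():
        for prop in rule.get("properties", {}).values():
            if prop.get("unique"):
                result[prop["name"]] = "unique"
            elif prop.get("sync"):
                result[prop["name"]] = "sync"
    return result


# Dictionary form of the /oak:index/assetType definition shown above.
asset_type = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "async": ["async", "sync"],
    "indexRules": {"nt:base": {"properties": {"resourceType": {
        "propertyIndex": True, "name": "assetType", "sync": True}}}},
}
```

Calling `sync_properties(asset_type)` reports `assetType` as a sync property. The same definition with `async = ["async"]` yields an empty result.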
+ === Index Storage ===
+ 
+ The property index content would be stored as hidden nodes under the index definition nodes.
The storage structure would be similar to the existing format for property indexes, with some changes.
+ 
+ ==== Unique Indexes ====
+ 
+ {{{
+ /oak:index/assetType
+   + :data   //Stores the lucene index files
+   + :property-index
+     + uuid
+       + <value 1>
+         - entry = [/indexed-content-path]
+         - jcr:created = [creation time in millis]
+       + 49652b7e-becd-4534-b104-f867d14c7b6c
+         - entry = [/jcr:system/jcr:versionStorage/63/36/f8/6336f8f5-f155-4cbc-89a4-a87e2f798260/jcr:rootVersion]
+ }}}
+ 
+ Here
+  * `:property-index` - hidden node under which property indexes would be stored for the various
properties which are marked as sync
+  * For a unique index, each entry would also have a timestamp which would later be used
for pruning
+ 
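How the timestamp might drive pruning can be sketched in Python. The page does not state the pruning rule, so the cut-off used here is an assumption: entries created at or before the point the async index has reached are dropped. Keeping entries that have no `jcr:created` stamp is also an assumption. `prune_unique_entries` and `indexed_upto_millis` are hypothetical names.

```python
def prune_unique_entries(entries, indexed_upto_millis):
    """Return the entries that must stay in the property index.

    entries: {value: {"entry": [path], "jcr:created": millis}}
    Assumption: an entry can go once its creation time is at or before
    the point the async (Lucene) index has reached. Entries without a
    jcr:created stamp are kept, since their age is unknown.
    """
    kept = {}
    for value, node in entries.items():
        created = node.get("jcr:created")
        if created is None or created > indexed_upto_millis:
            kept[value] = node
    return kept
```

The function returns a new dictionary and leaves its input untouched, which keeps the sketch easy to reason about. A real implementation would remove the hidden nodes in place.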
+ 
+ ==== Property Indexes ====
+ 
+ TBD
+ 
+ === Query Evaluation ===
+ 
+ TBD
+ 
