jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Synchronous Lucene Property Indexes" by ChetanMehrotra
Date Wed, 09 Aug 2017 11:09:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Synchronous Lucene Property Indexes" page has been changed by ChetanMehrotra:
https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes?action=diff&rev1=2&rev2=3

      + uuid
        + <value 1>
          - entry = [/indexed-content-path]
-         - jcr:created = [creation time in millis]
+         - jcr:created = 1502274302 //creation time in millis
        + 49652b7e-becd-4534-b104-f867d14c7b6c
          - entry = [/jcr:system/jcr:versionStorage/63/36/f8/6336f8f5-f155-4cbc-89a4-a87e2f798260/jcr:rootVersion]
  }}}
@@ -106, +106 @@

  
  ==== Property Indexes ====
  
- TBD
+ {{{
+ /oak:index/assetType
+   + :data   //Stores the Lucene index files
+   + :property-index
+     + resourceType
+       - head = 2
+       - previous = 1
+       + 1
+         - jcr:created = 1502274302 //creation time in millis
+         - lastUpdated = 1502284302 
+         + type1
+           + libs
+             + login
+               + core
+                 - match = true
+         + <value>
+           + <mirror of indexed path>
+       + 2
+         - jcr:created = 1502454302
+         + type1
+           + ...
+ }}}
+ 
+ Here we create new ''buckets'' of index values, which simplifies pruning. A new bucket would be created after each successful async indexer run, and older buckets would be removed.
+ Each bucket would in turn have a structure similar to the content mirror store strategy.
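To make the mirror layout concrete, the following Java sketch computes the node path at which an entry for a given property value and content path would be stored, following the `/oak:index/assetType` tree above. `BucketEntryPaths` and `entryPath` are hypothetical names, not Oak APIs, and the sketch assumes the value can be used verbatim as a node name (a real implementation would have to encode characters that are not valid in JCR names).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not an Oak API: builds the path of a property-index entry using
 * the layout <index>/:property-index/<property>/<bucket>/<value>/<mirror of content path>.
 * The leaf node of that path is where a marker such as "match = true" would be set.
 */
final class BucketEntryPaths {
    private BucketEntryPaths() {}

    static String entryPath(String indexPath, String propertyName, long bucketId,
                            String value, String contentPath) {
        List<String> segments = new ArrayList<>();
        segments.add(indexPath);                  // e.g. /oak:index/assetType
        segments.add(":property-index");
        segments.add(propertyName);               // e.g. resourceType
        segments.add(Long.toString(bucketId));    // bucket node named after its id, e.g. 1
        segments.add(value);                      // assumed to be a valid node name as-is
        // Mirror the indexed content path below the value node: /libs/login/core -> libs/login/core
        for (String element : contentPath.split("/")) {
            if (!element.isEmpty()) {
                segments.add(element);
            }
        }
        return String.join("/", segments);
    }
}
```

For the tree above, `entryPath("/oak:index/assetType", "resourceType", 1, "type1", "/libs/login/core")` yields `/oak:index/assetType/:property-index/resourceType/1/type1/libs/login/core`.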
+ 
+ For each indexed property we keep a `head` property which refers to the currently active ''bucket''; it would be updated by `IndexPruner`. In addition, a `previous` property would refer to the last active bucket.
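The roles of `head` and `previous` can be sketched with a small in-memory model of one property: synchronous index updates always go into the `head` bucket, while lookups consult both buckets. `PropertyBuckets` and its methods are hypothetical; the real index stores buckets as nodes rather than Java maps, and using id `0` to mean "no previous bucket" is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of the buckets of one synchronously indexed property.
 * Not Oak code: real buckets are nodes under :property-index/<property>/<bucket id>.
 * Removing stale buckets is left to the pruner and is not modelled here.
 */
final class PropertyBuckets {
    // bucket id -> (indexed value -> content paths having that value)
    private final Map<Long, Map<String, Set<String>>> buckets = new HashMap<>();
    private long head = 1;
    private long previous = 0; // 0 = no previous bucket yet (assumption of this sketch)

    PropertyBuckets() {
        buckets.put(head, new HashMap<>());
    }

    /** Synchronous index updates always write into the current head bucket. */
    void add(String value, String contentPath) {
        buckets.get(head).computeIfAbsent(value, v -> new TreeSet<>()).add(contentPath);
    }

    /** What IndexPruner would do: the old head becomes previous, a new empty bucket becomes head. */
    void switchHead(long newHead) {
        previous = head;
        head = newHead;
        buckets.put(newHead, new HashMap<>());
    }

    /** Lookups read both head and previous; older buckets are ignored. */
    Set<String> lookup(String value) {
        Set<String> result = new TreeSet<>();
        for (long id : new long[] {head, previous}) {
            Map<String, Set<String>> bucket = buckets.get(id);
            if (bucket != null) {
                result.addAll(bucket.getOrDefault(value, Collections.emptySet()));
            }
        }
        return result;
    }
}
```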
+ 
+ On each run of `IndexPruner`:
+  1. Check whether `IndexStatsMBean#LastIndexedTime` has changed since the last known time
+  1. If it has changed:
+    1. Create a new bucket by incrementing the current head value
+    1. Set `previous` to the current head
+    1. Set `head` to the new head value
+    1. Set `lastUpdated` on the `previous` bucket to now
+  1. Remove all buckets other than `head` and `previous`
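The pruner steps above can be sketched as follows for a single property. This is a minimal model, not Oak's `IndexPruner`: how `IndexStatsMBean#LastIndexedTime` is read is left out (it is passed in as a plain `long`), `BucketRotation` and `Bucket` are hypothetical names, `0` is assumed to mean "no previous bucket", and treating the very first run as "changed" is an assumption of this sketch.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of the IndexPruner bucket rotation for one property (not Oak code). */
final class BucketRotation {
    static final class Bucket {
        final long created;     // jcr:created, millis
        long lastUpdated;       // set when the bucket stops being head; 0 until then
        Bucket(long created) { this.created = created; }
    }

    final NavigableMap<Long, Bucket> buckets = new TreeMap<>();
    long head = 1;
    long previous = 0;                                   // 0 = no previous bucket yet
    private long lastKnownIndexedTime = Long.MIN_VALUE;  // assumption: first run counts as changed

    BucketRotation(long now) {
        buckets.put(head, new Bucket(now));
    }

    /**
     * One pruner run.
     * @param lastIndexedTime current value of IndexStatsMBean#LastIndexedTime
     * @param now             current time in millis
     * @return true if a new head bucket was created
     */
    boolean run(long lastIndexedTime, long now) {
        boolean rotated = false;
        // Step 1: has the async indexer completed a run since the last check?
        if (lastIndexedTime != lastKnownIndexedTime) {
            lastKnownIndexedTime = lastIndexedTime;
            long newHead = head + 1;                  // 2.1 new bucket id = head + 1
            buckets.put(newHead, new Bucket(now));
            previous = head;                          // 2.2 previous = current head
            head = newHead;                           // 2.3 head = new head value
            buckets.get(previous).lastUpdated = now;  // 2.4 stamp the previous bucket
            rotated = true;
        }
        // Step 3: drop every bucket other than head and previous (done on every run).
        buckets.keySet().removeIf(id -> id != head && id != previous);
        return rotated;
    }
}
```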
+ 
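+ We require both the `head` and `previous` buckets because the content in `previous` would only partly overlap with what the async indexer has already indexed: it can still hold entries that are not yet in the Lucene index, so it cannot be removed yet.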
+ 
+ ==== Index Pruner ====
+ 
+ Index Pruner is a periodic task responsible for pruning the index content. It would use `IndexStatsMBean#LastIndexedTime` to determine up to which time the async indexer has indexed the repository, and then remove entries from the property index that are older than that time.
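A skeleton of such a periodic task might look like the following. It is a sketch under assumptions: `PeriodicIndexPruner` is a hypothetical name, the `LongSupplier` stands in for reading `IndexStatsMBean#LastIndexedTime`, and the `LongConsumer` stands in for whichever pruning strategy applies to the index type.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical periodic pruning task (not Oak's implementation). */
final class PeriodicIndexPruner implements Runnable {
    private final LongSupplier lastIndexedTime;   // e.g. backed by IndexStatsMBean#LastIndexedTime
    private final LongConsumer pruneOlderThan;    // removes property-index entries older than the given time

    PeriodicIndexPruner(LongSupplier lastIndexedTime, LongConsumer pruneOlderThan) {
        this.lastIndexedTime = lastIndexedTime;
        this.pruneOlderThan = pruneOlderThan;
    }

    /** One pass: entries older than the async indexer's progress should already be in Lucene. */
    @Override
    public void run() {
        try {
            pruneOlderThan.accept(lastIndexedTime.getAsLong());
        } catch (RuntimeException e) {
            // If a periodic task throws, the executor suppresses all subsequent runs, so catch here.
            System.err.println("Index pruning failed: " + e);
        }
    }

    /** Schedules the task with a fixed delay between the end of one pass and the start of the next. */
    ScheduledFuture<?> schedule(ScheduledExecutorService executor, long delay, TimeUnit unit) {
        return executor.scheduleWithFixedDelay(this, delay, delay, unit);
    }
}
```

A fixed delay (rather than a fixed rate) avoids overlapping passes if one pruning run is slow; the interval itself, e.g. `pruner.schedule(Executors.newSingleThreadScheduledExecutor(), 1, TimeUnit.MINUTES)`, is an arbitrary illustrative choice.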
+ 
+  * Property index - pruning would be done by creating a new bucket and then removing the older buckets.
+  * Unique index - pruning would be done by iterating over the currently indexed content and removing the older entries.
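For the unique index, the iteration could look like the sketch below, which compares the `jcr:created` value recorded with each entry (as in the unique-index structure shown earlier) against the last indexed time. `UniqueIndexPruning` and `UniqueEntry` are hypothetical names, and a real implementation would walk index nodes rather than a `Map`.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of unique-index pruning (not Oak code). */
final class UniqueIndexPruning {
    /** One unique-index entry: the indexed content path and its jcr:created time in millis. */
    static final class UniqueEntry {
        final String path;
        final long created;

        UniqueEntry(String path, long created) {
            this.path = path;
            this.created = created;
        }
    }

    private UniqueIndexPruning() {}

    /**
     * Removes entries created before lastIndexedTime, i.e. entries the async Lucene index
     * should already cover. Returns the number of removed entries.
     */
    static int pruneOlderThan(Map<String, UniqueEntry> entriesByValue, long lastIndexedTime) {
        int removed = 0;
        Iterator<Map.Entry<String, UniqueEntry>> it = entriesByValue.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().created < lastIndexedTime) {
                it.remove(); // removal through the iterator is safe while iterating
                removed++;
            }
        }
        return removed;
    }
}
```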
  
  === Query Evaluation ===
  
- TBD
+ On the query side we would perform a union query over the two index types. A union cursor would be created, consisting of:
  
+  * LucenePathCursor - primary cursor backed by the Lucene index
+  * PropertyIndexCursor - a union of the path cursors from the current `head` and `previous` buckets
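A minimal sketch of such a union, assuming each cursor can be viewed as an `Iterator<String>` of paths: Oak's actual `Cursor` interface is not used here, and `UnionPathIterator` is a hypothetical name. Because the same path may be returned by Lucene and by a bucket (or by both buckets), duplicates are filtered out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical union of path cursors that drops duplicate paths (not Oak's cursor API). */
final class UnionPathIterator implements Iterator<String> {
    private final Deque<Iterator<String>> sources;   // e.g. Lucene, head bucket, previous bucket
    private final Set<String> seen = new HashSet<>(); // paths already returned
    private String next;                              // prefetched path, null if none

    UnionPathIterator(List<Iterator<String>> sources) {
        this.sources = new ArrayDeque<>(sources);
    }

    @Override
    public boolean hasNext() {
        while (next == null && !sources.isEmpty()) {
            Iterator<String> current = sources.peekFirst();
            if (!current.hasNext()) {
                sources.pollFirst(); // this source is exhausted, move on to the next one
                continue;
            }
            String candidate = current.next();
            if (seen.add(candidate)) {
                next = candidate;    // first time we see this path
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }
}
```

The `seen` set grows with the number of distinct paths returned; the sketch accepts that memory cost for simplicity, and sources are drained in order, so Lucene results come first.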
+ 
+ ==== Open Points ====
+ 
+ If multiple property definitions in a Lucene index are marked with `sync` and a query has constraints on more than one of them, which property index should be picked?
+ 
