jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Composite Blob Store" by MattRyan
Date Tue, 15 Aug 2017 00:00:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Composite Blob Store" page has been changed by MattRyan:
https://wiki.apache.org/jackrabbit/Composite%20Blob%20Store?action=diff&rev1=4&rev2=5

Comment:
More details added for traversal algorithm

  The composite blob store is a multi-source blob store - a logical blob store consisting
of at least two delegate blob stores.  The union of all the data in all the delegate blob
stores is presented to a user of the composite blob store as a single logical "view" of the
data being stored.
  
  == Technical Details ==
+ Configuration specifies which delegate blob stores comprise the composite blob store, in
order of preference.  Configuration may also indicate "storage filters" - criteria that are
evaluated by the composite blob store to determine whether a delegate is a preferred location
for a blob.  In the case of conflicting delegates where there are multiple matches, the first
matching delegate is the one used.
- Configuration specifies which delegate blob stores comprise the composite blob store, in
order of preference.  Configuration may also indicate "storage filters" - criteria that are
evaluated by the composite blob store to determine whether a delegate is a preferred location
for a blob.  Storage filters give higher priority to a delegate, and are always applied in
a consistent order.  In the case of conflicting delegates where there are multiple matches,
the first matching delegate is the one used.
- 
- A correctly configured composite blob store must have exactly one delegate blob store that
is configured as the default blob store.  Any blobs that do not match any of the rules will
map to the default blob store.
  
  === Storage Filters ===
  Storage filters operate on any combination of the following:
@@ -19, +17 @@

   * Existence of JCR property
   * JCR property value
  
+ === Delegate Traversal ===
+ "Delegate Traversal" refers to the logic that is used to go through delegates on a read
or write request to determine which delegate should be used for a request.  The algorithm
used to traverse delegates should be extensible with a reasonable default provided.  This
is effectively an implementation of the Strategy pattern.  Any user of the system could provide
their own implementation to deliver whatever custom logic they wish to provide.
- === Delegate Search Order ===
- When accessing delegate blob stores, the composite blob store evaluates them in the following
order:
-  * Starting with any delegates with storage filters, attempt to fulfill the request using
any delegate for which the storage filters match the blob information.
-  * If no match is found, attempt to fulfill the request using any delegates without storage
filters in priority order.
-  * If no match is found, attempt to fulfill the request using the default delegate.
-  * If no match is found, attempt to fulfill the request using delegates with storage filters
for which the filters DO NOT match the blob information.
  
+ The proposed default implementation is called the Intelligent Delegate Traversal Strategy.
 It is "intelligent" because it attempts to interpret the provided configuration and apply
a priority based on the interpretation of the configuration.  Another possible default could
be e.g. the Simple Delegate Traversal Strategy.  This strategy simply attempts to use the
delegates in the raw order they are specified in the configuration, with no logic applied.
 The Intelligent Delegate Traversal Strategy will probably provide the most unsurprising results
for end users.
+ 
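As a rough illustration of the Strategy pattern described above, a traversal strategy could be expressed as a small interface that the composite blob store depends on. This is only a sketch: `TraversalStrategySketch`, `Delegate`, `DelegateTraversalStrategy`, `traversalOrder` and `firstCandidate` are hypothetical names chosen for this example and are not part of Oak.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a pluggable traversal strategy; not Oak API. */
final class TraversalStrategySketch {

    /** Minimal stand-in for a configured delegate blob store (assumed shape). */
    static final class Delegate {
        final String name;

        Delegate(String name) {
            this.name = name;
        }
    }

    /** Strategy interface: returns the order in which delegates are tried. */
    interface DelegateTraversalStrategy {
        List<Delegate> traversalOrder(List<Delegate> configured);
    }

    /** "Simple" strategy: the raw configuration order, with no logic applied. */
    static final class SimpleDelegateTraversalStrategy implements DelegateTraversalStrategy {
        @Override
        public List<Delegate> traversalOrder(List<Delegate> configured) {
            return new ArrayList<>(configured);
        }
    }

    /** The caller depends only on the interface, so any strategy can be supplied. */
    static Delegate firstCandidate(DelegateTraversalStrategy strategy, List<Delegate> configured) {
        List<Delegate> ordered = strategy.traversalOrder(configured);
        return ordered.isEmpty() ? null : ordered.get(0);
    }
}
```

Because the interface has a single method, a user-supplied strategy could also be passed as a lambda, which keeps custom traversal logic out of the composite blob store itself.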
+ ==== Intelligent Delegate Traversal Strategy ====
+ 
+ ===== Delegate Search Order =====
+ In general, delegates are considered in the following order.  Specifics for writes and reads
are enumerated below, along with some special cases.
+  * Delegates with storage filters take highest precedence.
+  * Delegates without storage filters take next precedence.
+  * Read-only delegates always take lower priority than read-write delegates for reads.
+  * Delegates for cold storage reads are always evaluated after every other option is exhausted.
+    ** Note that for now writes to cold storage aren't in scope.  This may be done via an
entirely different process.
+    ** Cold storage data stores are a possible future addition and don't currently exist.
+ 
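The precedence rules listed above could be approximated by a stable sort over the configured delegates. The sketch below is one possible reading for read requests, not the wiki's definitive algorithm. The wiki does not say how the rules combine, so the order assumed here is cold storage last, then filtered before unfiltered, then read-write before read-only. The names `ReadOrderSketch`, `Delegate`, `rank` and `readOrder` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the read precedence rules; not Oak API. */
final class ReadOrderSketch {

    /** Minimal stand-in for a delegate's configuration (assumed shape). */
    static final class Delegate {
        final String name;
        final boolean hasFilters;
        final boolean readOnly;
        final boolean coldStorage;

        Delegate(String name, boolean hasFilters, boolean readOnly, boolean coldStorage) {
            this.name = name;
            this.hasFilters = hasFilters;
            this.readOnly = readOnly;
            this.coldStorage = coldStorage;
        }
    }

    /** Lower rank means higher precedence. The weighting is an assumption. */
    static int rank(Delegate d) {
        int rank = 0;
        if (d.coldStorage) rank += 4;   // cold storage is always evaluated last
        if (!d.hasFilters) rank += 2;   // filtered delegates come before unfiltered ones
        if (d.readOnly) rank += 1;      // read-write delegates come before read-only ones
        return rank;
    }

    /** List.sort is stable, so delegates with equal rank keep configuration order. */
    static List<Delegate> readOrder(List<Delegate> configured) {
        List<Delegate> sorted = new ArrayList<>(configured);
        sorted.sort(Comparator.comparingInt(ReadOrderSketch::rank));
        return sorted;
    }
}
```

Relying on a stable sort means the configured order of preference still breaks ties between delegates of the same kind.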
+ ===== Writes =====
+ ====== Storage Filters ======
+ It is possible, but not required, to configure a delegate with storage filters.  Delegates
that have storage filters always have write precedence over delegates that do not have storage
filters - meaning writes always happen to delegates with filters if the filters match what
is being written.
+ 
+ ====== Read-Only Delegates ======
+ A delegate may be specified as a read-only delegate, in which case it will not accept any
write requests.  If it would otherwise have been chosen for a write request, the request will
defer to the next delegate in the traversal that matches the request, if any.
+ 
+ ====== Delegate Write Preference ======
+ The composite blob store fulfills write requests by deferring the write to delegate blob
stores.  This is a bit more complex and requires further explanation.
+ 
+ From the user point of view, a blob store write can be either a create or an update.  The
Oak Backend interface doesn't distinguish between creates or updates.  In the case of the
composite blob store there is an important distinction.
+ 
+ When a write is requested, the composite blob store must first determine if the blob exists
in a delegate already.  Therefore all delegates have to be traversed looking for a matching
blob before a write is performed.  If a match is found, the write is treated as an update;
if no match is found, the write is treated as a create.
+ 
+ The full algorithm looks like this:
+ * Search for a matching blob through each delegate, excluding those that are cold-storage
and those that are read-only.  Start with delegates that have filters.  If a delegate's filters
match the blob and the blob is found there, update the blob.
+ * If no match is found, search again this time using delegates that do not have filters.
+ * If no match is found, search again this time using delegates that have filters that do
not match the object.
+   ** Configuration change can cause this situation - a blob may originally have been written
to a blob store, and afterward configuration changes such that the blob would not have been
written to that blob store originally.  See "Reading from Non-matching Delegates" below for
more.
+   ** If this case occurs, the code should also asynchronously remove the blob from the incorrect
location once it has been written to the correct location.
+ * If no match is found, treat the write as a create.  Repeat the search for a delegate,
excluding cold-storage and read-only delegates, starting with delegates that have filters.
 If any have filters that match the blob, write the blob to that delegate.
+ * If no delegates match, select the first delegate without filters, if any.
+ * If the algorithm fails to select a valid delegate, an error response should be generated
indicating the problem.
+ 
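The write algorithm above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: storage filters are reduced to a predicate over a blob identifier (the wiki describes filters over JCR properties), each delegate's contents are modelled as an in-memory set, and `WriteSelectionSketch`, `Delegate`, `Decision` and `selectForWrite` are hypothetical names, not Oak API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of write delegate selection; not Oak API. */
final class WriteSelectionSketch {

    static final class Delegate {
        final String name;
        final Predicate<String> filter;          // null means "no storage filters" (assumption)
        final boolean readOnly;
        final boolean coldStorage;
        final Set<String> blobs = new HashSet<>(); // stand-in for stored blob ids

        Delegate(String name, Predicate<String> filter, boolean readOnly, boolean coldStorage) {
            this.name = name;
            this.filter = filter;
            this.readOnly = readOnly;
            this.coldStorage = coldStorage;
        }

        boolean writable() { return !readOnly && !coldStorage; }
        boolean hasFilters() { return filter != null; }
        boolean matches(String blobId) { return filter != null && filter.test(blobId); }
    }

    /** Where to write, plus an old copy to remove asynchronously (may be null). */
    static final class Decision {
        final Delegate target;
        final Delegate staleCopy;

        Decision(Delegate target, Delegate staleCopy) {
            this.target = target;
            this.staleCopy = staleCopy;
        }
    }

    static Decision selectForWrite(List<Delegate> delegates, String blobId) {
        // Update: the blob already exists in a delegate whose filters match it.
        for (Delegate d : delegates)
            if (d.writable() && d.matches(blobId) && d.blobs.contains(blobId))
                return new Decision(d, null);
        // Update: the blob already exists in a delegate without filters.
        for (Delegate d : delegates)
            if (d.writable() && !d.hasFilters() && d.blobs.contains(blobId))
                return new Decision(d, null);
        // The blob exists in a delegate whose filters no longer match: remember the old copy.
        Delegate stale = null;
        for (Delegate d : delegates)
            if (d.writable() && d.hasFilters() && !d.matches(blobId) && d.blobs.contains(blobId)) {
                stale = d;
                break;
            }
        // Create: first delegate whose filters match, else first delegate without filters.
        for (Delegate d : delegates)
            if (d.writable() && d.matches(blobId))
                return new Decision(d, stale);
        for (Delegate d : delegates)
            if (d.writable() && !d.hasFilters())
                return new Decision(d, stale);
        throw new IllegalStateException("No writable delegate accepts blob " + blobId);
    }
}
```

A caller would write to `target` and, when `staleCopy` is non-null, schedule the asynchronous removal of the old copy once the write has completed.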
- === Reads ===
+ ===== Reads =====
- The composite blob store fulfills read requests by deferring the read to delegate blob stores,
using the delegate search order defined above.
+ The composite blob store fulfills read requests by deferring the read to delegate blob stores.
+ 
+ ====== Delegate Read Preference ======
+ The full algorithm looks like this:
+ * Search for a matching blob through each delegate, excluding those that are cold-storage.
 Start with delegates that have filters.  If a delegate's filters match the blob and the blob
is found there, return the blob.
+ * If no match is found, search again this time using delegates that do not have filters.
+ * If no match is found, search again this time using delegates that have filters that do
not match the object.
+   ** Configuration change can cause this situation - a blob may originally have been written
to a blob store, and afterward configuration changes such that the blob would not have been
written to that blob store originally.  See "Reading from Non-matching Delegates" below for
more.
+   ** If this case occurs, the code should also asynchronously move the blob from the incorrect
location to the correct location.
+ * If no match is found, search again this time using delegates that are cold-storage delegates
that have filters.
+ * If no match is found, search again this time using delegates that are cold-storage delegates
without filters.
+ * If no match has been found by this point, return a "not found" response.
  
  The response to a read request is the result of the first successful read from a delegate.
 In this way the top priority result is always selected.
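
The read algorithm can be sketched as a sequence of phases, each searched in configuration order, where the first delegate that holds the blob wins. As before, this is only an illustration: filters are reduced to a predicate over a blob identifier, delegate contents are modelled as in-memory sets, the read-only versus read-write ordering within a phase is left out for brevity, and `ReadSelectionSketch`, `Delegate`, `findForRead` and `isMisplaced` are hypothetical names, not Oak API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of read delegate selection; not Oak API. */
final class ReadSelectionSketch {

    static final class Delegate {
        final String name;
        final Predicate<String> filter;          // null means "no storage filters" (assumption)
        final boolean coldStorage;
        final Set<String> blobs = new HashSet<>(); // stand-in for stored blob ids

        Delegate(String name, Predicate<String> filter, boolean coldStorage) {
            this.name = name;
            this.filter = filter;
            this.coldStorage = coldStorage;
        }

        boolean hasFilters() { return filter != null; }
        boolean matches(String blobId) { return filter != null && filter.test(blobId); }
    }

    /** Returns the first delegate, in phase order, that actually holds the blob. */
    static Optional<Delegate> findForRead(List<Delegate> delegates, String blobId) {
        List<Predicate<Delegate>> phases = Arrays.<Predicate<Delegate>>asList(
            d -> !d.coldStorage && d.matches(blobId),                     // filters match
            d -> !d.coldStorage && !d.hasFilters(),                       // no filters
            d -> !d.coldStorage && d.hasFilters() && !d.matches(blobId),  // filters do not match
            d -> d.coldStorage && d.hasFilters(),                         // cold storage, filtered
            d -> d.coldStorage && !d.hasFilters());                       // cold storage, unfiltered
        for (Predicate<Delegate> phase : phases)
            for (Delegate d : delegates)
                if (phase.test(d) && d.blobs.contains(blobId))
                    return Optional.of(d);
        return Optional.empty(); // corresponds to the "not found" response
    }

    /** True when the blob sits in a delegate whose filters would not select it now. */
    static boolean isMisplaced(Delegate d, String blobId) {
        return d.hasFilters() && !d.matches(blobId);
    }
}
```

A caller could use `isMisplaced` on the returned delegate to decide whether to schedule the asynchronous move to the correct location.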
  
- ==== Reading from Non-matching Delegates ====
+ ====== Reading from Non-matching Delegates ======
  The read step that searches delegates whose filters do not match is necessary to handle
situations where blobs are temporarily located in the "wrong" blob store - in other words, when
a blob is located in a delegate where it would not be written according to configuration.  The
most obvious cause is a configuration change.  A delegate D may be configured with certain storage
filters, causing Blob B to be written there.  Then the configuration is changed such that
if B were being written now it would not be written to D.  This step allows B
to be found in D even though it doesn't match the storage filters.
  
- When this situation is encountered, the composite blob store should also initiate an asynchronous
background job to move the blob from it's current location to the proper one - the location
where it would be found if it were being created now.
+ When this situation is encountered on a read request, the composite blob store should also
initiate an asynchronous background job to move the blob from its current location to the proper
one - the location where it would be found if it were being created now.  On a write request,
it should write to the correct location and then remove the blob from the current location
asynchronously in the background after the write is done.
  
  ==== Read-Only Delegates ====
  The composite blob store supports the notion of a read-only delegate blob store.  One or
more of the delegate blob stores can be configured in read-only mode, meaning that it can
be used to satisfy read requests but not write requests.  An example use case for this scenario
is where two content repositories are used, one for a production environment and one for a
staging environment.  The staging repository can be configured with a composite blob store
that accesses the production storage location in read-only mode, so tests can execute in
staging using production data without modifying production data or the production store.
  
- Reads issued to a read-only delegate would be processed as normal.  Writes issued to a read-only
delegate would fail, causing the composite blob store to move on to the next delegate to attempt
to fulfill the write request.
+ Reads issued to a read-only delegate would be processed as normal.  Read-only delegates
are not considered for write requests, causing the composite blob store to move on to the
next delegate to attempt to fulfill the write request.
  
  Note that configuring all delegates of a composite blob store as read-only would make the blob store
useless for storing blobs and thus should not be an allowed condition - at least one delegate
blob store must not be a read-only delegate.
  
- The default delegate can never be a read-only delegate.
- 
- === Writes ===
- The composite blob store fulfills write requests by deferring the write to delegate blob
stores.  This is a bit more complex and requires further explanation.
- 
- From the user point of view, a blob store write can be either a create or an update.  The
Oak Backend interface doesn't distinguish between creates or updates.  In the case of the
composite blob store there is an important distinction.
- 
- When a write is requested, the composite blob store must first determine:
-  * Whether the blob exists in a delegate, AND IF SO
-  * Whether the delegate is the same one that would be selected to store the blob if it were
being created now
- 
- Locating the blob in the delegates is done using the same delegate search order as for reads.
 If the blob exists in a delegate and it is not the delegate that would be selected if the
blob were being created now, the following should happen:
-  * The new blob should be written to the correct delegate store
-  * The old blob should be deleted from the previous delegate store (asynchronously)
- 
- If the blob cannot be located, or if the blob is already located in the location where it
would be created if it were being created now, it is simply overwritten by the selected delegate
blob store.
  
  === Curation ===
  
