jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Composite Blob Store" by MattRyan
Date Mon, 31 Jul 2017 20:30:32 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Composite Blob Store" page has been changed by MattRyan:
https://wiki.apache.org/jackrabbit/Composite%20Blob%20Store?action=diff&rev1=2&rev2=3

Comment:
Merge overlay blob store concepts into a single concept for composite blob store.

  NOTE:  The current status of this component is a '''proposed feature'''.
  
  == Overview ==
- The composite blob store is a multi-source blob store - a logical blob store consisting
of at least two delegate blob stores.  In the case of the composite blob store, every item
stored has exactly one correct delegate where it can be stored.  Configuration specifies which
delegate blob stores comprise the composite blob store.  Configuration also indicates the
criteria that are evaluated by the composite blob store to determine the proper location for
a blob.  These rules are always applied in a consistent order.  In the case of conflicting
rules where there are multiple matches, the first matching rule is the one applied.
+ The composite blob store is a multi-source blob store - a logical blob store consisting
of at least two delegate blob stores.  The union of all the data in all the delegate blob
stores is presented to a user of the composite blob store as a single logical "view" of the
data being stored.
  
- Thus by applying the rules consistently every time, the composite blob store always knows
the correct delegate to choose for reads or writes.  Since there is always exactly one correct
location for a blob, it should not be the case that the same blob is located in more than
one delegate blob store.
+ == Technical Details ==
+ Configuration specifies which delegate blob stores comprise the composite blob store, in
order of preference.  Configuration may also indicate "storage hints" - criteria that are
evaluated by the composite blob store to determine whether a delegate is a preferred location
for a blob.  Storage hints give higher priority to a delegate, and are always applied in a
consistent order.  In the case of conflicting delegates where there are multiple matches,
the first matching delegate is the one used.
  
  A correctly configured composite blob store must have exactly one delegate blob store that
is configured as the default blob store.  Any blobs that do not match any of the rules will
map to the default blob store.
  
+ === Storage Hints ===
- Rules operate on any combination of the following:
+ Storage hints operate on any combination of the following:
   * JCR path
   * JCR node type
   * Existence of JCR property
   * JCR property value
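
As a rough illustration of the four criteria above, a storage hint could be modeled as a set of optional checks. This is only a sketch: `BlobInfo`, `StorageHint`, and all field names are hypothetical, and it assumes that "any combination" means every criterion that is set must match, which the page does not state.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class BlobInfo:
    """Hypothetical description of the JCR context a blob belongs to."""
    path: str
    node_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class StorageHint:
    """Hypothetical hint: every criterion that is set must match (an assumption)."""
    path_prefix: Optional[str] = None            # JCR path
    node_type: Optional[str] = None              # JCR node type
    has_property: Optional[str] = None           # existence of JCR property
    property_equals: Optional[Tuple[str, object]] = None  # JCR property value

    def matches(self, blob: BlobInfo) -> bool:
        if self.path_prefix is not None and not blob.path.startswith(self.path_prefix):
            return False
        if self.node_type is not None and blob.node_type != self.node_type:
            return False
        if self.has_property is not None and self.has_property not in blob.properties:
            return False
        if self.property_equals is not None:
            name, value = self.property_equals
            if blob.properties.get(name) != value:
                return False
        return True
```

A hint with no criteria set matches every blob in this sketch; a real implementation might reject such a hint instead.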
  
- == Comparison to Overlay Blob Store ==
- The composite data store is similar in concept to [[Overlay Blob Store]].  Like the overlay
blob store, it presents a single, unified logical view that combines all of the data represented
by the delegate blob stores.  The composite is different in that every delegate contains distinct
data.  In other words, for any two delegate blob stores ''A'' and ''B'' that are both delegates
of the same composite blob store, the intersection of ''A'' and ''B'' must be the empty set
{}, whereas for an overlay data store such an intersection may or may not be the empty set
{}.
+ === Delegate Search Order ===
+ When accessing delegate blob stores, the composite blob store evaluates them in the following
order:
+  * Starting with any delegates with storage hints, attempt to fulfill the request using
any delegate for which the storage hints match the blob information.
+  * If no match is found, attempt to fulfill the request using any delegates without storage
hints in priority order.
+  * If no match is found, attempt to fulfill the request using the default delegate.
+  * If no match is found, attempt to fulfill the request using delegates with storage hints
for which the hints DO NOT match the blob information.
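
The four steps above can be sketched as a function that sorts delegates into the order in which they would be tried. The `Delegate` type and its fields are hypothetical, and the sketch assumes the default delegate is kept out of the first, second, and fourth groups so that it is tried exactly once, in the third step.

```python
from typing import Callable, List, NamedTuple, Optional


class Delegate(NamedTuple):
    """Hypothetical delegate descriptor; not an Oak type."""
    name: str
    hint: Optional[Callable[[dict], bool]] = None  # None means "no storage hints"
    is_default: bool = False


def search_order(delegates: List[Delegate], blob: dict) -> List[Delegate]:
    """Order delegates for one request; `delegates` is in configured priority order."""
    hinted = [d for d in delegates if d.hint is not None and not d.is_default]
    matching = [d for d in hinted if d.hint(blob)]            # step 1
    unhinted = [d for d in delegates
                if d.hint is None and not d.is_default]       # step 2
    default = [d for d in delegates if d.is_default]          # step 3
    non_matching = [d for d in hinted if not d.hint(blob)]    # step 4
    return matching + unhinted + default + non_matching
```

Within each group the configured priority order is preserved, so the first match wins, as described under Technical Details.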
  
  === Reads ===
- The composite blob store fulfills read requests by evaluating the blob and its associated
information (JCR path, JCR node type, JCR properties, JCR property values) with the configuration
rules, and thereby determining the correct delegate blob store for the blob.  The read is
issued to the delegate and the result of the read is returned as the result of the composite
blob store read and no subsequent reads are attempted for that request.
+ The composite blob store fulfills read requests by deferring the read to delegate blob stores,
using the delegate search order defined above.
+ 
+ The response to a read request is the result of the first successful read from a delegate.
 In this way the highest-priority result is always selected.
+ 
+ ==== Reading from Non-matching Delegates ====
+ The final read step is necessary to handle situations where blobs are temporarily located
in the "wrong" blob store - in other words, when a blob is located in a delegate where it
would not be written according to configuration.  The most obvious case where this could occur
is in the case of configuration change.  A delegate D may be configured with certain storage
hints, causing Blob B to be written there.  Then the configuration is changed such that if
B were being written now it would not have been written to D.  This final step allows B to
be found in D even though it doesn't match the storage hints.
+ 
+ When this situation is encountered, the composite blob store should also initiate an asynchronous
background job to move the blob from its current location to the proper one - the location
where it would be found if it were being created now.
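
A minimal sketch of this read path, assuming each delegate is an in-memory dict and that a plain list stands in for the asynchronous job queue. The function name, its parameters, and the tuple layout are all hypothetical.

```python
def read_blob(ordered, preferred_name, blob_id, pending_moves):
    """Read from the first delegate that has the blob.

    ordered        -- list of (name, store) pairs, already in delegate search order
    preferred_name -- delegate where the blob would be written if created now
    pending_moves  -- stand-in for an asynchronous move queue (assumption)
    """
    for name, store in ordered:
        if blob_id in store:
            if name != preferred_name:
                # Found in the "wrong" delegate: schedule a move, do not block the read.
                pending_moves.append((blob_id, name, preferred_name))
            return store[blob_id]
    return None  # no delegate holds the blob
```

The read itself returns immediately; only a record of the needed move is queued.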
+ 
+ ==== Read-Only Delegates ====
+ The composite blob store supports the notion of a read-only delegate blob store.  One or
more of the delegate blob stores can be configured in read-only mode, meaning that it can
be used to satisfy read requests but not write requests.  An example use case for this scenario
is where two content repositories are used, one for a production environment and one for a
staging environment.  The staging repository can be configured with a composite blob store
that accesses the production storage location in read-only mode, so tests can execute in
staging using production data without modifying production data or the production store.
+ 
+ Reads issued to a read-only delegate would be processed as normal.  Writes issued to a read-only
delegate would fail, causing the composite blob store to move on to the next delegate to attempt
to fulfill the write request.
+ 
+ Note that configuring all delegates of a composite blob store as read-only would make the blob
store useless for storing blobs and thus should not be an allowed condition - at least one delegate
blob store must not be a read-only delegate.
+ 
+ The default delegate can never be a read-only delegate.
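
The constraints stated so far (at least two delegates, exactly one default, a writable default, and at least one writable delegate) could be checked when the configuration is loaded. The sketch below assumes a hypothetical configuration shape, a list of dicts with `name`, `read_only`, and `default` keys; the page does not define a configuration format.

```python
def validate_delegates(delegates):
    """Return a list of problems found in a hypothetical delegate configuration."""
    problems = []
    if len(delegates) < 2:
        problems.append("a composite blob store needs at least two delegates")
    defaults = [d for d in delegates if d.get("default", False)]
    if len(defaults) != 1:
        problems.append("exactly one delegate must be the default")
    if any(d.get("read_only", False) for d in defaults):
        problems.append("the default delegate must not be read-only")
    if delegates and all(d.get("read_only", False) for d in delegates):
        problems.append("at least one delegate must be writable")
    return problems
```

An empty list means the configuration satisfies all four constraints.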
  
  === Writes ===
- The composite blob store fulfills write requests by evaluating the blob the same way as
is done for reads.  And the same as is done for reads, writes are issued to the selected delegate
and the result of the write is returned as the result of the composite blob store write and
no subsequent writes are attempted for that request.
+ The composite blob store fulfills write requests by deferring the write to delegate blob
stores.  This is a bit more complex and requires further explanation.
+ 
+ From the user's point of view, a blob store write can be either a create or an update.  The
Oak Backend interface doesn't distinguish between creates and updates.  In the case of the
composite blob store there is an important distinction.
+ 
+ When a write is requested, the composite blob store must first determine:
+  * Whether the blob exists in a delegate, AND IF SO
+  * Whether the delegate is the same one that would be selected to store the blob if it were
being created now
+ 
+ Locating the blob in the delegates is done using the same delegate search order as for reads.
 If the blob exists in a delegate and it is not the delegate that would be selected if the
blob were being created now, the following should happen:
+  * The new blob should be written to the correct delegate store
+  * The old blob should be deleted from the previous delegate store (asynchronously)
+ 
+ If the blob cannot be located, or if the blob is already located in the location where it
would be created if it were being created now, it is simply overwritten by the selected delegate
blob store.
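
The write path described above might look like the following sketch. It again assumes in-memory dicts as delegates and a plain list as a stand-in for the asynchronous delete queue; `write_blob` and its parameters are hypothetical names, not part of the Oak API.

```python
def write_blob(ordered, target_name, blob_id, data, pending_deletes):
    """Write a blob to the delegate selected for it now.

    ordered         -- list of (name, store) pairs in delegate search order
    target_name     -- delegate that would be selected if the blob were created now
    pending_deletes -- stand-in for an asynchronous delete queue (assumption)
    """
    stores = dict(ordered)
    # Locate any existing copy using the same search order as reads.
    current = next((name for name, store in ordered if blob_id in store), None)
    # Create or overwrite in the correct delegate.
    stores[target_name][blob_id] = data
    if current is not None and current != target_name:
        # The old copy lives elsewhere: delete it later, off the write path.
        pending_deletes.append((current, blob_id))
    return target_name
```

Writing the new copy before queuing the delete means the blob stays readable from at least one delegate throughout.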
  
  === Curation ===
  
  Curation is the process of evaluating the blobs in a blob store to determine if that blob
store is still the correct location for blobs to reside. In the case of the composite blob
store, a reason to curate data may be to gradually move data for which the determination of
"correct" location can change over time.  For example, a change to a JCR property may cause
a blob to belong in a different delegate blob store than where it was originally stored. 
Or a configuration change may mean that blobs need to move to different locations.
  
- Curation is not in the scope of the composite blob store; however, it may be prudent to
add common curators to the same package in Oak in future efforts.
+ The composite blob store should do on-demand curation, meaning that it will initiate a background
move of an object whenever one is requested that is not in the correct location.  However,
ongoing or background processing of the entirety of all blob stores to evaluate whether
each blob is in the correct location is not in the scope of the composite blob store, although
it may be prudent to add common curators to the same package in Oak in future efforts.
  
  == Use Cases ==
  === Replication Across Storage Regions ===
@@ -53, +87 @@

      +-------------+  +-------------+  +-------------+
  }}}
  
+ === Hierarchical Blob Store ===
+ The composite blob store directly addresses [[JCR Binary Usecase]] UC14 to store data in
one of a number of blob stores based on a hierarchy.
+ 
+ In the example below, blobs are initially stored in the !FileDataStore and then once they
are more than 30 days old are moved to !S3DataStore.  They can be read from either location.
 Note that moving blobs from one data store to the other fits under the category of curation;
bulk background curation of this kind is outside the scope of the composite blob store.
+ 
+ {{{
+ +-------+
+ |       |  <30 Days Old  +---------------+
+ |       +----------------> FileDataStore |
+ |       |                +---------------+
+ |  Oak  |
+ |       |
+ |       |  >=30 Days Old  +-------------+
+ |       +-----------------> S3DataStore |
+ |       |                 +-------------+
+ +-------+
+ }}}
+ 
+ === Staging Environment ===
+ The composite blob store can be used to address a production/staging deployment use case,
where one Oak repository is the production repository and another is the staging repository.
 The production repository accesses a single blob store.  The staging repository uses a composite
blob store to access a staging blob store as well as the production blob store in read-only
mode.  Thus staging can serve blobs out of either blob store but can only modify blobs on
the staging blob store.
+ 
+ {{{
+ +-----------------+        +-----------------+
+ | Production Env  |        |   Staging Env   |
+ | +-------------+ |        | +-------------+ |
+ | |     Oak     | |    +-----+     Oak     | |
+ | +------+------+ |    |   | +------+------+ |
+ |        |        |  Read- |        |        |
+ |        |        |  Only  |        |        |
+ | +------V------+ |    |   | +------V------+ |
+ | | S3DataStore <------+   | | S3DataStore | |
+ | +-------------+ |        | +-------------+ |
+ |                 |        |                 |
+ +-----------------+        +-----------------+
+ }}}
+ 
+ === S3DataStore Clustering ===
+ The composite blob store could be used to address [[JCR Binary Usecase]] UC9, where two
Oak nodes in a cluster may both have a record of a blob in the node store but one node may
temporarily not be able to access the blob in the case of async upload.  This could be addressed
by using a composite blob store where the first level blob store would be !FileDataStore on
an NFS mount and the second level blob store would be !S3DataStore without a cache.  The composite
blob store on each node will look for any asset in both the !FileDataStore and the !S3DataStore,
thus avoiding a split-brain scenario.
+ 
+ {{{
+ +-----------------------------+
+ | Node 1                      |
+ | +-----+                     |
+ | |     |                     |
+ | |     +-------------------------------+
+ | | Oak |                     |         |
+ | |     |   +---------------+ |         |
+ | |     +-->+ FileDataStore | |  +------V------+
+ | |     |   +-------^-------+ |  | S3DataStore |
+ | +-----+           |         |  +------+------+
+ +-------------------|---------+         |
+                     |            +------V------+
+                    NFS           |  S3 Bucket  | 
+                     |            +------^------+
+ +-------------------|---------+         |
+ | Node 2            |         |  +------+------+
+ | +-----+           |         |  | S3DataStore |
+ | |     |   +-------V-------+ |  +------^------+
+ | |     +---> FileDataStore | |         |
+ | | Oak |   +---------------+ |         |
+ | |     |                     |         |
+ | |     +-------------------------------+
+ | |     |                     |
+ | +-----+                     |
+ +-----------------------------+
+ }}}
+ 
