jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Composite Blob Store" by MattRyan
Date Fri, 12 Jan 2018 23:05:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Composite Blob Store" page has been changed by MattRyan:
https://wiki.apache.org/jackrabbit/Composite%20Blob%20Store?action=diff&rev1=12&rev2=13

  = Composite Blob Store =
- 
- NOTE:  (Thomas Mueller): Motivation is missing (what problems do we want to solve).
- NOTE:  (Matt Ryan): Added motivation section below, is this sufficient or is more needed
here?
- 
  
  NOTE:  The current status of this component is a '''proposed feature'''.
  
@@ -22, +18 @@

   * Choose different types of storage for different types of binary data in Oak based on
some configurable criteria.
  
  == Technical Details ==
- Configuration specifies which delegate blob stores comprise the composite blob store, in
some order of preference.  The exact order is defined using a strategy pattern implementation
(see "Delegate Traversal" below) which can be customized by end users.  In the case of conflicting
delegates where there are multiple matches, the first matching delegate will generally be
the one used, although the traversal strategy implementation can provide different conflict
resolution behavior.
+ To use the composite blob store, delegate blob stores are configured as data store factories.
 The configuration of each data store factory must specify the following:
+ * Any standard configuration required to configure this data store
+ * A role (any string value) identifying this delegate
+ * Any configuration pertaining to this data store's role as a delegate
+ For example, if configuring an S3DataStore as a delegate, a user would:
+ * Configure standard S3DataStore values, like the access key, secret key, and bucket name
+ * Define a role for this data store, e.g. "role=S3DS_1"
+ * Add other configuration, if any, to configure this data store as a delegate (like whether
it is a readOnly store)
  
- NOTE:  (Thomas Mueller): What storage filter to use if the binary was created without node
(which is actually the normal case). What if the storage filter changes? Store it in multiple
locations? That would require more space which needs to be avoided (to save cost).
- NOTE:  (Matt Ryan): I moved this concept to a rejected status for 1.8 and possible future
for > 1.8, let me know if this addresses your concerns for now.
- 
+ After configuring the delegates, the composite is configured using the PID "org.apache.jackrabbit.oak.plugins.blob.datastore.CompositeDataStore".
 A single configuration entry is required, which is a listing of the roles this composite
manages.
+ For example, suppose there are two delegate data stores.  The configuration of the first delegate defines "role=S3DS_1" and the configuration of the second defines "role=S3DS_2".  To use these two delegates, the composite data store configuration would include the line "roles=S3DS_1,S3DS_2".
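As a rough illustration of how a "roles" value could be matched against the "role" values of configured delegates, here is a minimal Java sketch. The class DelegateRoleRegistry and its methods are hypothetical and are not part of Oak; failing on an unknown role is an assumption, since the page does not say how a missing delegate is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Oak: maps role strings to delegate names.
class DelegateRoleRegistry {
    private final Map<String, String> delegateByRole = new LinkedHashMap<>();

    // Called once per configured delegate, using the "role" value from its configuration.
    void register(String role, String delegateName) {
        delegateByRole.put(role, delegateName);
    }

    // Parses a "roles" value such as "S3DS_1,S3DS_2" and returns the matching
    // delegates in the order in which the roles are listed.
    List<String> resolve(String rolesValue) {
        List<String> resolved = new ArrayList<>();
        for (String role : rolesValue.split(",")) {
            String trimmed = role.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            String delegate = delegateByRole.get(trimmed);
            if (delegate == null) {
                // Assumption: an unknown role is treated as a configuration error.
                throw new IllegalStateException("No delegate configured for role " + trimmed);
            }
            resolved.add(delegate);
        }
        return resolved;
    }
}
```

With the two delegates from the example registered under "S3DS_1" and "S3DS_2", resolving "S3DS_1,S3DS_2" would return them in that order.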
+   
  === Delegate Traversal ===
- "Delegate Traversal" refers to the logic that is used to go through delegates on a read
or write request to determine which delegate should be used for a request.  The algorithm
used to traverse delegates should be extensible with a reasonable default provided.  This
is effectively an implementation of the Strategy pattern.  Any user of the system could provide
their own implementation to deliver whatever custom logic they wish to provide.
- 
- The proposed default implementation is called the Intelligent Delegate Traversal Strategy.
 It is "intelligent" because it attempts to interpret the provided configuration and apply
a priority based on the interpretation of the configuration.  Another possible default could
be e.g. the Simple Delegate Traversal Strategy.  This strategy simply attempts to use the
delegates in the raw order they are specified in the configuration, with no logic applied.
 The Intelligent Delegate Traversal Strategy will probably provide the most unsurprising results
for end users.
+ "Delegate Traversal" refers to the logic used to go through delegates on a read or write request to determine which delegate should handle the request.  The traversal algorithm is extensible.  The default implementation is called the Intelligent Delegate Traversal Strategy.  It is "intelligent" because it interprets the provided configuration and assigns priorities to delegates based on that interpretation.  Another possible default would be a Simple Delegate Traversal Strategy, which attempts to use the delegates in the raw order in which they are specified in the configuration, with no additional logic applied.  The Intelligent Delegate Traversal Strategy will probably provide the least surprising results for end users.
  
  ==== Intelligent Delegate Traversal Strategy ====
  
@@ -39, +40 @@

  A delegate may be specified as a read-only delegate, in which case it will not accept any
write requests.  If it would otherwise have been chosen for a write request, the request will
defer to the next delegate in the traversal that matches the request, if any.
  
  ====== Delegate Write Preference ======
- The write algorithm is fairly simple:  Excluding delegates that are read-only, iterate through
delegates to select the first that can accept the write, and perform the write.  Return the
result of this write as the result of the composite blob store write.
+ The write algorithm is fairly simple:  Excluding delegates that are read-only, iterate through
delegates to select the first that can accept the write, and perform the write.  Return the
result of this write as the result of the composite blob store write.  Priority is given to
delegates that already have a matching blob ID.
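The write preference described above can be sketched in Java as follows. This is only an illustration under stated assumptions: WriteDelegate and WriteSelector are hypothetical names, not Oak classes, and "can accept the write" is modeled simply as "is not read-only".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a delegate blob store.
class WriteDelegate {
    final String role;
    final boolean readOnly;
    final Set<String> blobIds = new HashSet<>();

    WriteDelegate(String role, boolean readOnly) {
        this.role = role;
        this.readOnly = readOnly;
    }
}

class WriteSelector {
    // Returns the delegate that should receive a write for blobId, if any.
    static Optional<WriteDelegate> select(List<WriteDelegate> delegates, String blobId) {
        // First pass: prefer a writable delegate that already holds this blob ID.
        for (WriteDelegate d : delegates) {
            if (!d.readOnly && d.blobIds.contains(blobId)) {
                return Optional.of(d);
            }
        }
        // Second pass: otherwise take the first writable delegate in traversal order.
        for (WriteDelegate d : delegates) {
            if (!d.readOnly) {
                return Optional.of(d);
            }
        }
        // Every delegate is read-only, so nothing can accept the write.
        return Optional.empty();
    }
}
```

An empty result corresponds to the disallowed configuration in which every delegate is read-only.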
- 
- Note that any other versions of the blob in other delegate blob stores may continue to exist,
but would not be used as the most recently written one would be returned for most read requests.
 If other versions end up abandoned they should eventually be garbage collected by the standard
data store garbage collection task.
  
  ===== Reads =====
  The composite blob store fulfills read requests by deferring the read to delegate blob stores.
@@ -52, +51 @@

   * If no delegate can satisfy the read request, iterate through read-only delegates and
select the first that can satisfy the read request.
   * Return the result of the delegate read, or an appropriate "not found" error message if
no delegate has a match.
  
- NOTE:  (Thomas Mueller): Bloom filters should be mentioned (are they used, if yes how, if
not why not). What about storing the (original) location in the binary reference (as a hint
to speed up reading).
- 
  The response to a read request is the result of the first successful read from a delegate.
 In this way the top priority result is always selected.
  
  Read-only delegates take lower precedence than writable delegates, as writable delegates may contain more up-to-date information, which would be preferred.
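The read precedence described above (writable delegates first, then read-only delegates, returning the first successful read) could look roughly like this in Java. ReadDelegate and CompositeReader are hypothetical names used only for this sketch, and blob content is modeled as a plain string to keep the example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a delegate blob store; blob content is a String here.
class ReadDelegate {
    final String role;
    final boolean readOnly;
    final Map<String, String> blobs = new HashMap<>();

    ReadDelegate(String role, boolean readOnly) {
        this.role = role;
        this.readOnly = readOnly;
    }
}

class CompositeReader {
    // Returns the first successful read, or empty if no delegate has the blob.
    static Optional<String> read(List<ReadDelegate> delegates, String blobId) {
        // Pass 1 visits writable delegates, pass 2 visits read-only delegates.
        for (boolean readOnlyPass : new boolean[] {false, true}) {
            for (ReadDelegate d : delegates) {
                if (d.readOnly == readOnlyPass && d.blobs.containsKey(blobId)) {
                    return Optional.of(d.blobs.get(blobId));
                }
            }
        }
        return Optional.empty(); // caller reports "not found"
    }
}
```

In a staging setup, this means a blob present in both stores is served from the writable staging store, while a blob that exists only in production is still readable.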
  
- ==== Read-Only Delegates ====
+ === Read-Only Delegates ===
  The composite blob store supports the notion of a read-only delegate blob store.  One or
more of the delegate blob stores can be configured in read-only mode, meaning that it can
be used to satisfy read requests but not write requests.  An example use case for this scenario
is where two content repositories are used, one for a production environment and one for a
staging environment.  The staging repository can be configured with a composite blob store
that accesses the production storage location in read-only mode, so tests can execute in
staging using production data without modifying production data or the production store.
  
  Reads issued to a read-only delegate would be processed as normal.  Read-only delegates
are not considered for write requests, causing the composite blob store to move on to the
next delegate to attempt to fulfill the write request.
  
- Note that configuring all delegates of a composite blob store would make the blob store
useless for storing blobs and thus should not be an allowed condition - at least one delegate
blob store must not be a read-only delegate.
+ Note that configuring all delegates of a composite blob store as read-only delegates would make the blob store useless for storing blobs and thus should not be an allowed condition: at least one delegate blob store must not be a read-only delegate.
  
+ === Blob ID / Delegate Mapping ===
+ In order to avoid issuing read requests to delegates that do not contain the blob ID in
question, the composite blob store must maintain a mapping of each blob ID to the delegate
containing it.  Bloom filters should be used for this purpose.  This mapping must be created
at startup and maintained as the system runs, and should be rebuilt every time data store
garbage collection runs (among other things, the filter may need to be resized for the current
number of blob IDs present).
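To make the mapping idea concrete, here is a minimal Java sketch that keeps one Bloom filter per delegate and uses it to narrow down which delegates are worth querying. SimpleBloomFilter and BlobIdIndex are hypothetical classes written for this illustration, the filter size and hash count are arbitrary, and the hashing scheme is a simple assumption rather than anything specified for Oak.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal Bloom filter over blob ID strings (illustrative only).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String blobId) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(blobId, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(String blobId) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(blobId, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position from two hashes of the ID (double hashing).
    private int index(String blobId, int i) {
        int h1 = blobId.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }
}

// Hypothetical index: one filter per delegate role.
class BlobIdIndex {
    private final Map<String, SimpleBloomFilter> filterByRole = new LinkedHashMap<>();

    void addDelegate(String role) {
        filterByRole.put(role, new SimpleBloomFilter(1 << 16, 3));
    }

    void recordBlob(String role, String blobId) {
        filterByRole.get(role).add(blobId);
    }

    // Roles of delegates that may hold blobId; all other delegates can be skipped.
    List<String> candidates(String blobId) {
        List<String> roles = new ArrayList<>();
        for (Map.Entry<String, SimpleBloomFilter> e : filterByRole.entrySet()) {
            if (e.getValue().mightContain(blobId)) {
                roles.add(e.getKey());
            }
        }
        return roles;
    }
}
```

A Bloom filter can report false positives but never false negatives, so a candidate delegate still has to be queried to confirm the blob exists. A plain Bloom filter also cannot remove entries, which fits with rebuilding the mapping whenever data store garbage collection runs.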
- 
- === Curation ===
- 
- Curation is the process of evaluating the blobs in a blob store to determine if that blob
store is still the correct location for blobs to reside. In the case of the composite blob
store, a reason to curate data may be to gradually move data for which the determination of
"correct" location can change over time.
- 
- Since writes and reads use the same order of delegate preference, reads will come from the
most recently written location, so if a blob can be found it will always be the most up-to-date
version of it.  Any other blobs, if not referenced anywhere else in the system, will eventually
be garbage collected.
- 
- It would be possible to add custom curators, but for now those are rejected features (see
below).
- 
- === Rejected Features ===
-  * For the initial implementation of composite blob store, the concept of [[Composite Blob
Store Storage Filters|storage filters]] was considered but rejected for Oak 1.8.
-  * There are currently no cold storage blob stores, so the concept of [[Composite Blob Store
Cold Storage Delegates|cold storage delegates]] was also rejected for Oak 1.8.
-  * Background curation is rejected for Oak 1.8.
  
  == Use Cases ==
+ There are many possible use cases for the composite blob store.  In order to manage the
implementation of the capability, functionality is being added and supported one use case
at a time.  When this capability is released, the first supported use case will be the Staging
Environment (listed below).  Other use cases are listed here for reference but are not currently
supported.
+ 
  === Staging Environment ===
  The composite blob store can be used to address a production/staging deployment use case,
where one Oak repository is the production repository and another is the staging repository.
 The production repository accesses a single blob store.  The staging repository uses a composite
blob store to access a staging blob store as well as the production blob store in read-only
mode.  Thus staging can serve blobs out of either blob store but can only modify blobs on
the staging blob store.
  
