Author: amitj
Date: Wed Sep 2 04:52:05 2015
New Revision: 1700705

URL: http://svn.apache.org/r1700705
Log:
OAK-936: Site checkin for project Oak Documentation-1.4-SNAPSHOT

Modified:
    jackrabbit/site/live/oak/docs/osgi_config.html
    jackrabbit/site/live/oak/docs/plugins/blobstore.html

Modified: jackrabbit/site/live/oak/docs/osgi_config.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/osgi_config.html?rev=1700705&r1=1700704&r2=1700705&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/osgi_config.html (original)
+++ jackrabbit/site/live/oak/docs/osgi_config.html Wed Sep 2 04:52:05 2015
@@ -1,739 +1,752 @@
-
Jackrabbit Oak - Repository OSGi Configuration
-
- - - - - -
-
- -
- - -
- -

Repository OSGi Configuration

-

Oak comes with a simple mechanism for constructing content repositories for use in embedded deployments and test cases. Details are provided as part of Repository Construction. When Oak is used in an OSGi environment, its components can be configured using OSGi Configuration Support.

-

Depending on the component, the configuration can be modified at runtime or must be specified before the initial system setup.

- -
-
Static Configuration
-
Such configuration settings cannot be changed once a repository is initialized, for example the choice of DataStore or the path of the User Home.
-
Dynamic Configuration
-
Some configuration settings, such as thread pool size or cache size, can be changed at runtime or after the initial system setup.
-
-

Each OSGi configuration is referred to by a PID, i.e. a persistent identifier. The sections below describe the PIDs used in Oak.

-
-
-

NodeStore

-
-

SegmentNodeStore

-

PID org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService

- -
-
repository.home
-
Path to repository home under which various repository related data is stored. Segment files would be stored under ${repository.home}/segmentstore directory
-
tarmk.size
-
Default - 256 (in MB)
-
Maximum file size (in MB)
-
-

-
-

DocumentNodeStore

-

PID org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService

- -
-
mongouri
-
Default - mongodb://localhost:27017
-
Specifies the MongoURI required to connect to Mongo Database
-
db
-
Default - oak
-
Name of the database in Mongo
-
cache
-
Default - 256
-
Cache size in MB. This is distributed among various caches used in DocumentNodeStore
-
changesSize
-
Default - 256
-
Size in MB of capped collection used in Mongo for caching the diff output.
-
customBlobStore
-
Default false
-
Boolean value indicating whether a custom BlobStore is to be used. By default MongoBlobStore is used.
-
maxReplicationLagInSecs
-
Default 21600 (6 hours)
-
Determines the duration beyond which it can be safely assumed that the state on a secondary is consistent with the primary and it is safe to read from it. (See OAK-1645)
-
blobGcMaxAgeInSecs
-
Default 86400 (24 hrs)
-
Blob Garbage Collector (GC) logic only considers blobs that have not been modified recently (currentTime - lastModifiedTime > blobGcMaxAgeInSecs). For example, with the default value only blobs last modified more than 24 hrs ago are considered for GC.
-
versionGcMaxAgeInSecs
-
Default 86400 (24 hrs)
-
Oak uses an MVCC model to store data, so each update to a node results in a new revision being created. This duration controls how long old revision data is kept. For example, if a node is deleted at time T1, it is only marked as deleted at the revision for T1 and its content is not removed. The content is removed only when a Revision GC runs, and then only if (currentTime - T1 > versionGcMaxAgeInSecs).
-
blobCacheSize
-
Default 16 (MB)
-
DocumentNodeStore when running with Mongo would use MongoBlobStore by default unless a custom BlobStore is configured. In such scenario the size of in memory cache for the frequently used blobs can be configured via blobCacheSize.
-
persistentCache
-
Default "" (an empty string, meaning disabled)
-
The persistent cache, which is stored in the local file system.
-
-
nodeCachePercentage
-
Default 25
-
Percentage of cache allocated for nodeCache. See Caching
-
childrenCachePercentage
-
Default 10
-
Percentage of cache allocated for childrenCache. See Caching
-
diffCachePercentage
-
Default 5
-
Percentage of cache allocated for diffCache. See Caching
-
docChildrenCachePercentage
-
Default 3
-
Percentage of cache allocated for docChildrenCache. See Caching
-
cacheSegmentCount
-
Default 16
-
The number of segments in the LIRS cache
-
Since 1.0.15, 1.2.3, 1.3.0
-
cacheStackMoveDistance
-
Default 16
-
The delay to move entries to the head of the queue in the LIRS cache
-
Since 1.0.15, 1.2.3, 1.3.0
-
-

Example config file

- -
-
mongouri=mongodb://localhost:27017
-db=oak
-
-
-
Mongo Configuration
-

All the configuration related to Mongo can be specified via Mongo URI

- -
    - -
  • -

    Authentication - Username and password should be specified as part of uri e.g. the following connects and logs in to the admin database as user sysop with the password moon:

    - -
    -
    mongodb://sysop:moon@localhost
    -
  • - -
  • -

    Read Preferences and Write Concern - These can also be specified as part of the Mongo URI. Refer to the Read Preference and Write Concern sections for more details. For example, the following sets readPreference to secondary and prefers replicas with the tags dc:ny,rack:1. It also sets the write timeout to 10 sec

    - -
    -
    mongodb://db1.example.net,db2.example.com/?readPreference=secondary&readPreferenceTags=dc:ny,rack:1&readPreferenceTags=dc:ny&readPreferenceTags=&w=1&wtimeoutMS=10000
    -
  • -
-

One can also specify the connection pool size, socket timeout etc. For complete details about various possible option refer to Mongo URI

-

-
-

Configuring DataStore/BlobStore

-

BlobStores are used to store binary content. Support for Jackrabbit 2 DataStore is also provided via a DataStoreBlobStore wrapper. To use a specific BlobStore implementation, the following two steps need to be performed

- -
    - -
  1. Configure NodeStore - The NodeStore config needs to be modified to enable use of a custom BlobStore by setting customBlobStore to true
  2. - -
  3. Configure BlobStore - Create config for the required BlobStore by using the PID for that BlobStore.
  4. -
-

Refer to Config steps in Apache Sling for an example on how to configure a FileDataStore with DocumentNodeStore

-
-

Jackrabbit 2 - FileDataStore

-

Jackrabbit 2 FileDataStore can be configured via following pid

-

PID org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore

- -
-
path
-
Default - Not specified
-
Path to the directory under which the files would be stored.
-
minRecordLength
-
Default - 100
-
Size in bytes. Binary content smaller than minRecordLength is inlined (i.e. the data store id is the data itself).
-
maxCachedBinarySize
-
Default - 17408 (17 KB)
-
Size in bytes. Binaries whose size is less than or equal to this value are stored in the in-memory cache
-
cacheSizeInMB
-
Default - 16
-
Size in MB. In-memory cache for small files whose size does not exceed maxCachedBinarySize. This improves performance when many small binaries are accessed frequently.
-
-
-

Jackrabbit 2 - S3DataStore

-

PID org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore

- -
-
maxCachedBinarySize
-
Default - 17408 (17 KB)
-
Size in bytes. Binaries whose size is less than or equal to this value are stored in the in-memory cache
-
cacheSizeInMB
-
Default - 16
-
Size in MB. In-memory cache for small files whose size does not exceed maxCachedBinarySize. This improves performance when many small binaries are accessed frequently.
-
-
-

System properties and Framework properties

-

The following properties are supported by Oak. They are grouped into two parts, Stable and Experimental. Stable properties will be supported in future versions, whereas experimental properties might not be.

-
-

Stable

- -
-
oak.mongo.uri
-
Type - System property and Framework Property
-
Specifies the MongoURI required to connect to Mongo Database
-
oak.mongo.db
-
Type - System property and Framework Property
-
Name of the database in Mongo
-
-
-

Experimental

-
-

Configuration Steps for Apache Sling

-

The OSGi Configuration Admin service defines a mechanism for passing configuration settings to an OSGi bundle. How a configuration is registered with the OSGi system varies depending on the application.

-

For example to configure DocumentNodeStore to use FileDataStore in Apache Sling

- -
    - -
  1. -

    Create a config file with name org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg under ${sling.home}/install folder with content

    - -
    -
    #Mongo server details
    -mongouri=mongodb://localhost:27017
    -
    -#Name of Mongo database to use
    -db=aem-author
    -
    -#Store binaries in custom BlobStore e.g. FileDataStore
    -customBlobStore=true
    -
  2. - -
  3. -

    Create a config file with name org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg under ${sling.home}/install folder with content

    - -
    -
    #The minimum size of an object that should be stored in this data store.
    -minRecordLength=4096
    -
    -#path to the DataStore
    -path=./sling/repository/datastore
    -
    -#cache for storing small binaries in-memory
    -cacheSizeInMB=128
    -
  4. -
-
-

Framework Properties vs OSGi Configuration

-

OSGi components can read config data from two sources.

- -
    - -
  1. ConfigurationAdmin - These are configured via placing the *.cfg files under ${sling.home}/install folder. These can also be modified at runtime via Felix WebConsole typically available at http://localhost:8080/system/console
  2. - -
  3. Framework Properties - An OSGi framework can be configured to start with some framework properties. These properties cannot be changed at runtime. In Apache Sling these can be specified in ${sling.home}/sling.properties or ${sling.home}/conf/sling.properties
  4. -
-

In Oak some of the config properties are also read from framework properties. If a value is specified in both a config file and the framework properties, the framework property takes precedence.

-

For example by default Sling sets repository.home to ${sling.home}/repository. So this value need not be specified in config files

-
-
-
- -
-
+
Jackrabbit Oak - Repository OSGi Configuration
+
+ + + + + +
+
+ +
+ + +
+ +

Repository OSGi Configuration

+

Oak comes with a simple mechanism for constructing content repositories for use in embedded deployments and test cases. Details are provided as part of Repository Construction. When Oak is used in an OSGi environment, its components can be configured using OSGi Configuration Support.

+

Depending on the component, the configuration can be modified at runtime or must be specified before the initial system setup.

+ +
+
Static Configuration
+
Such configuration settings cannot be changed once a repository is initialized, for example the choice of DataStore or the path of the User Home.
+
Dynamic Configuration
+
Some configuration settings, such as thread pool size or cache size, can be changed at runtime or after the initial system setup.
+
+

Each OSGi configuration is referred to by a PID, i.e. a persistent identifier. The sections below describe the PIDs used in Oak.

+
+
+

NodeStore

+
+

SegmentNodeStore

+

PID org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService

+ +
+
repository.home
+
Path to repository home under which various repository related data is stored. Segment files would be stored under ${repository.home}/segmentstore directory
+
tarmk.size
+
Default - 256 (in MB)
+
Maximum file size (in MB)
+
+

+
+

DocumentNodeStore

+

PID org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService

+ +
+
mongouri
+
Default - mongodb://localhost:27017
+
Specifies the MongoURI required to connect to Mongo Database
+
db
+
Default - oak
+
Name of the database in Mongo
+
cache
+
Default - 256
+
Cache size in MB. This is distributed among various caches used in DocumentNodeStore
+
changesSize
+
Default - 256
+
Size in MB of capped collection used in Mongo for caching the diff output.
+
customBlobStore
+
Default false
+
Boolean value indicating whether a custom BlobStore is to be used. By default MongoBlobStore is used.
+
maxReplicationLagInSecs
+
Default 21600 (6 hours)
+
Determines the duration beyond which it can be safely assumed that the state on a secondary is consistent with the primary and it is safe to read from it. (See OAK-1645)
+
blobGcMaxAgeInSecs
+
Default 86400 (24 hrs)
+
Blob Garbage Collector (GC) logic only considers blobs that have not been modified recently (currentTime - lastModifiedTime > blobGcMaxAgeInSecs). For example, with the default value only blobs last modified more than 24 hrs ago are considered for GC.
+
versionGcMaxAgeInSecs
+
Default 86400 (24 hrs)
+
Oak uses an MVCC model to store data, so each update to a node results in a new revision being created. This duration controls how long old revision data is kept. For example, if a node is deleted at time T1, it is only marked as deleted at the revision for T1 and its content is not removed. The content is removed only when a Revision GC runs, and then only if (currentTime - T1 > versionGcMaxAgeInSecs).
+
blobCacheSize
+
Default 16 (MB)
+
DocumentNodeStore when running with Mongo would use MongoBlobStore by default unless a custom BlobStore is configured. In such scenario the size of in memory cache for the frequently used blobs can be configured via blobCacheSize.
+
persistentCache
+
Default "" (an empty string, meaning disabled)
+
The persistent cache, which is stored in the local file system.
+
+
nodeCachePercentage
+
Default 25
+
Percentage of cache allocated for nodeCache. See Caching
+
childrenCachePercentage
+
Default 10
+
Percentage of cache allocated for childrenCache. See Caching
+
diffCachePercentage
+
Default 5
+
Percentage of cache allocated for diffCache. See Caching
+
docChildrenCachePercentage
+
Default 3
+
Percentage of cache allocated for docChildrenCache. See Caching
+
cacheSegmentCount
+
Default 16
+
The number of segments in the LIRS cache
+
Since 1.0.15, 1.2.3, 1.3.0
+
cacheStackMoveDistance
+
Default 16
+
The delay to move entries to the head of the queue in the LIRS cache
+
Since 1.0.15, 1.2.3, 1.3.0
+
+
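The percentage settings above divide the overall cache size among the individual caches. The arithmetic can be sketched as follows, using the documented defaults. This is only an illustration: the function name is hypothetical, how Oak rounds the values internally is not covered here, and the "remainder" entry simply denotes whatever the listed percentages leave unallocated.

```python
def split_cache(cache_mb, percentages):
    """Return the MB share of each named cache plus the unallocated remainder."""
    total_pct = sum(percentages.values())
    if total_pct > 100:
        raise ValueError("cache percentages add up to more than 100")
    shares = {name: cache_mb * pct / 100 for name, pct in percentages.items()}
    # Whatever the listed percentages do not claim is reported separately.
    shares["remainder"] = cache_mb * (100 - total_pct) / 100
    return shares


# Documented defaults: cache=256 (MB) and the four *CachePercentage values.
defaults = {
    "nodeCache": 25,
    "childrenCache": 10,
    "diffCache": 5,
    "docChildrenCache": 3,
}
shares = split_cache(256, defaults)
for name, size_mb in shares.items():
    print(f"{name}: {size_mb:.2f} MB")
```

With the defaults, 43 percent of the 256 MB is assigned to the four listed caches and 57 percent is left over.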

Example config file

+ +
+
mongouri=mongodb://localhost:27017
+db=oak
+
+
+
Mongo Configuration
+

All the configuration related to Mongo can be specified via Mongo URI

+ +
    + +
  • +

    Authentication - Username and password should be specified as part of uri e.g. the following connects and logs in to the admin database as user sysop with the password moon:

    + +
    +
    mongodb://sysop:moon@localhost
    +
  • + +
  • +

    Read Preferences and Write Concern - These can also be specified as part of the Mongo URI. Refer to the Read Preference and Write Concern sections for more details. For example, the following sets readPreference to secondary and prefers replicas with the tags dc:ny,rack:1. It also sets the write timeout to 10 sec

    + +
    +
    mongodb://db1.example.net,db2.example.com/?readPreference=secondary&readPreferenceTags=dc:ny,rack:1&readPreferenceTags=dc:ny&readPreferenceTags=&w=1&wtimeoutMS=10000
    +
  • +
+

One can also specify the connection pool size, socket timeout etc. For complete details about various possible option refer to Mongo URI
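To check which options a given Mongo URI actually carries before putting it into a config file, the query part can be inspected with a few lines of Python. This is a minimal sketch using only the standard library, not the MongoDB driver's own parser; the helper name is hypothetical, and the URI is written with a `/` between the last host and the `?`, which the MongoDB connection string format expects.

```python
from urllib.parse import urlsplit, parse_qs


def mongo_uri_options(uri):
    """Return the query options of a mongodb:// URI as a dict of value lists."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError("not a mongodb:// URI")
    # keep_blank_values keeps the empty readPreferenceTags entry, which acts
    # as the final "any member" fallback in the tag set list.
    return parse_qs(parts.query, keep_blank_values=True)


uri = (
    "mongodb://db1.example.net,db2.example.com/"
    "?readPreference=secondary"
    "&readPreferenceTags=dc:ny,rack:1"
    "&readPreferenceTags=dc:ny"
    "&readPreferenceTags="
    "&w=1&wtimeoutMS=10000"
)
options = mongo_uri_options(uri)
for name, values in options.items():
    print(name, values)
```

Repeated options such as readPreferenceTags come back as a list in the order given, which matters because the tag sets are tried in that order.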

+

+
+

Configuring DataStore/BlobStore

+

BlobStores are used to store binary content. Support for Jackrabbit 2 DataStore is also provided via a DataStoreBlobStore wrapper. To use a specific BlobStore implementation, the following two steps need to be performed

+ +
    + +
  1. Configure NodeStore - The NodeStore config needs to be modified to enable use of a custom BlobStore by setting customBlobStore to true
  2. + +
  3. Configure BlobStore - Create config for the required BlobStore by using the PID for that BlobStore.
  4. +
+

Refer to Config steps in Apache Sling for an example on how to configure a FileDataStore with DocumentNodeStore

+
+

Jackrabbit 2 - FileDataStore

+

Jackrabbit 2 FileDataStore can be configured via following pid

+

PID org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore

+ +
+
path
+
Default - Not specified
+
Path to the directory under which the files would be stored.
+
minRecordLength
+
Default - 100
+
Size in bytes. Binary content smaller than minRecordLength is inlined (i.e. the data store id is the data itself).
+
maxCachedBinarySize
+
Default - 17408 (17 KB)
+
Size in bytes. Binaries whose size is less than or equal to this value are stored in the in-memory cache
+
cacheSizeInMB
+
Default - 16
+
Size in MB. In-memory cache for small files whose size does not exceed maxCachedBinarySize. This improves performance when many small binaries are accessed frequently.
+
+
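The size thresholds above can be read as a simple decision on the size of each binary. The sketch below is an interpretation of the property descriptions, not code taken from the FileDataStore; the function name and the returned labels are hypothetical.

```python
def classify_binary(size_bytes, min_record_length=100, max_cached_binary_size=17408):
    """Classify a binary by size using the documented FileDataStore defaults."""
    if size_bytes < min_record_length:
        # Small enough that the data store id is the data itself.
        return "inlined"
    if size_bytes <= max_cached_binary_size:
        # Stored as a record and eligible for the in-memory cache.
        return "stored, cacheable"
    return "stored"


for size in (50, 100, 17408, 17409):
    print(size, classify_binary(size))
```

Note that the default maxCachedBinarySize of 17408 bytes is exactly 17 * 1024.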
+

Jackrabbit 2 - S3DataStore

+

PID org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore

+ +
+
maxCachedBinarySize
+
Default - 17408 (17 KB)
+
Size in bytes. Binaries whose size is less than or equal to this value are stored in the in-memory cache
+
cacheSizeInMB
+
Default - 16
+
Size in MB. In-memory cache for small files whose size does not exceed maxCachedBinarySize. This improves performance when many small binaries are accessed frequently.
+
+
+

Oak - SharedS3DataStore (Since Oak 1.2.0)

+

Supports shared S3 DataStore

+

PID org.apache.jackrabbit.oak.plugins.blob.datastore.SharedS3DataStore

+ +
+
maxCachedBinarySize
+
Default - 17408 (17 KB)
+
Size in bytes. Binaries whose size is less than or equal to this value are stored in the in-memory cache
+
cacheSizeInMB
+
Default - 16
+
Size in MB. In-memory cache for small files whose size does not exceed maxCachedBinarySize. This improves performance when many small binaries are accessed frequently.
+
+
+

System properties and Framework properties

+

The following properties are supported by Oak. They are grouped into two parts, Stable and Experimental. Stable properties will be supported in future versions, whereas experimental properties might not be.

+
+

Stable

+ +
+
oak.mongo.uri
+
Type - System property and Framework Property
+
Specifies the MongoURI required to connect to Mongo Database
+
oak.mongo.db
+
Type - System property and Framework Property
+
Name of the database in Mongo
+
+
+

Experimental

+
+

Configuration Steps for Apache Sling

+

The OSGi Configuration Admin service defines a mechanism for passing configuration settings to an OSGi bundle. How a configuration is registered with the OSGi system varies depending on the application.

+

For example to configure DocumentNodeStore to use FileDataStore in Apache Sling

+ +
    + +
  1. +

    Create a config file with name org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg under ${sling.home}/install folder with content

    + +
    +
    #Mongo server details
    +mongouri=mongodb://localhost:27017
    +
    +#Name of Mongo database to use
    +db=aem-author
    +
    +#Store binaries in custom BlobStore e.g. FileDataStore
    +customBlobStore=true
    +
  2. + +
  3. +

    Create a config file with name org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg under ${sling.home}/install folder with content

    + +
    +
    #The minimum size of an object that should be stored in this data store.
    +minRecordLength=4096
    +
    +#path to the DataStore
    +path=./sling/repository/datastore
    +
    +#cache for storing small binaries in-memory
    +cacheSizeInMB=128
    +
  4. +
+
+

Framework Properties vs OSGi Configuration

+

OSGi components can read config data from two sources.

+ +
    + +
  1. ConfigurationAdmin - These are configured via placing the *.cfg files under ${sling.home}/install folder. These can also be modified at runtime via Felix WebConsole typically available at http://localhost:8080/system/console
  2. + +
  3. Framework Properties - An OSGi framework can be configured to start with some framework properties. These properties cannot be changed at runtime. In Apache Sling these can be specified in ${sling.home}/sling.properties or ${sling.home}/conf/sling.properties
  4. +
+

In Oak some of the config properties are also read from framework properties. If a value is specified in both a config file and the framework properties, the framework property takes precedence.

+

For example by default Sling sets repository.home to ${sling.home}/repository. So this value need not be specified in config files
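The precedence rule can be illustrated with a small lookup function. This is a sketch of the rule as stated, not Oak's actual implementation; the function name and the sample values are hypothetical.

```python
def resolve_property(name, config, framework_props, default=None):
    """Look up a property, letting framework properties win over config files."""
    if name in framework_props:
        return framework_props[name]
    return config.get(name, default)


# Hypothetical values: the framework sets repository.home, the config file
# sets both repository.home and tarmk.size.
framework_props = {"repository.home": "/opt/sling/repository"}
config = {"repository.home": "/tmp/other", "tarmk.size": "256"}

print(resolve_property("repository.home", config, framework_props))
print(resolve_property("tarmk.size", config, framework_props))
```

Here repository.home resolves to the framework value even though the config file also defines it, while tarmk.size falls back to the config file.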

+
+
+
+ +
+
\ No newline at end of file

Modified: jackrabbit/site/live/oak/docs/plugins/blobstore.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/plugins/blobstore.html?rev=1700705&r1=1700704&r2=1700705&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/plugins/blobstore.html (original)
+++ jackrabbit/site/live/oak/docs/plugins/blobstore.html Wed Sep 2 04:52:05 2015
@@ -1,13 +1,13 @@
-
+
Jackrabbit Oak - The Blob Store
@@ -210,7 +210,7 @@
+
+

Blob Garbage Collection

+

Blob Garbage Collection(GC) is applicable for the following blob stores:

+ + +

Oak implements a Mark and Sweep based Garbage Collection logic.

+ +
    + +
  1. Mark Phase - In this phase the binary references are marked in both BlobStore and NodeStore + +
      + +
    1. Mark BlobStore - GC logic would make a record of all the blobs present in the BlobStore.
    2. + +
    3. Mark NodeStore - GC logic would make a record of all the blob references which are referred to by any node present in the NodeStore. Note that blob references from old revisions of a node are also considered valid references.
    4. +
  2. + +
  3. Sweep Phase - In this phase all blob references from the Mark BlobStore phase which were not found in the Mark NodeStore phase are considered GC candidates. Only blobs older than a specified time interval are deleted (by default, those last modified more than 24 hrs ago).
  4. +
+
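The two phases above amount to a set difference followed by an age filter. The sketch below shows that logic on in-memory data; it is an illustration under simplifying assumptions (blobs as a dict of id to last-modified time, references as a set), not Oak's garbage collector, and all names are hypothetical.

```python
import time


def sweep_candidates(blob_store, node_refs, max_age_secs=86400, now=None):
    """Return ids of blobs that are unreferenced and older than max_age_secs.

    blob_store: dict mapping blob id -> last modified time (epoch seconds)
    node_refs:  iterable of blob ids referenced by any node revision
    """
    now = time.time() if now is None else now
    marked_in_store = set(blob_store)   # Mark BlobStore
    marked_in_nodes = set(node_refs)    # Mark NodeStore
    unreferenced = marked_in_store - marked_in_nodes
    # Sweep: only delete blobs last modified longer ago than the interval.
    return sorted(b for b in unreferenced if now - blob_store[b] > max_age_secs)


blobs = {"a": 0, "b": 0, "c": 199000}
print(sweep_candidates(blobs, {"a"}, now=200000))
```

In this example only "b" is a candidate: "a" is still referenced, and "c" is unreferenced but was modified within the last 24 hours.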

The garbage collection can be triggered by calling:

+ + +
+

Shared DataStore Blob Garbage Collection (Since 1.2.0)

+

On start of a repository configured with a shared DataStore, a unique repository id is registered. In the DataStore this repository id is registered as an empty file with the format repository-[repository-id] (e.g. repository-988373a0-3efb-451e-ab4c-f7e794189273). The high-level process for garbage collection is still the same as described above, but to support blob garbage collection in a shared DataStore the Mark and Sweep phases can be run independently.
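The marker file naming can be sketched as follows. The example id above has the shape of a UUID, so the sketch assumes a random UUID when no id is supplied; that choice and the function name are assumptions for illustration, not a description of how Oak generates the id.

```python
import uuid


def repository_marker_name(repository_id=None):
    """Build the marker file name in the repository-[repository-id] format."""
    if repository_id is None:
        # Assumption: a random UUID stands in for the unique repository id.
        repository_id = str(uuid.uuid4())
    return "repository-" + repository_id


print(repository_marker_name("988373a0-3efb-451e-ab4c-f7e794189273"))
print(repository_marker_name())
```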

+

The details of the process are as follows:

+ + +

The shared DataStore garbage collection is applicable for the following DataStore(s):

+ +