jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "DataStore" by ThomasMueller
Date Thu, 13 Sep 2007 15:12:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by ThomasMueller:
http://wiki.apache.org/jackrabbit/DataStore

New page:
== How to configure the file data store ==

To use the FileDataStore, add this to your repository.xml after the <Repository>
start tag:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore"/>

== Additional configuration options ==

This is a full configuration using the default values:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="100"/>
    </DataStore>

== Clustering ==

Clustering is supported if you use a clustered file system. Set the data store path
of all cluster nodes to the same location.

== How does it work ==

When a binary object is added, Jackrabbit checks its size. If it is larger than minRecordLength,
it is added to the data store; otherwise it is kept in memory. This happens very early (possibly
when calling Property.setValue(stream)). Only the unique data identifier is stored in the
persistence manager (except for in-memory objects, where the data itself is stored). When a value
is updated, the old value is kept in the data store and the new value is added (there is no
update operation).
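
The size check described above can be sketched roughly as follows. This is a minimal
illustration, not Jackrabbit code: the class BinaryRoutingSketch, its Stored type and the
in-memory map standing in for the data store are all hypothetical, and deriving the
identifier from a SHA-256 digest of the content is an assumption (the page only says the
identifier is unique).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: none of these names are Jackrabbit API. */
class BinaryRoutingSketch {
    /** Same value as the minRecordLength default in the configuration above. */
    static final int MIN_RECORD_LENGTH = 100;

    /** What the persistence manager would hold for one binary value. */
    static final class Stored {
        final boolean inDataStore;
        final String identifier; // set only when inDataStore is true
        final byte[] inlineData; // set only when inDataStore is false

        Stored(boolean inDataStore, String identifier, byte[] inlineData) {
            this.inDataStore = inDataStore;
            this.identifier = identifier;
            this.inlineData = inlineData;
        }
    }

    /** Stand-in for the data store: identifier -> content. Entries are only added. */
    final Map<String, byte[]> dataStore = new HashMap<>();

    /** Routes a value to the data store or keeps it inline, based on its length. */
    Stored store(byte[] data) {
        if (data.length > MIN_RECORD_LENGTH) {
            String id = identifierFor(data);
            // Add-only: an existing entry is never overwritten.
            dataStore.putIfAbsent(id, data.clone());
            return new Stored(true, id, null);
        }
        return new Stored(false, null, data.clone());
    }

    /** Assumption: the identifier is the hex SHA-256 digest of the content. */
    static String identifierFor(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        BinaryRoutingSketch sketch = new BinaryRoutingSketch();
        System.out.println(sketch.store(new byte[100]).inDataStore); // false
        System.out.println(sketch.store(new byte[101]).inDataStore); // true
    }
}
```

In this sketch, "updating" a large value simply means calling store() again with the new
content: a second entry appears in the map and the first one stays where it is.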

The current implementation still stores temporary files in some situations, for example in
the RMI client. Those cases will be changed to use the data store directly where it makes
sense.

Very small objects (where it does not make sense to create a file) are kept in memory.

Objects in the data store are only removed when they are no longer reachable. There is no
'update' operation, only 'add new entry'. Data is added before the transaction is committed.
Additions are globally atomic, and cluster nodes can share the same data store. Even different
repositories can share the same store, as long as garbage collection is done correctly.
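
One way to read "done correctly" for a shared store is that an entry may only be removed
if no repository using the store still references it. The sketch below illustrates that
idea with a simple sweep over an in-memory map. It is not the Jackrabbit garbage collector:
the class DataStoreGcSketch and its methods are hypothetical, and how each repository
collects its set of reachable identifiers is left out.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: none of these names are Jackrabbit API. */
class DataStoreGcSketch {
    /** Stand-in for a shared data store: identifier -> content. */
    final Map<String, byte[]> records = new HashMap<>();

    /** Add-only: an existing entry is never replaced. */
    void add(String identifier, byte[] data) {
        records.putIfAbsent(identifier, data);
    }

    /**
     * Removes every record that no repository still references.
     * Each element of reachablePerRepository is the set of identifiers
     * one repository can still reach. Returns the number of records removed.
     */
    int sweep(Collection<Set<String>> reachablePerRepository) {
        // An entry survives if ANY repository sharing the store references it.
        Set<String> reachable = new HashSet<>();
        for (Set<String> ids : reachablePerRepository) {
            reachable.addAll(ids);
        }
        int before = records.size();
        records.keySet().retainAll(reachable);
        return before - records.size();
    }
}
```

Sweeping with the reachable set of only one repository would delete entries that the other
repositories still need, which is the failure this union is meant to avoid.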

== Running data store garbage collection ==

Running the garbage collection is currently a manual process.

== How to write a new data store implementation ==

New implementations are welcome! An S3 data store (http://en.wikipedia.org/wiki/Amazon_S3)
would be cool. Maybe somebody needs a database data store. A caching data store would be great
as well (items that are used a lot are stored in a fast file system, others in a slower one).
