jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "DataStore" by ThomasMueller
Date Fri, 02 Oct 2009 07:14:14 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "DataStore" page has been changed by ThomasMueller:
http://wiki.apache.org/jackrabbit/DataStore?action=diff&rev1=46&rev2=47

  
  Migration: currently there is no special mechanism to migrate data from a blob store to
a data store. The only known way to convert is to export the data, and re-import into a new
repository.
  
- == How does it work ==
+ == How Does It Work ==
  
  When adding a binary object, Jackrabbit checks its size. If it is larger than minRecordLength,
it is added to the data store; otherwise it is kept in memory. This is done very early (possibly
when calling Property.setValue(stream)). Only the unique data identifier is stored in the
persistence manager (except for in-memory objects, where the data itself is stored). When updating
a value, the old value is kept in the data store (potentially becoming garbage) and the new value
is added. There is no update operation.
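
The decision described above can be sketched in plain Java. This is an illustration only, not Jackrabbit's implementation: the class BinaryPlacementSketch, its store method, the 100-byte threshold, and the use of a SHA-1 content hash as the identifier are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class BinaryPlacementSketch {
    // Hypothetical threshold; stands in for the minRecordLength setting.
    static final int MIN_RECORD_LENGTH = 100;

    // Stand-in for the data store: identifier -> content.
    final Map<String, byte[]> dataStore = new HashMap<>();

    /** Returns what the persistence manager would keep for this value. */
    String store(byte[] value) {
        if (value.length > MIN_RECORD_LENGTH) {
            // Large value: content goes to the data store, only the id is kept.
            String id = identifierOf(value);
            dataStore.putIfAbsent(id, value);
            return "id:" + id;
        }
        // Small value: kept inline (the real data would be stored here).
        return "inline:" + value.length;
    }

    // Assumption: the identifier is derived from the content via SHA-1.
    static String identifierOf(byte[] value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(value)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BinaryPlacementSketch sketch = new BinaryPlacementSketch();
        System.out.println(sketch.store("tiny".getBytes(StandardCharsets.UTF_8)));
        System.out.println(sketch.store(new byte[4096]));
        // An "update" just stores another record; the old one stays behind.
        System.out.println(sketch.store(new byte[8192]));
        System.out.println("records in store: " + sketch.dataStore.size());
    }
}
```

Because nothing is ever overwritten in this sketch, replacing a large value leaves the previous record in the map. That leftover record is the garbage that the collection process is meant to remove.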
  
@@ -128, +128 @@

  
  Objects are usually stored early in the data store, even before the transaction is committed.
Only the identifier is stored in the persistence manager. The blob store is no longer used
(except for backward compatibility). When using the RMI client, large objects are not
stored directly in the data store; instead, they are first transferred to the server.
  
- == Running data store garbage collection ==
+ == Running Data Store Garbage Collection (Jackrabbit 1.x) ==
  
  Running the garbage collection is currently a manual process. You can run it in a separate
thread, concurrently with your application:
  
@@ -169, +169 @@

   1. Manually delete files with last modified date older than X
  
  
+ == Running Data Store Garbage Collection (Jackrabbit 2.x) ==
+ 
+ Running the garbage collection is currently a manual process. You can run it in a separate
thread, concurrently with your application:
+ 
+ {{{
+ JackrabbitRepositoryFactory rf = new RepositoryFactoryImpl();
+ JackrabbitRepository rep = (JackrabbitRepository) rf.getRepository(null);
+ RepositoryManager rm = rf.getRepositoryManager(rep);
+ 
+ // need to login to start the repository
+ Session session = rep.login(new SimpleCredentials("", "".toCharArray()));
+ 
+ DataStoreGarbageCollector gc = rm.createDataStoreGarbageCollector();
+ try {
+     gc.mark();
+     gc.sweep();
+ } finally {
+     gc.close();
+ }
+ 
+ session.logout();
+ rm.stop();
+ }}}
+ 
+ The process above applies to a standalone repository. When clustered, the garbage collection
can be run from any cluster node.
+ 
+ If multiple distinct repositories use the same data store, the process is a bit different:
First, call gc.mark() on the first repository, then on the second and so on. At the end, call
gc.sweep() on the first repository:
+ 
+ {{{
+ gc1.mark();
+ gc2.mark();
+ gc3.mark();
+ gc1.sweep();
+ gc1.close();
+ gc2.close();
+ gc3.close();
+ }}}
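
Why the sweep has to wait until every repository has been marked can be shown with a small self-contained model. This is a sketch under stated assumptions, not Jackrabbit code: it assumes (as the manual alternative on this page suggests) that marking refreshes a record's last-modified time and that sweeping deletes records older than the start of the scan. The class SharedStoreGcSketch, its methods, and the logical clock are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class SharedStoreGcSketch {
    // Stand-in for a shared data store: identifier -> last modified time.
    final Map<String, Long> lastModified = new HashMap<>();
    // Logical clock used instead of wall-clock time to keep the model deterministic.
    long clock = 0;

    void add(String id) {
        lastModified.put(id, ++clock);
    }

    /** Records the start of a garbage collection scan. */
    long startScan() {
        return ++clock;
    }

    /** Mark: touch every record that one repository still references. */
    void mark(Collection<String> referenced) {
        for (String id : referenced) {
            if (lastModified.containsKey(id)) {
                lastModified.put(id, ++clock);
            }
        }
    }

    /** Sweep: delete records not touched since the scan started. */
    int sweep(long scanStart) {
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = lastModified.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < scanStart) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        SharedStoreGcSketch store = new SharedStoreGcSketch();
        for (String id : Arrays.asList("a", "b", "c", "d")) {
            store.add(id);
        }
        long start = store.startScan();
        store.mark(Arrays.asList("a", "b")); // records used by the first repository
        store.mark(Arrays.asList("c"));      // records used by the second repository
        System.out.println("deleted: " + store.sweep(start));
        System.out.println("remaining: " + store.lastModified.keySet());
    }
}
```

In this model, sweeping after only the first mark would also remove the record that the second repository still references, which is why a single sweep runs last.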
+ 
+ An alternative is:
+ 
+  1. Write down the current time = X
+  1. Run gc.mark() on each repository 
+  1. Manually delete files with last modified date older than X
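
The last step of the list above could be scripted rather than done by hand. The following is a minimal sketch using only java.nio.file, assuming the data store is a plain directory tree of files. The class DeleteOlderThanSketch and the method deleteOlderThan are hypothetical names, and the demo runs against a temporary directory rather than a real data store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DeleteOlderThanSketch {
    /** Deletes regular files under root that were last modified before cutoffMillis. */
    static int deleteOlderThan(Path root, long cutoffMillis) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path file : files) {
            if (Files.getLastModifiedTime(file).toMillis() < cutoffMillis) {
                Files.delete(file);
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory; a real run would point at the data store directory.
        Path root = Files.createTempDirectory("datastore-sketch");
        Path stale = Files.createFile(root.resolve("stale.bin"));
        Path fresh = Files.createFile(root.resolve("fresh.bin"));
        long x = System.currentTimeMillis(); // step 1: write down the current time = X
        Files.setLastModifiedTime(stale, FileTime.fromMillis(x - 3_600_000L));
        // Stands in for step 2: a record that was touched after X.
        Files.setLastModifiedTime(fresh, FileTime.fromMillis(x + 1_000L));
        System.out.println("deleted: " + deleteOlderThan(root, x));
    }
}
```

File systems differ in timestamp resolution, so it may be prudent to leave a safety margin between X and the cutoff actually used.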
+ 
  == How to write a new data store implementation ==
  
  New implementations are welcome! An S3 data store (http://en.wikipedia.org/wiki/Amazon_S3)
would be cool. A caching data store would be great as well (items that are used a lot are stored
in a fast file system, others in a slower one).
