jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "DataStore" by ThomasMueller
Date Mon, 19 May 2008 13:10:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by ThomasMueller:
http://wiki.apache.org/jackrabbit/DataStore

The comment on the change is:
faster garbage collection 

------------------------------------------------------------------------------
   * Currently the File``Data``Store creates a lot of directories (and files). If possible,
the number of directories (and maybe files) should be reduced to improve performance. 
   * Text extraction for fulltext search and metadata extraction could be done once per
object, when the object is stored, and the results kept next to the object. 
   * Clients should first send the checksum and size of large objects when they store something
(importing, adding or updating data); in many cases the actual data then does not need to be
sent (the object may already be in the data store).
+  * Speed up garbage collection. One idea is to use 'back references' for larger objects:
each larger object would know the set of nodes that reference it. This would be an 'append
only' set, meaning that at runtime links are only added, never removed; only the garbage
collection process removes links. The garbage collection would first update the links of large
objects (for each object, this can stop at the first link that still exists). That way, large
objects that are no longer used can be removed quickly. Afterwards, objects with a low use
count should be scanned. This algorithm wouldn't necessarily reduce the total garbage
collection time, but it would free up space more quickly.
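
The back-reference idea above could be sketched roughly as follows. This is only an illustration under assumptions: the class `BackReferenceSketch`, its methods, and the in-memory maps are hypothetical and not part of the Jackrabbit API, and a real data store would persist the links rather than hold them in memory. The second phase (scanning objects with a low use count) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of append-only back references; not Jackrabbit code. */
class BackReferenceSketch {
    // objectId -> node ids that have referenced the object (append-only at runtime)
    private final Map<String, List<String>> backRefs = new HashMap<>();
    // nodeId -> objectId the node references right now (stand-in for the repository)
    private final Map<String, String> liveNodes = new HashMap<>();

    /** Runtime path: a link is appended, existing links are never touched. */
    void setReference(String nodeId, String objectId) {
        liveNodes.put(nodeId, objectId);
        backRefs.computeIfAbsent(objectId, k -> new ArrayList<>()).add(nodeId);
    }

    /** Runtime path: removing a node leaves its back references in place. */
    void removeNode(String nodeId) {
        liveNodes.remove(nodeId);
    }

    /** GC path: drop stale links, stopping at the first link that still exists. */
    boolean isStillReferenced(String objectId) {
        List<String> refs = backRefs.get(objectId);
        if (refs == null) {
            return false;
        }
        Iterator<String> it = refs.iterator();
        while (it.hasNext()) {
            String nodeId = it.next();
            if (objectId.equals(liveNodes.get(nodeId))) {
                return true; // one live link is enough; skip the rest
            }
            it.remove(); // only the garbage collector removes links
        }
        return false;
    }

    /** GC path: remove every tracked object that has no live link left. */
    List<String> collectLargeObjects() {
        List<String> removed = new ArrayList<>();
        Iterator<String> it = backRefs.keySet().iterator();
        while (it.hasNext()) {
            String objectId = it.next();
            if (!isStillReferenced(objectId)) {
                it.remove(); // a real store would delete the object's data here
                removed.add(objectId);
            }
        }
        return removed;
    }
}
```

In this sketch the garbage collector never has to traverse the whole repository to decide whether a large object is unused; it only checks that object's own links, which is why space could be freed early even if the total collection time stays the same.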
  
