Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by ThomasMueller:
http://wiki.apache.org/jackrabbit/DataStore

The comment on the change is:
faster garbage collection

------------------------------------------------------------------------------
   * Currently the File``Data``Store creates a lot of directories (and files). If possible, the number of directories (and maybe files) should be reduced to improve performance.
   * Fulltext search and metadata extraction could be done when storing the object (only once per object), and the result stored next to the object.
   * Clients should first send the checksum and size of large objects when they store something (import, adding or updating data); in many cases the actual data does not need to be sent.
+  * Speed up garbage collection. One idea is to use 'back references' for larger objects: each larger object would know the set of nodes that reference it. This would be an 'append only' set, meaning that at runtime links are only added, never removed. Only the garbage collection process removes links. The garbage collection would first update the links for large objects (for each object, this process could stop at the first link that still exists). That way large objects can be removed quickly if they are not used any more. Afterwards, objects with a low use count should be scanned. This algorithm wouldn't necessarily speed up the total garbage collection time, but it would free up space more quickly.
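
The back-reference idea in the last item can be illustrated with a small sketch. This is not Jackrabbit code: it is written in Python for brevity, and the names LargeObject, add_back_ref and collect_large_objects are hypothetical. It also assumes, for simplicity, that each node references at most one large object and that the repository can be modelled as two in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class LargeObject:
    """A data store record with its back references (hypothetical model)."""
    object_id: str
    # Node ids that referenced this object at some point, in append order.
    back_refs: list = field(default_factory=list)

    def add_back_ref(self, node_id):
        # Runtime path: links are only ever appended, never removed.
        self.back_refs.append(node_id)


def collect_large_objects(store, nodes):
    """Remove large objects whose back references are all stale.

    store: dict mapping object_id -> LargeObject
    nodes: dict mapping node_id -> object_id the node currently references
           (assumption: at most one large object per node)
    Returns the ids of the objects that were removed.
    """
    removed = []
    for object_id in list(store):
        obj = store[object_id]
        live_index = None
        for i, node_id in enumerate(obj.back_refs):
            if nodes.get(node_id) == object_id:
                live_index = i  # first link that still exists: stop scanning
                break
        if live_index is None:
            # No live link left, so the object can be freed right away.
            del store[object_id]
            removed.append(object_id)
        else:
            # Only the collector removes links: drop the stale ones seen so far.
            obj.back_refs = obj.back_refs[live_index:]
    return removed
```

With an object "a" linked from nodes n1, n2, n3 and an object "b" linked from n4, and only n2 and n3 still present, the function would remove "b" and trim the back references of "a" to n2 and n3. Smaller objects are not covered here; as the item says, they would still need a separate scan afterwards.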