jackrabbit-users mailing list archives

From Justin Edelson <justinedel...@gmail.com>
Subject Re: Creating New DataStore (extending)
Date Thu, 18 Mar 2010 18:41:07 GMT
Tom,
I would recommend looking at the existing DataStore implementations,
specifically FileDataStore.

On 3/18/10 2:17 PM, mrbahr wrote:
> 
> Hi,
> 
> I need to create a new type of data store that interfaces with a
> scalable/reliable file store.  I'm looking at extending the DataStore
> interface, but really can't figure out how it is used.  Can anyone please
> point me in the right direction so that I can write a new plugin?
> 
> A few questions off the bat.....
> 
> In the addRecord method, a stream is provided.  The documentation indicates
> that if an identical stream exists, then that stream is returned.  How do
> folks implement that?  I certainly would not want to compare this stream to
> all existing ones, especially given there could be millions.  Any thoughts
> on this type of implementation?
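FileDataStore reads the stream and calculates its SHA-1 digest, which is
then used as a unique identifier. Because identical content always yields
the same identifier, detecting a duplicate only requires checking whether
a record with that identifier already exists; there is no need to compare
the new stream against every stored one. The identifier's string form is
then split into nested directories (its first six characters become three
directory levels) to create a path: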

        // 'file' initially points at the data store's base directory
        String string = identifier.toString();
        file = new File(file, string.substring(0, 2));
        file = new File(file, string.substring(2, 4));
        file = new File(file, string.substring(4, 6));
        return new File(file, string);
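A self-contained sketch of the same content-addressed idea, assuming a plain local directory as the backing store. The class `ContentAddressedStore` and its methods are hypothetical illustrations, not Jackrabbit's `FileDataStore` API: the stream is digested while it is copied to a temporary file, and the duplicate check is a single existence test on the derived path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class illustrating content addressing; not Jackrabbit's FileDataStore API.
public class ContentAddressedStore {
    private final Path baseDir;

    public ContentAddressedStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Maps a hex digest to baseDir/ab/cd/ef/abcdef..., a three-level directory layout.
    public Path pathFor(String id) {
        return baseDir.resolve(id.substring(0, 2))
                .resolve(id.substring(2, 4))
                .resolve(id.substring(4, 6))
                .resolve(id);
    }

    // Copies the stream to a temp file while computing its SHA-1 digest. If a record with
    // that digest already exists, the temp file is discarded and the existing record reused;
    // otherwise the temp file is moved into place. Returns the hex identifier either way.
    public String addRecord(InputStream in) throws IOException, NoSuchAlgorithmException {
        Files.createDirectories(baseDir);
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        Path temp = Files.createTempFile(baseDir, "upload", ".tmp");
        try (OutputStream out = new DigestOutputStream(Files.newOutputStream(temp), digest)) {
            in.transferTo(out);
        } catch (IOException e) {
            Files.deleteIfExists(temp); // do not leave partial uploads behind
            throw e;
        }
        String id = toHex(digest.digest());
        Path target = pathFor(id);
        if (Files.exists(target)) {
            Files.delete(temp);
            // Refresh the timestamp so a timestamp-based GC treats the record as in use.
            Files.setLastModifiedTime(target, FileTime.fromMillis(System.currentTimeMillis()));
        } else {
            Files.createDirectories(target.getParent());
            Files.move(temp, target);
        }
        return id;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because the duplicate check costs the same no matter how many records exist, this approach holds up with millions of records; the trade-off is that the whole stream must be read (and spooled to disk) before its identifier is known.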

> 
> I am guessing that the difference between getRecordIfStored and getRecord is
> that getRecord throws an exception if the record does not exist.  Correct?
> 
Correct.
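A minimal sketch of that contract, in which `getRecord` delegates to `getRecordIfStored` and turns a missing record into an exception. `RecordLookup` and `RecordNotFoundException` are hypothetical names; a real Jackrabbit DataStore works with its own identifier, record, and checked exception types.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the getRecord / getRecordIfStored contract; not Jackrabbit's API.
public class RecordLookup {
    // Hypothetical stand-in for the checked exception a real DataStore would throw.
    public static class RecordNotFoundException extends Exception {
        public RecordNotFoundException(String message) {
            super(message);
        }
    }

    private final Path baseDir;

    public RecordLookup(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Same three-level layout as FileDataStore: ab/cd/ef/abcdef...
    private Path pathFor(String id) {
        return baseDir.resolve(id.substring(0, 2))
                .resolve(id.substring(2, 4))
                .resolve(id.substring(4, 6))
                .resolve(id);
    }

    // Returns the record's location, or null if nothing is stored under this identifier.
    public Path getRecordIfStored(String id) {
        Path path = pathFor(id);
        return Files.exists(path) ? path : null;
    }

    // Same lookup, but a missing record is treated as an error.
    public Path getRecord(String id) throws RecordNotFoundException {
        Path path = getRecordIfStored(id);
        if (path == null) {
            throw new RecordNotFoundException("Record not found: " + id);
        }
        return path;
    }
}
```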

> How does garbage collection affect (or work with) the DataStore?  What are
> the suggestions for allowing entries to be deleted?  If the client uses
> WebDAV, there is a delete method that deletes the entry.  I assume that it
> marks the item as unused and then garbage collection cleans it later.  How
> is the item removed from the data repository?

If you look at org.apache.jackrabbit.core.data.GarbageCollector...

mark() calls updateModifiedDateOnAccess(startTime) and then accesses
every node, which updates the last modified dates of all data that is
still in use. Then...

sweep() calls deleteAllOlderThan(startTime) which tells the DataStore
implementation to delete all the data older than the beginning of the
mark() call.
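The mark/sweep interplay can be illustrated with a small sketch over a directory of files, assuming the set of still-referenced records is already known (in Jackrabbit it is discovered by walking the repository). `TimestampGc`, `onAccess`, and `TIMESTAMP_PADDING_MS` are hypothetical names; only the roles of `updateModifiedDateOnAccess` and `deleteAllOlderThan` mirror the methods described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of timestamp-based mark-and-sweep over a directory of records.
public class TimestampGc {
    // Padding for file systems that store modification times at coarse (e.g. 2 s) resolution.
    private static final long TIMESTAMP_PADDING_MS = 2000;

    private final Path baseDir;
    private long minModifiedDate = 0; // 0 means "do not refresh timestamps on access"

    public TimestampGc(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Analogue of updateModifiedDateOnAccess: from now on, accessing a record whose
    // timestamp is older than 'before' pushes its timestamp past 'before'.
    public void updateModifiedDateOnAccess(long before) {
        this.minModifiedDate = before;
    }

    // Called whenever a record is read, e.g. while the GC visits every node.
    public void onAccess(Path record) throws IOException {
        if (minModifiedDate > 0 && lastModified(record) < minModifiedDate) {
            long now = Math.max(System.currentTimeMillis(), minModifiedDate);
            Files.setLastModifiedTime(record, FileTime.fromMillis(now + TIMESTAMP_PADDING_MS));
        }
    }

    // Mark phase: enable timestamp refresh, then access every record still referenced.
    public void mark(Collection<Path> referenced, long startTime) throws IOException {
        updateModifiedDateOnAccess(startTime);
        for (Path record : referenced) {
            onAccess(record);
        }
    }

    // Sweep phase, analogue of deleteAllOlderThan: delete every record not touched since 'min'.
    public int deleteAllOlderThan(long min) throws IOException {
        List<Path> stale;
        try (Stream<Path> files = Files.walk(baseDir)) {
            stale = files.filter(Files::isRegularFile)
                    .filter(p -> lastModified(p) < min)
                    .collect(Collectors.toList());
        }
        for (Path p : stale) {
            Files.delete(p);
        }
        return stale.size();
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Anything written after `startTime` also carries a newer timestamp and therefore survives the sweep, which is why the cut-off is the beginning of mark() rather than its end.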

see:
http://wiki.apache.org/jackrabbit/DataStore#Running_Data_Store_Garbage_Collection_.28Jackrabbit_2.x.29

Hope this helps. What file store are you targeting?

Justin

> 
> Thanks for your help.
> 
> Tom

