jackrabbit-dev mailing list archives

From "Esteban Franqueiro" <efran...@bea.com>
Subject RE: [jira] Resolved: (JCR-926) Global data store for binaries
Date Wed, 17 Oct 2007 18:59:23 GMT
Welcome back.

> I think there is still a way to get the digest. If you wrap 
> the InputStream like this:
> public DataRecord addRecord(InputStream input) throws IOException {
>   MessageDigest digest = MessageDigest.getInstance(DIGEST);
>   InputStream in = new DigestInputStream(input, digest);
>   ...
> }

> (...)
> I hope you get the idea. The second UPDATE statement will 
> return 0 update count if a record with the same digest 
> exists. I didn't set the LENGTH everywhere.

Yes, this could be a way, but I'm not yet convinced by the approach.
It's a long procedure, so there are many possible points of failure.
Still, we do have a working implementation of this idea. I'll upload it later.
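
For reference, here is a minimal sketch of the stream-wrapping idea quoted above, not the implementation I will upload. The class name DigestSketch, the method copyAndDigest, the "SHA-1" algorithm and the OutputStream sink are all assumptions made for illustration. The quoted snippet only names a DIGEST constant.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestSketch {
    // Assumed algorithm; the quoted code only refers to a DIGEST constant.
    static final String DIGEST = "SHA-1";

    /** Copies input to sink and returns the hex digest of the bytes read. */
    static String copyAndDigest(InputStream input, OutputStream sink) {
        try {
            MessageDigest digest = MessageDigest.getInstance(DIGEST);
            // A distinct variable name is needed here: redeclaring the
            // parameter "input" would not compile.
            try (InputStream in = new DigestInputStream(input, digest)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    // The digest is updated as a side effect of each read.
                    sink.write(buffer, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is that the digest is only known after the whole stream has been consumed, which is why the record has to be written somewhere temporary first and identified afterwards.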

> > [time]     [user session]                       [GC session]
> > t0         node.setProperty(binary)
> > t1                                        gc.start
> > t2                                        gc.stop
> > t3         node.save
> Is this the problem? I didn't think about that so far... In 
> my view it is rare because the garbage collection usually 
> will take some time, and the time between node.setProperty 
> and node.save is (hopefully) short.

Well, it may or may not be short, depending on the application.

> But it needs to be 
> solved. I will write a test case.

There are some test cases in the attachment I uploaded to JCR-1154 (see
for example SimpleGCTest.testSaveRevert()). Of course they are synthetic
and use a synchronous GC, but I think they prove the point.

> A simple solution is to 
> only delete records when the repository is stopped (or 
> started). Obviously this is not a solution for long running 
> repositories. Another idea is to keep transient large 

I briefly mentioned this in a previous email, and the uploaded code has
a working version of a GC that collects on repository startup. To try
it, you have to set the garbageCollectorMode to "startup" in

> binaries in a WeakReferenceHashMap, and before deleting check 
> that the record is not in there.
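
To make the t0..t3 timeline and the weak-reference idea concrete, here is a rough in-memory sketch. It is not Jackrabbit code: TransientGuardSketch, Handle, addRecord and deleteOlderThan are hypothetical names, and a plain java.util.WeakHashMap stands in for the "WeakReferenceHashMap" mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

class TransientGuardSketch {
    /** Hypothetical handle a session holds between setProperty and save. */
    static final class Handle {
        final String id;
        Handle(String id) { this.id = id; }
    }

    // Record id -> last modified time (stand-in for the real data store).
    private final Map<String, Long> lastModified = new HashMap<>();
    // Weakly held handles: an entry disappears once no session holds it.
    private final Set<Handle> inUse =
            Collections.newSetFromMap(new WeakHashMap<>());

    /** t0: the user session stores a binary that is not yet saved. */
    synchronized Handle addRecord(String id, long now) {
        lastModified.put(id, now);
        Handle handle = new Handle(id);
        inUse.add(handle);
        return handle;
    }

    /** t2: the GC deletes records older than the cut-point. */
    synchronized int deleteOlderThan(long cutPoint, boolean checkTransient) {
        Set<String> protectedIds = new HashSet<>();
        if (checkTransient) {
            for (Handle handle : inUse) {
                protectedIds.add(handle.id);
            }
        }
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = lastModified.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() < cutPoint && !protectedIds.contains(entry.getKey())) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    synchronized boolean contains(String id) {
        return lastModified.containsKey(id);
    }
}
```

With checkTransient set to false, a record added at t0 is deleted by a collection that starts at t1, so the node.save at t3 would point at nothing. With it set to true, the record survives for as long as the session still holds the handle.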
> > > > We needed to make RepositoryImpl.getWorkspaceNames() public
> > it would be easier to just make them public.
> > Or at least export some of those methods thru an utils class.
> I will make it public.


> > Do you mind
> > if I send you a zip file with the implementations of the 
> interfaces, 
> > the tests, the configurations and parsers, and the 
> initialization routines?
> There is no hurry, but please don't send it via email. The 
> preferred way is to attach the code to the bug:
> http://issues.apache.org/jira/browse/JCR-1154 (you will be 
> asked to 'Grant license to ASF for inclusion in ASF works').

It's done.
Once again I apologize for having to send the code as I did instead of
in a patch.

On a different topic, I want to raise another issue again. When you
mark (i.e., update the modification times of) the nodes of a single
workspace, and then delete records older than the current cut-point,
you risk deleting records belonging to other workspaces. The solution
I propose is simply to mark the nodes of all the workspaces, and then
do a single delete operation for the entire data store. (I don't know
if this was always your idea, but when we began working on this, we
assumed otherwise :( )
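
A small sketch of what I mean, using plain maps instead of the real data store. GlobalSweepSketch and collect are hypothetical names, and the record ids and workspace names in the usage note are made up.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GlobalSweepSketch {
    /**
     * Marks every record referenced from any of the given workspaces,
     * then sweeps the whole store once. Returns the ids that were deleted.
     *
     * store            record id -> last modified time
     * refsByWorkspace  workspace name -> record ids referenced by its nodes
     */
    static Set<String> collect(Map<String, Long> store,
                               Map<String, Set<String>> refsByWorkspace,
                               long cutPoint) {
        // Mark phase: touch the records of ALL workspaces first.
        for (Set<String> refs : refsByWorkspace.values()) {
            for (String id : refs) {
                store.replace(id, cutPoint); // no-op if the id is unknown
            }
        }
        // Sweep phase: a single delete over the entire data store.
        Set<String> deleted = new TreeSet<>();
        Iterator<Map.Entry<String, Long>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() < cutPoint) {
                deleted.add(entry.getKey());
                it.remove();
            }
        }
        return deleted;
    }
}
```

If the store holds records a, b and c, with a referenced from one workspace and b from another, passing both workspaces deletes only c. Passing just the first workspace also deletes b, which is exactly the problem described above.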


Esteban Franqueiro

