jackrabbit-dev mailing list archives

From "Esteban Franqueiro" <efran...@bea.com>
Subject RE: [jira] Resolved: (JCR-926) Global data store for binaries
Date Tue, 25 Sep 2007 20:50:23 GMT
Hi Thomas.

> A database-backed data store would be great!

Yes, indeed :)

> What about RepositoryException?

Yes, that would work too. But we wanted to be able to identify the
specific exception thrown from the data store (DS). In a few places we
wrapped a DataStoreException (DSE) inside a RepositoryException (RE).
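
To illustrate the wrapping mentioned above, here is a minimal sketch of how a caller can still pick out the data store exception after it has been wrapped. The classes RepositoryFailure and DataStoreFailure and the helper Causes.find are hypothetical stand-ins so that the example compiles without the JCR jars; they are not the Jackrabbit classes.

```java
// Hypothetical stand-in for a general repository exception.
class RepositoryFailure extends Exception {
    RepositoryFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for the more specific data store exception.
class DataStoreFailure extends Exception {
    DataStoreFailure(String message) {
        super(message);
    }
}

final class Causes {
    private Causes() {
    }

    // Walks the cause chain and returns the first throwable of the given type, or null.
    static <T extends Throwable> T find(Throwable thrown, Class<T> type) {
        for (Throwable current = thrown; current != null; current = current.getCause()) {
            if (type.isInstance(current)) {
                return type.cast(current);
            }
        }
        return null;
    }
}
```

A caller that catches the outer exception can call Causes.find(e, DataStoreFailure.class) and branch on whether the result is null.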

> What about updating the time when getLength() is called?

Sorry, I don't understand this.

> There is already DataStore.updateModifiedDateOnRead(long 
> before); a separate method is not required in my view.

This didn't work in our testing.

> > The issue of the delay when calling getStream()/getRecord() means that
> > the information provided by the record has to be stored in the record,
> > instead of relying on the backing store (like it's done in the
> > FileDataRecord class).
> 
> Sorry I don't understand this part.

The FileDataRecord always queries the file object for its properties. A
DatabaseRecord should store all the info (except the stream itself)
instead of going to the DB on each getter call.
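
A minimal sketch of that idea, assuming the metadata is read once when the record is created: the getters answer from fields, and only getStream() goes back to the backing store. The names CachedDataRecord and StreamSource are hypothetical and only serve the illustration; this is not our implementation.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical hook that opens the binary content on demand (e.g. via a DB query).
interface StreamSource {
    InputStream open(String identifier) throws IOException;
}

// Hypothetical record: metadata is copied in once, the stream is opened lazily.
final class CachedDataRecord {
    private final String identifier;
    private final long length;
    private final long lastModified;
    private final StreamSource source;

    CachedDataRecord(String identifier, long length, long lastModified, StreamSource source) {
        this.identifier = identifier;
        this.length = length;
        this.lastModified = lastModified;
        this.source = source;
    }

    String getIdentifier() {
        return identifier;
    }

    // Answered from the cached field; no round trip to the backing store.
    long getLength() {
        return length;
    }

    // Answered from the cached field; no round trip to the backing store.
    long getLastModified() {
        return lastModified;
    }

    // The only call that touches the backing store.
    InputStream getStream() throws IOException {
        return source.open(identifier);
    }
}
```

The trade-off is that the cached values are a snapshot taken when the record was built, so they can go stale if the row changes afterwards.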

> > a stream that closes its DB resources when it's closed is needed.
> 
> That should be simple to implement; I suggest to do this as 
> part of the database data store (I can help if required).

We have an implementation of this already, but it is a little messy. I'll
be cleaning it up shortly.
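
For reference, one way such a stream could look (a sketch, not the implementation mentioned above): a FilterInputStream that releases a list of resources when it is closed. The class name ResourceClosingInputStream is hypothetical. It assumes the resources are passed in the order they should be released, for example ResultSet, Statement, Connection, all of which implement AutoCloseable on Java 7 and later.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: closing the stream also releases the resources behind it.
final class ResourceClosingInputStream extends FilterInputStream {
    private final AutoCloseable[] resources;
    private boolean closed;

    ResourceClosingInputStream(InputStream in, AutoCloseable... resources) {
        super(in);
        this.resources = resources.clone();
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // a second close() is a no-op
        }
        closed = true;
        IOException failure = null;
        try {
            super.close();
        } catch (IOException e) {
            failure = e;
        }
        // Keep going even if one resource fails, so nothing is leaked.
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new IOException("Failed to release a resource", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

With JDBC this would be constructed roughly as new ResourceClosingInputStream(rs.getBinaryStream(1), rs, stmt, conn), so the caller only has to close the stream.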

> > getRecord() should not retrieve the stream and keep resources open 
> > unless explicitly asked
> 
> Sure. This is done already for the FileDataRecord. Is there a 
> problem to delay opening the stream for the database data store?

No, I'm just mentioning the issues we ran into.

> > We have code to run the GC only once on repository startup, and in a
> > background thread
> 
> Both should work. I prefer the background thread

Me too, but we needed an interim implementation while we continued to
develop and test the other method.

> > run-once option needs global lock during its run.
> > no session can be started while a collection is in progress.
> > The background thread instead needs some way to keep track of changes
> > in the binary properties.
> 
> I don't think this is required. Let's say a large object is 
> deleted while the garbage collection runs. In this case it 
> will not be collected, which is OK in my view (it will be 
> collected in the next GC run). If a new object is inserted, 
> it will not be collected because the last modified date is newer.

Well, we found that it is necessary, since the GC runs in a different
session than the users' (we're using system sessions for this). Also note
that we're currently using the simple GC that's in svn.
The sequence is this: a user adds a binary property in one session. After
the file is uploaded, but before the user save()s the session, the GC in
the system session starts reading all nodes from the workspace. Since the
changes are not yet written to persistent storage, the file is assumed to
belong to a deleted property, and is in fact deleted.
Maybe running the GC in the system session is the wrong approach, but we
haven't dug deeper into it yet.
We needed to make RepositoryImpl.getWorkspaceNames() public for this to
work.
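
To make the timing concrete, here is a small in-memory sketch of the sweep step. It is not Jackrabbit code, and SketchDataStore is a hypothetical name. It assumes the sweep only deletes unmarked records whose last-modified time is older than the start of the scan, which is the protection described in the quoted reply. Whether the GC we are running applies such a check is exactly what we have not verified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a data store, for illustration only.
final class SketchDataStore {
    private final Map<String, Long> lastModified = new HashMap<>();

    void put(String id, long timestamp) {
        lastModified.put(id, timestamp);
    }

    boolean contains(String id) {
        return lastModified.containsKey(id);
    }

    // Deletes records that were not marked during the scan AND were last
    // modified before the scan started. Returns the number of deleted records.
    int sweep(Set<String> marked, long scanStart) {
        int before = lastModified.size();
        lastModified.entrySet().removeIf(
            e -> !marked.contains(e.getKey()) && e.getValue() < scanStart);
        return before - lastModified.size();
    }
}
```

A record uploaded after scanStart but not yet visible to the scanning session is unmarked, yet it survives because its timestamp is newer. If the threshold is dropped (for example by passing Long.MAX_VALUE), the same record is deleted, which matches the behaviour described above.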

> My plan is to scan in the persistence manager, using the new 
> method getAllNodeIds(), if a bundle persistence manager is 
> used. This should speed up the GC scan.

Will this change have any effect on the issue I just mentioned?

> > We can contribute our code,
> 
> That would be great of course!

Ok then, our current implementation is against a patched-up 1.3.1. What
do you think is the best way to isolate the code?

> > but it's gonna take some time to extract it; our code is more of a 
> > POC than production-ready for now.
> 
> No problem. We can work together on fixing the problems.

Sure, that would be great!

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com

